Data Product Pyramid

Implementing a Business Strategy using Data Products



There has been a lot of talk about Data Products, especially with the rise of approaches like Data Mesh. This is important because I believe that, done correctly, building analytics from Data Products can drive real business value and significantly speed up analytics delivery.

In this piece I will try to pin down what a Data Product should be from a business point of view, and look at a framework for building them. I won't dive into the details of architecture, technology or data governance, as these will be covered in further articles.

So first we need to do a bit of a deep dive on Data Products and address some challenges with the way the industry views them.

A popular (but by no means the only) definition comes from the Data Mesh concept.

Data Product as Defined by the Data Mesh

This details some fairly common aspects that define it as a "product", in its view. These typically tend to focus on the technical, packaging and deployment aspects of what is being delivered.

Although these are a really good set of requirements, they miss some really important aspects:

  1. They don't provide any business context or guidance - these are really tech requirements, not business requirements, i.e. they scope how the Data Product is to behave in its technology environment.
  2. They don't include any actual product requirements, i.e. what problem the product needs to solve, who the customer is, what the value is etc.
  3. A product manager would recognise very few of these aspects, and it's hard to map a product life-cycle process using just these.
  4. There is usually no way to tie them to any kind of data or business strategy, i.e. what products do I build? How do they change as the business does?

Indeed, if one were to try to apply this directly to an analytics-orientated business strategy, or even a use-case, one would be left feeling that quite a bit is missing.

The Data Mesh, as well as a lot of other approaches, typically talks about "Data-As-A-Product" or "Data Products" that focus on the data but, again, I see a few problems with this view.

  1. Most data is an intermediate artefact, e.g. it is the input to an analytics model (be it Excel, ML, a Data Warehouse schema etc.). It rarely provides the business with the actual insight or value directly.
  2. Data gets copied, split up, aggregated and spread around as constituent parts of many other components and use-cases.
  3. The product bit usually only refers to packaging and deployment aspects, including the run-time and operational features. I can understand this: traditionally, the data that gets thrown to analytics teams is very unloved by producers (or loved just enough for their use-case and for no other), so asking them to start treating the published data the same way they supply software makes a lot of sense.

What these actually describe (and indeed what most people mean when they talk about data products) are in fact Data Services. We'll have a closer look at these later on.

Data Product - How to focus on the Product Part

To bring this back to a more product view, we should lay the ground rules for what a Data Product is and how it relates to the business.

To have a go at solving this I'm going to place a stake in the sand on what a Data Product is, from my view point.

I know this may be contentious and I am happy for people to disagree, but when one is talking to business stakeholders one needs to talk in business terms, so I am casting the product in this way.

Specifically, this means a Data Product should:

  1. Be targeted and specifically solve a business problem in an analytics context (i.e. has a direct business requirement).
  2. Provide direct value to the business customer.
  3. Not just be a dataset unless it’s actually sold (e.g. on an external marketplace).
  4. Have a product life-cycle with direct business customer interaction, feedback etc.

In order to address the business strategy from a product perspective we need to start with the key problems the business strategy is trying to solve.

For example, for an Operations Manager: how do I reduce my operating costs by 10% per year?

The Analytics builder (in charge of delivering the outcome), together with representatives of other parts of the business (who are able to answer the lower level questions), needs to break this down into smaller analytics and data problems rather than one big bang delivery.

If we visualise this as a decision flow:

Data Product Decision Process

As we work down the flow we ask key questions whose answers will address the requirement, and define outcomes that are in fact Data Products that provide answers to those questions.

In each part of the flow we are trying to de-compose the problem into lists of questions that are answered by finer grain Data Products and services (as shown on the right of the diagram) in the layer below. I.e. as we move down we get more specific, and we also get closer to the actual data from the business processes and systems that will supply the data as services.

Data Product Pyramid

This will manifest as a pyramid of Data Products that are each designed to solve a piece of the process that ultimately addresses the top level business problem.

The reason that Data Products are organised in a pyramid is so we can deliver quickly, incrementally and in a fail-fast way. Feedback can also be received from the business without waiting for the whole use-case to be delivered.

The process starts with the top level decision that needs to be made based on a specific business desire, e.g. I need to reduce my operational costs, I need to test my marketing strategy etc. This then breaks down into questions to be asked.

The reason for starting with a decision is to have a measurable outcome and a specific target to aim for. Many times D&A teams get asked for information and spend lots of time struggling to create models and source the data, only to find that it doesn't answer any specific question, or it was the wrong question, or it couldn't be created as the org didn't have the support (data etc.) to enable its creation.

So, to put this simply:

The right (valuable) questions, asked in the right way (feasible), whose answers actually support (or drive) the right decision.

To build the pyramid, we first create a mock pyramid (with Data Product placeholders) based on our business's view of how the use-case breaks down, via the decisioning process and using the knowledge of the business and the respective data teams.

We then create some actual Data Products with mock data, to get an idea about the data requirements. The data is used to serve the lower and the upper layer Data Products. We then re-factor and refine each Data Product, replacing mock data with real data as we source it and work through the detailed requirements.

If products already exist then they can be used as part of the process.
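The mock-first approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataProduct` class, its fields and the product names are hypothetical and are not part of any particular platform. The idea is that every placeholder starts on mock data, and we can ask the pyramid at any time which products still need real data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A node in the pyramid. All names and fields here are illustrative."""
    name: str
    question: str                 # the business question this product answers
    uses_mock_data: bool = True   # placeholders start on mock data
    feeds_from: list[DataProduct] = field(default_factory=list)

    def walk(self):
        """Yield this product, then everything beneath it in the pyramid."""
        yield self
        for child in self.feeds_from:
            yield from child.walk()


def still_mocked(top: DataProduct) -> list[str]:
    """Names of products that have not yet been moved to real data."""
    return [p.name for p in top.walk() if p.uses_mock_data]


# A three-layer mock pyramid, built top-down from the decision.
costs = DataProduct("historical_costs", "What is my total historical cost?")
forecast = DataProduct("cost_forecast", "Where will my highest costs be?",
                       feeds_from=[costs])
decision = DataProduct("cost_reduction", "How do I reduce operating costs?",
                       feeds_from=[forecast])

# Refine one product at a time as real data is sourced.
costs.uses_mock_data = False
print(still_mocked(decision))  # ['cost_reduction', 'cost_forecast']
```

An existing product would simply be attached to `feeds_from` with `uses_mock_data=False` from the start.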

Types of Data Products

It's also important to note that the pyramid levels are made up of different types of products that each solve a different piece of the puzzle and have a different approach and purpose.

These break down into:

  • Decision products - these are models/algorithms and decisioning products that either tell the business what they need to do or actually execute an action on the back of a decision that they make, e.g. a recommendation engine could actually buy a product from an e-commerce system (on the user clicking yes to a buying choice shown by the engine).
  • Knowledge products - these actually tell me something about my business that has been derived from the underlying data and information products, i.e. they are usually models or algorithms that forecast or apply semantics that tell me why something has happened (e.g. Single Customer View, classifications etc.).
  • Information products - these (as their name suggests) just give me information. They sit above the actual transactional data but still give me an answer to a fine grain question, e.g. what's my total historical cost etc.
  • Data Services - as I said before, these aren't Data Products but are important components that serve the specific data, in the specific format and operational profile (e.g. SLA, access pattern etc.), for all required Data Products. These are specific to the use-case and aren't specifically designed for reuse.

    They "assemble" data from the transactional world and other products to build a specific structure for use by a specific Data Product. They also include annotation and mark-up so that other consumers can re-use them as long as they don't break the original use-case.

Data Product Strategy

So we have talked about Data Products but they can't exist on their own. They need to be part of an overall process for implementing a business strategy.

The diagram, above, shows how we create Data Products as part of an overall strategy.

On the left are the key business change drivers, which provide the trigger for any change to go through the process.

Each change gets valued, prioritised and framed from a business and solution perspective.

It then gets distilled into a group of Data Products through the Data Product Funnel. There is continuous reviewing and refinement by the business customer (including stakeholders) with a fast feedback circuit for build and deployment.

When an acceptable outcome for the product has been achieved, it's published on to the company's Data Foundation infrastructure for use and re-use across the business.

Business Change Drivers

The starting point for the process is reacting to business change drivers. These are summarised into three areas:

  • Business Driven - where the business strategy changes because of an aspiration by the executive or forced change by market forces.
  • Environment Driven - where the environment changes and the business needs to respond - Covid-19 was a recent example.
  • Operationally Driven - where BAU changes such as operational breaks, new systems, new data etc. are experienced and the processes and systems need to change in a small way to accommodate them.

Data Product Funnel

The Data Product Funnel is a process we use to implement Data Products to address the business changes.

We create business initiatives as analytics decisions with questions and drop them into the funnel at the top, work our way down (iterating as we go), and Data Products come out of the bottom, deployed to the infrastructure.

Note: Even though we travel down the funnel, we can return to any part of the funnel from any other part.

To do this we follow this process iterating as we go:

  1. Business Change Driver to Use-cases
    First we fill the funnel with the key change initiatives as a list of actual questions based on decisions that need to be made.
  2. Use-case selection / prioritization
    Next, the business teams use the Value Framework to prioritise the decisions based on their value.
  3. Business framing
    Next, we make sure the questions are the right ones and being asked in the right way.
  4. Solution framing
    Next, we select the type of analytics problem we are solving, what the profile is, the initial inputs etc.
  5. Build & iterate the use-case as a Data Product pyramid
    Then we de-compose each use-case / decision into a pyramid of Data Products that are built incrementally as we iterate up and down the funnel.
  6. Transition & Deploy to Data Product Foundation Infrastructure
    Finally we publish the products to the Data Product Foundation Infrastructure which contains a Data Product catalogue that maintains meta-data about aspects like runtime-operations and allows for easy re-use.

Key Outcomes

If we follow this process we get the following outcomes:

  • An Adaptive Business Strategy - The business strategy can start small and evolve and adapt as the business changes.
  • Rapid delivery of highly valuable use-cases based directly on the business strategy.
  • A portfolio of actual and specific real world "products" that can be used and re-used across the business.

Implementation Process

So we know what we need to deliver (the pyramid) and we know how it fits into a strategy but how do we implement this?

We break implementation capability into three parts. All are executed as an iterative cycle:

  • Value Framework - This guides the business to select the most valuable business decisions to be made, with the right questions to ask.
  • Fast Adaptive Implementation Process - A process that allows the data and analytics teams to quickly experiment, develop and refine the decisions/use-cases into Data Products.
  • Flexible and Robust Data Product Foundation - A capability based on a Data and IT infrastructure that provides tools and software to develop, deploy and operate Data Products in support of the other two pillars. This would typically include technical concepts from approaches such as Data Mesh and Data Fabric.

Let's deep dive into each one:

Value Framework

The Value Framework enables the business strategy to be incrementally defined based on actual value and also links the business vision and aspirations to the deliverables. It breaks down into:

  • Value Framework - This is a way for the business to “all” agree on how value is measured and applied to Analytics Business outcomes.
  • Business Value Map - As initiatives are defined incrementally (either ahead of time or in response to outside events) this process is used to map the actual agreed value of each initiative, for prioritisation and focus.
  • Business and Solution framing - Each initiative is de-composed into business use-case candidates that get triaged to see if a) it's the right question and b) it is being asked in the right way. E.g. the Operations head might want to reduce costs by reducing headcount, but a much lower hanging fruit could actually be reducing infrastructure costs.
  • Use-Case Selection and Prioritization - As each use-case is created and framed, a feasibility assessment is done along with other prioritization tasks, to allow the winners to fill an implementation funnel to be delivered as Data Products in the next part.
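To make the selection step a little more concrete, here is a minimal Python sketch of one way it could work. Everything in it is an assumption for illustration: the `UseCase` fields, the scoring scales, the value-times-feasibility ranking and the threshold are hypothetical, and the numbers are made up. A real Value Framework would use whatever measures the business has agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    agreed_value: float   # hypothetical score from the Business Value Map (0-10)
    feasibility: float    # hypothetical 0-1 estimate: is the data and support there?


def prioritise(candidates, min_feasibility=0.3):
    """Drop infeasible candidates, then rank by value weighted by feasibility."""
    feasible = [c for c in candidates if c.feasibility >= min_feasibility]
    return sorted(feasible,
                  key=lambda c: c.agreed_value * c.feasibility,
                  reverse=True)


# Made-up scores, purely to show the mechanics.
candidates = [
    UseCase("reduce headcount", agreed_value=8, feasibility=0.2),
    UseCase("reduce infrastructure costs", agreed_value=6, feasibility=0.9),
    UseCase("optimise staff rosters", agreed_value=7, feasibility=0.6),
]

funnel = [c.name for c in prioritise(candidates)]
print(funnel)  # ['reduce infrastructure costs', 'optimise staff rosters']
```

The highest-value candidate drops out here because it fails the feasibility check, which is the "right question asked in the right way" test in miniature.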

Fast Adaptive Implementation Process

This is an iterative way of quickly creating and refining Data Products for each decision/use-case and breaks down into:

  • Use Case Implementation Process - An iterative development methodology to quickly de-compose the use-case into hierarchies of Data Products and Data Services that are understood by the business.
  • Data Product Design Framework - A library of Data Product archetypes, templates and patterns (composition, definitions, code artefacts etc.) that solve different pieces of the use-case and can be created quickly with little to no boilerplate code.
  • Data Product Portfolio / Marketplace - A business friendly, searchable marketplace where Data Products get deployed and run. Each Data Product has a full product life-cycle – i.e. it gets created, used and retired.
  • Low Friction Federated Data Governance & Management - There is lots of detail underneath here (which will be addressed in later blogs) but, simply, it's the capability for each Data Product to implement its policies and rules that are defined globally (for the org) and locally (for the decision/use-case).

Data Product Foundation

This is the data infrastructure and development capability for building and deploying Data Product software and data artefacts. It breaks down into:

  • Cross Business Data Product Infrastructure - e.g. Data Mesh and other product architectures. This allows Data Products to be deployed in a controlled and low-friction way by all teams in each business. It supports multiple technologies and data & analytics patterns - note there is no single stack.

    In a previous article I do a deep dive into the technical make-up of a Data Product (in the example, as part of a Data Mesh approach).

  • Rapid Data Product Innovation Capability - This enables fast prototyping and experimentation of new data, technology and products in secure environmental sandboxes.
  • Incremental creation of Business & Data Function Artefacts (Ontology +) - Shared data artefacts such as business dictionary, policies, domain maps, ontologies etc. get built and refined incrementally and iteratively as each Data Product pyramid is built. Again, there is lots of detail in this which will be addressed in later articles.

It's important to note the highly experimental and iterative nature of the early parts of the process. Data Products can be created in very rough form to garner feedback from a business initiative and get more mature as they are tested and refined.

Finally, once they have been shown to be fit for purpose they can be tagged as "production" ready and published as such on the infrastructure.

Data Products can have long life-cycles (e.g. quarterly reports) or short lived ones (e.g. ad-hoc research for an imminent business event). But they can all be created and retired easily as the business changes.

Also, it's important that any type of Data Product is supported, i.e. Data Products that need high operational profiles (e.g. real-time recommendation engines) can be published on the Data Foundation alongside those that have low profiles (e.g. ad hoc market research reports).

Products can be real-time/batch or have any other type of operational profile as the business doesn't distinguish between them. Again, I won't deep dive into this very important aspect here but this article details an architecture that supports this.
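The maturing life-cycle described in this section (rough form, refined, tagged production ready, retired) can be pictured as a small state machine. The sketch below is an assumption for illustration only: the stage names and the allowed transitions are hypothetical, not a prescribed model.

```python
from enum import Enum


class Stage(Enum):
    EXPERIMENTAL = "experimental"  # rough form, used to garner feedback
    REFINED = "refined"            # tested and refined with the business customer
    PRODUCTION = "production"      # tagged production ready and published
    RETIRED = "retired"


# Assumed transitions: products can move back for rework, and can be
# retired from any live stage as the business changes.
ALLOWED = {
    Stage.EXPERIMENTAL: {Stage.REFINED, Stage.RETIRED},
    Stage.REFINED: {Stage.EXPERIMENTAL, Stage.PRODUCTION, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.REFINED, Stage.RETIRED},
    Stage.RETIRED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.EXPERIMENTAL
stage = advance(stage, Stage.REFINED)
stage = advance(stage, Stage.PRODUCTION)
print(stage.value)  # production
```

A short lived product (e.g. ad-hoc research) might go from experimental straight to retired, while a quarterly report would sit in production for a long time; both follow the same transitions.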

Domain Ownership

For this approach to work in large, complex organisations, the process and supporting foundation need to support Data Products being owned, built and run by different teams in different areas of the business. For the purposes of this article I shall refer to these areas as Domains.

In this example the data team member charged with the top level use-case only builds the top level decision product. They liaise with four other domain teams to build each of the different products and services to form the pyramid.

It's important that the work for each product is carried out by the team that owns that part of the requirement, i.e. the Data Product should be delivered directly by the team that owns the business process that delivers that insight or data.

A Worked example

So, to solidify this we shall work through an example:

Imagine we have a head of Operations who has a business goal to reduce the operational costs across all their business areas.

So first, we state the decision is to lower operational costs! This would be agreed as a priority using the Value Framework.

Next, we need to ask what is needed to make this decision - this points us to questions like:

  • What is my target cost reduction? This is the overall measure of success of the pyramid.
  • Next we need to start drilling down, e.g. where will my highest costs be? Where do I need to target? This breaks down to:
      • Future costs correlated to business need, which breaks down to:
          • Historical and future costs by business line, and
          • Historical and future transaction volume by business line (a simplified model used just for this example), which derive from:
              • Lowest level costs from HR, ops and other systems
              • Staff rosters from scheduling and time-sheet systems
              • Transaction data from OLTP systems.

So one can start to see how each of these questions is answered by a pyramid of Data Products and Data Services (in the lowest level case), all built to answer the questions in the layer above them.

The diagram (above) shows an example Data Product Pyramid that services these questions:

  • The top level Decision Product is a staff optimisation engine that creates a new roster based on optimal events and costs calculated at the lower levels of the pyramid.
  • The next level has two Knowledge Products - the first forecasts future costs, the second correlates the future business costs to future business events.
  • The next level has one Information Product - a metric (that gives the historical costs by business line) - and two Data Services. The first gives the historical transactions and the second gives staff roster details.
  • The last layer has a lot of Data Services that all present the data from all the transactional systems across the business lines to Data Products and Services at the upper levels.
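As a rough sketch, the example pyramid can be written down as a dependency map and walked bottom-up to get an incremental build order. The product names are paraphrased from the list above, but the exact edges between them are my assumption, since they depend on the detail of the diagram. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# name -> (type, what it is built on). Edges are assumed for illustration.
PYRAMID = {
    "staff_optimisation_engine": ("decision",
                                  ["cost_forecast", "cost_event_correlation"]),
    "cost_forecast": ("knowledge", ["historical_costs_by_line"]),
    "cost_event_correlation": ("knowledge",
                               ["cost_forecast", "historical_transactions",
                                "staff_rosters"]),
    "historical_costs_by_line": ("information", ["source_costs"]),
    "historical_transactions": ("data_service", ["source_transactions"]),
    "staff_rosters": ("data_service", ["source_rosters"]),
    # Bottom layer: Data Services over the transactional systems.
    "source_costs": ("data_service", []),         # HR, ops and other systems
    "source_transactions": ("data_service", []),  # OLTP systems
    "source_rosters": ("data_service", []),       # scheduling and time-sheets
}


def build_order(pyramid):
    """Order components so each one comes after everything it is built on."""
    graph = {name: deps for name, (_, deps) in pyramid.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(PYRAMID)
print(order[-1])  # staff_optimisation_engine
```

The top level Decision Product always comes last in the order, but in practice each layer can be started earlier on mock data and refined as the layers beneath it become real.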


So, hopefully I have shown a way we can use Data Products to directly address and implement business strategies.

Using these approaches and frameworks allows a company to start small and incrementally build up not only the business strategy but also a valuable collection of business orientated Data and Analytics "Products" that can be quickly used and re-used across the whole organisation.

We have shown:

  1. How to incrementally define and refine an Analytics orientated business strategy.
  2. How to break down business use-cases into valuable small incremental deliveries.
  3. How to create a portfolio of usable business products.

As always, comments and feedback are really welcome, and if you want to know more, drop me a line using the social media links below.

Jon Cooke - CTO Dataception
Jon is a data expert with more than 20 years' experience in creating Data Platforms and is the creator of Dataception's Enterprise Data Mesh platform.