The last time we talked about a Data Mesh we focused on the idea of Data Domains and the essential part they play in creating a scale-up data structure across an organization. A common question that arises however is how to deal with overlapping contexts of data and whether the Data Mesh creates too much duplication throughout the environment. To do this, I’m going to leverage some content and graphics from the Cloud Adoption Framework.
Where would duplication come from?
An example of duplication would be in the cross-over between the Sales / Demand domain and the Production domain. There is a high potential that both need to understand the incoming demand and this might lead to data duplication. The question is, “do we duplicate the data between the domains? If so, how do we align the certification of the data and which is the parent? If data is not duplicated, how do we handle the cross-over. To answer this and to expound on the integration models, I’ll leverage some work that was done by the Cloud Adoption Framework on Data Domains and overlapping contexts:
Pattern #1: Separate Ways
The Separate Ways pattern assumes that data does not cross-over and data is entirely owned by the domain. This is common in organizations with highly focused sub-organizations that have little overlap and little need for sharing. The domain contains the entirety of its certified dataset and does not share the data with other domains, at least not in a structured way.
Pattern #2: Partnership Model
The Partnership Model is where individual domains collaborate to create a loosely coupled shared-domain data partnership where both work against one data product. In order for this to work the domains need to work together in a coordinated fashion and collectively prioritize each’s needs. This might be used when several domains are doing common work, but don’t want to duplicate the data because of cost or upfront activities tied to the initial build.
Pattern #3: Customer-Supplier Model
The Customer-Supplier model is one of the most common in a Data Mesh. This model strictly defines the source dataset and owner, then causes downstream consumers to leverage data from that source. In this model the owner is a dependency, but also a point of prioritization and standardization of the data. The downstream consumers leverage the data, potentially duplicating parts of it but the lineage is clear from consumer to source.
Pattern #4: Conformist
The Conformist pattern is where the overlapping domains conform to all requirements. This causes extreme levels of coordination and prioritization and is a choice for very complicated integration scenarios. This pattern removes much of the flexibility of the Data Mesh architecture and slows down the innovation, but might be chosen if conformity benefits are higher than innovation or flexibility.
The overall architecture often looks like the following, where each domain delivers something of its own, then sends data to other domains as necessary, creating some duplication, but clear lineage and ability to innovate (source CAF):
In closing, understand the Data Mesh is a concept for accelerating data usage and all of the scenarios above might be legitimate based on the right circumstances. Be intentional about the choice best aligned to your organizational goals.