We’ve all heard the story… five people wearing blindfolds walk up to an elephant. Each one feels a different part and interprets only that part. “It’s a tree trunk!”… “It’s a rope”… “It’s a hose”… Only when they combine their observations do they work out what is actually in front of them. Most data estates have the same story. We attempt to map them with personas and Visio diagrams, and usually end up with artifacts of limited value. There needs to be a better way. We really can’t continue to maintain documents that become obsolete the second they are created. In comes Microsoft Purview, a tool intended to inventory, map, and apply governance to the data estate. It’s new, it’s unfinished, but it represents a way the cloud can begin to replace those legacy artifacts with something self-maintaining and accessible to the broader needs of the data mesh.
What is Purview?
Microsoft Purview is a platform that facilitates inventory, mapping, governance, and understanding across a broad data estate. Its most basic capability is to inventory the estate and present a data catalog where consumers can find data they might be interested in. It is intended to be far more than that, however… it will also provide a tool for governance, policy, and security across the same estate.
What is Purview Lineage?
For me, one of the most interesting capabilities is that of visibility. The way this visibility is provided is via a capability called Purview Lineage, which displays the complex data estate in an interactive format. For a video overview of Purview Lineage, see below:
Lineage Maps
Pretty cool, eh? Not all sources exist yet, but part of the picture is better than none. Sources like Azure Data Factory, the Azure data platforms, and Snowflake already exist and can be mapped. This creates great diagrams that can be used to understand the estate, its owners, and how they relate to the glossary.
In the example above, you can see the mapping of LastName across several parts of the data estate, allowing an easier time understanding how one data element is used in various capabilities, all the way to Power BI or a data science consumption layer.
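To make the idea of following one data element through the estate a bit more concrete, here is a minimal sketch that treats lineage as a directed graph and walks everything downstream of a starting column. This is not Purview's API; the asset names, the `LINEAGE` dictionary, and the `downstream` function are all invented for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
# These names are made up for illustration; they are not read from Purview.
LINEAGE = {
    "sql.Customers.LastName": ["adf.CopyCustomers"],
    "adf.CopyCustomers": ["lake.customers.LastName"],
    "lake.customers.LastName": [
        "synapse.DimCustomer.LastName",
        "ml.churn_features.last_name",
    ],
    "synapse.DimCustomer.LastName": ["powerbi.SalesReport.CustomerLastName"],
}


def downstream(start, edges):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # guard against revisiting shared assets
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    for asset in downstream("sql.Customers.LastName", LINEAGE):
        print(asset)
```

Starting from the source column, the walk passes through the copy step and the lake before reaching both the Power BI report and the data science feature table, which is roughly the path the lineage view draws for you.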
That sounds interesting, but how does it create the map?
The first step is scanning: scans are configured against the various sources to inventory the estate. These scans run on a schedule of your choosing.
Then, these become understood assets:
The assets then become queryable and are mapped to the glossary in Purview. The glossary provides an index across all mapped assets.
Identified experts can then be mapped to them:
The ability to map an individual from Office 365 as an expert aids discoverability of data for those not familiar with an organizational hierarchy, as well as provides insight into who can make decisions regarding the data’s use and processing.
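As a rough mental model of what the glossary and expert mapping give you, the sketch below builds a term-to-assets index and attaches experts to each term. It is only an illustration under my own assumptions: `ASSET_TERMS`, `TERM_EXPERTS`, the example addresses, and both functions are hypothetical and do not reflect how Purview stores or exposes this data.

```python
from collections import defaultdict

# Hypothetical scan results: (asset, glossary term) pairs.
ASSET_TERMS = [
    ("sql.Customers.LastName", "Customer Last Name"),
    ("lake.customers.LastName", "Customer Last Name"),
    ("sql.Orders.Total", "Order Total"),
]

# Hypothetical experts per glossary term (placeholder addresses).
TERM_EXPERTS = {
    "Customer Last Name": ["alex@contoso.example"],
    "Order Total": ["sam@contoso.example"],
}


def build_glossary_index(asset_terms):
    """Group assets under the glossary term they are mapped to."""
    index = defaultdict(list)
    for asset, term in asset_terms:
        index[term].append(asset)
    return dict(index)


def lookup(term, index, experts):
    """Return the assets and experts recorded for a glossary term."""
    return {
        "term": term,
        "assets": index.get(term, []),
        "experts": experts.get(term, []),
    }


if __name__ == "__main__":
    index = build_glossary_index(ASSET_TERMS)
    print(lookup("Customer Last Name", index, TERM_EXPERTS))
```

The point of the sketch is the shape of the lookup: someone who only knows the business term can find both where the data lives and who to ask about it.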
Ultimately, all of this results in a “picture of the elephant” like so:
So, all of this ends up in a map, which is starting to feel like the picture of the elephant to me. It doesn’t have everything, but I’m excited about the prospect that it *could* have most things and will drastically cut down the manual work of maintaining and documenting an estate.
What makes me excited about Purview is that it *might* get us closer to no longer creating Visio diagrams and instead using a live system to explore and engage with a data estate. This live environment can be combined with security and governance controls so the right people can explore the right things. For its price, now is a great time to explore Purview and see how it can play a part in your larger data story, whether you are using Azure, Snowflake, or a combination of both.
Nathan Lasnoski