Data governance, meaning the knowledge of, management of and insight into your data estate, is a very important topic. In recent years, the number of data stores, systems and data interfaces has grown around the world and in every company. Often, companies are overwhelmed and unable to cope with these challenges. The question is not only where we (as a company) store the data, but also which pieces of information are stored and whether there is a need for data protection, encryption and classification.
In addition, the different kinds of data stores, like databases, file storage, data lakes, SaaS applications … do not make it easier to gain a full overview of the data estate in an enterprise.
In the past, Microsoft released the Azure Data Catalog to provide a tool for cataloging your data assets. This solution, well, did not really work well for some use cases.
Introducing Azure Purview
Today, at the Azure Data and Analytics event, a new Azure data governance service called Azure Purview (https://aka.ms/AzurePurview) was presented and made available in a public preview.
I have not had a chance to try the actual service, but I found a very interesting video (the Microsoft Mechanics video), from which I took the following screenshots.
As already described, data is available in many different places, formats and systems within a company. It can be stored in databases, files or SaaS applications, whether on-premises, in the cloud or in hybrid environments.
The purpose of Azure Purview is to
- scan your data estate / data sources in your company
- classify information within your data sources
- provide a consistent end-to-end view over your data estate (lineage view)
- let you search and analyze your data estate using a data catalog
- let your data stewards get data insights out of your overall data governance information base
Search your data estate
One of the use cases for Azure Purview is the search catalog feature that allows you to scan and search for keywords within your metadata.
The screenshot below gives a first glimpse of the search results list. The list includes results from different data sources as well as glossary terms, and you can filter based on different criteria.
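To make the idea of a keyword search over metadata a bit more tangible, here is a minimal Python sketch. It is not the Azure Purview API: the `CatalogEntry` shape, the `search` function and the sample entries (including the classification name) are all made up for illustration, and the real service certainly does far more than a substring match.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    """One scanned asset or glossary term (hypothetical shape, not the Purview schema)."""
    name: str
    kind: str  # e.g. "Azure SQL Table", "Glossary term"
    description: str = ""
    classifications: List[str] = field(default_factory=list)


def search(entries: List[CatalogEntry], keyword: str,
           kind: Optional[str] = None) -> List[CatalogEntry]:
    """Return entries whose metadata mentions the keyword, optionally filtered by kind."""
    needle = keyword.lower()
    hits = []
    for entry in entries:
        # Only metadata is searched here: name, description and classification labels.
        haystack = " ".join([entry.name, entry.description, *entry.classifications]).lower()
        if needle in haystack and (kind is None or entry.kind == kind):
            hits.append(entry)
    return hits


# Made-up sample data for illustration only.
catalog = [
    CatalogEntry("SalesOrderHeader", "Azure SQL Table", "Sales order headers", ["Person's Name"]),
    CatalogEntry("Sales report", "Power BI Report", "Monthly sales overview"),
    CatalogEntry("Customer", "Glossary term", "A party that places a sales order"),
]

for hit in search(catalog, "sales"):
    print(hit.kind, "-", hit.name)
```

Passing `kind="Glossary term"` narrows the same query to glossary entries, which is roughly what the filter criteria in the result list do.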
Detailed information about one data source object is shown in the next screenshot – the overview page provides information about the type of data source, the last scan time and the hierarchy this element belongs to.
In this example, the hierarchy is shown for the table SalesOrderHeader: it’s an Azure SQL table, stored in an Azure SQL Database.
Next, the schema view provides you with an overview of the underlying schema of the object, including classifications (either system- or user-defined). Satya mentioned over 100 AI-powered classification methods to analyze your data.
And now, one of my favorites: Data Lineage. Wow, just wow! 😉
This view provides an end-to-end view of your data estate around the currently selected data object (our table). The lineage includes data preparation steps and also Power BI datasets and reports!
In addition, all related objects can be displayed and further analyzed.
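Conceptually, a lineage view like this is a walk over a directed graph of "data flows from A to B" edges. The following Python sketch shows that idea under my own assumptions: the edge list, the node names other than SalesOrderHeader and the `reachable` helper are hypothetical and have nothing to do with how Azure Purview stores or queries lineage internally.

```python
from collections import deque

# Hypothetical lineage edges: (source, target) means data flows from source to target.
EDGES = [
    ("SalesOrderHeader", "Copy sales data"),   # table -> data preparation step
    ("Copy sales data", "sales.parquet"),      # preparation step -> file
    ("sales.parquet", "Sales dataset"),        # file -> Power BI dataset
    ("Sales dataset", "Sales report"),         # dataset -> Power BI report
]


def reachable(start, edges, downstream=True):
    """Breadth-first walk over lineage edges, returning nodes in visit order.

    downstream=True follows the data flow; downstream=False walks back to the origins.
    """
    neighbours = {}
    for source, target in edges:
        key, value = (source, target) if downstream else (target, source)
        neighbours.setdefault(key, []).append(value)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(reachable("SalesOrderHeader", EDGES))                 # everything fed by the table
print(reachable("Sales report", EDGES, downstream=False))   # everything the report depends on
```

Walking downstream answers "what breaks if I change this table?", walking upstream answers "where does this report get its numbers from?".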
Fill the Data Governance Service – bring information into Azure Purview
As mentioned in the video, Azure Purview comes with many different connectors. The data sources defined are scanned and the metadata is integrated into the Azure Purview Data Map (a searchable network graph).
In addition, Azure Purview has a tight integration with Apache Atlas, which allows you to import already existing Apache Atlas metadata stores.
Azure Purview already comes with a large number of connectors, that is, data sources that can be integrated into the Azure Purview Data Map.
Data sources can be grouped into collections, and you can hierarchically apply classification rules and properties within these collections.
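How exactly the hierarchy behaves is something I can only guess at without having used the service. One plausible reading is that a child collection inherits the rules of its parents and adds its own. The Python sketch below models that assumption; the `Collection` class, the collection names, the data source name and the rule names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collection:
    """A group of data sources with optional parent (hypothetical model, not the Purview API)."""
    name: str
    parent: Optional["Collection"] = None
    rules: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)

    def effective_rules(self) -> List[str]:
        """Rules inherited from all ancestors, followed by this collection's own rules."""
        inherited = self.parent.effective_rules() if self.parent else []
        # Assumption: a rule already inherited from a parent is not listed twice.
        return inherited + [rule for rule in self.rules if rule not in inherited]


# Made-up hierarchy: company -> department -> region.
root = Collection("Contoso", rules=["Email address"])
sales = Collection("Sales", parent=root, rules=["Credit card number"], sources=["AzureSqlSales"])
emea = Collection("Sales EMEA", parent=sales, rules=["Email address", "IBAN"])

print(emea.effective_rules())
```

Under this assumption, a rule defined once at the top applies to every data source below it, and only the exceptions have to be maintained further down.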
Azure Purview comes with system-defined classification rules, but you can also add your own custom classification rules.
Now, set the file types these classifiers should be applied to and define the scan triggers (schedules).
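A custom classification rule restricted to certain file types could be pictured roughly like the following Python sketch. Everything in it is an assumption on my side: the `ClassificationRule` class, the "Employee ID" rule, its regular expression and the file extensions are hypothetical, and the scan schedule is left out entirely. It only illustrates the general idea of "a pattern plus the file types it applies to".

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassificationRule:
    """A pattern-based rule limited to some file types (hypothetical, not the Purview API)."""
    name: str
    pattern: re.Pattern
    file_types: Tuple[str, ...]

    def applies_to(self, filename: str) -> bool:
        # str.endswith accepts a tuple, so one call covers all configured extensions.
        return filename.lower().endswith(self.file_types)

    def matches(self, value: str) -> bool:
        # The whole value has to match, not just a part of it.
        return self.pattern.fullmatch(value) is not None


# Made-up custom rule: employee IDs such as "EMP-004211" in CSV and JSON files.
employee_id = ClassificationRule(
    name="Employee ID",
    pattern=re.compile(r"EMP-\d{6}"),
    file_types=(".csv", ".json"),
)


def classify(filename: str, values: List[str], rules: List[ClassificationRule]) -> List[str]:
    """Return the names of all rules that apply to the file and match at least one value."""
    return [
        rule.name
        for rule in rules
        if rule.applies_to(filename) and any(rule.matches(value) for value in values)
    ]


print(classify("staff.csv", ["EMP-004211", "n/a"], [employee_id]))
```

A file with a different extension, or values that do not fit the pattern, would simply come back without that classification.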
To take it to the next step, the collected metadata can be used by different user groups in your company: to search the data catalog, analyze the data lineage of data objects, find classified information and, very importantly, spot classification rule breaches.
To sum it up: I have not tried Azure Purview, but the first glimpse looks very interesting…
My highlight of the announcement is the end-to-end lineage view. I have been waiting for such an insight graph for a loooooong time... and now it’s there, integrated with many different systems! Happy Wolfgang 😉
Know your data, know your data estate!
- Azure Purview for Unified Data Governance | Microsoft Azure
- More information: https://aka.ms/AzurePurview
- Create your Azure Purview instance in Azure: Azure Purview – Microsoft Azure
- Microsoft Mechanics video (the source of the screenshots): https://www.youtube.com/watch?v=27bA4KFiEKk&feature=youtu.be
- Microsoft launches Azure Purview, its new data governance service | TechCrunch
- You have Azure Purview questions? Join the Purview tech community – Azure Purview – Microsoft Tech Community
- Azure Purview documentation – Azure Purview | Microsoft Docs
- Quickstart: Create an Azure Purview account in the Azure portal (preview) – Azure Purview | Microsoft Docs
- Power BI and Azure Purview – Use Power BI with Azure Purview to achieve better data governance and discovery
- Register a Power BI tenant in Azure Purview – Register and scan a Power BI tenant (preview) – Azure Purview | Microsoft Docs