In the past months I had the chance to play with and build solutions based on Azure Synapse Analytics and Azure Purview.
I see Azure Synapse (my Synapse blog entries) as the foundation for a solid platform to store, analyze and build data solutions, and Azure Purview (my Purview blog posts) as the data governance and data catalog solution in Azure.
During the writing of my latest blog post (What’s new in Azure Synapse Analytics?), I found a very interesting entry in the update feature list: Azure Purview Integration.
And this really got my attention – use Azure Purview to find your data artefacts and use it for exploration… sounds nice.. let’s try it!
Configuration – Connect Azure Synapse to Azure Purview
In the Manage Hub of Azure Synapse, the new menu entry Azure Purview (Preview) is the entry point to connect your two Azure services.
You need Contributor rights in Azure Synapse and access to your Purview instance. In the connection dialog, use the browse option or enter the Purview ID into the manual section.
When connected, the config section allows you to edit the mapping (i.e. change it to another Purview instance), refresh it (I am not sure what this action really does, but yes, it refreshes the connection) or disconnect it. The link on the Purview instance's name takes you directly to Purview Studio.
Use the Purview integration in Azure Synapse
According to the documentation, you can use the Purview integration to discover data registered and scanned by Azure Purview …
This search integration is implemented in the different hubs / development areas of Synapse Studio. It is available in the Data, Develop and Integrate hubs. After the Synapse / Purview connection is configured, a click into the search box allows you to select the search scope: either Purview or the (Synapse) Workspace.
Let's search for the term Customer in our Purview account. The results are listed in the Purview search tab, which is similar to the search results tab built into Purview Studio itself.
For the demo of the functionality, I first open the Customer table (in an Azure SQL database). The detail page for a table in an Azure SQL database looks very similar to the one in Purview itself. The Purview features to browse your data assets are there as well. If your Purview credentials allow it, you are also able to edit the properties of the entity (see my comments about security later in this blog post).
Start with an Azure SQL table
With two differences: there is a Connect and a Develop menu. Remember, we are on the Azure SQL table detail page, reached through the Purview integration.
As you can see in the screenshot, the Connect menu allows you to create a New linked service or a New integration dataset. To define a new integration dataset, you'll need to create the linked service first. At the moment (2021-01-27) and according to the documentation, the Synapse/Purview integration is not able to infer whether a linked service or integration dataset already exists.
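To make the dependency between the two objects more tangible, here is a minimal sketch of what a linked service and an integration dataset for an Azure SQL table roughly look like as JSON definitions. It is written in Python only to build and print the JSON. The names ls_demo_sql, ds_customer, demo-server and demo-db are made up for illustration, the connection string deliberately leaves out authentication, and the exact JSON that Synapse Studio generates may differ.

```python
import json


def azure_sql_linked_service(name: str, server: str, database: str) -> dict:
    """Sketch of a linked service definition for an Azure SQL database.

    Authentication settings are intentionally omitted; a real definition
    needs credentials, a managed identity or a Key Vault reference.
    """
    connection_string = (
        f"Server=tcp:{server}.database.windows.net,1433;"
        f"Database={database};"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": connection_string},
        },
    }


def azure_sql_table_dataset(name: str, linked_service: dict,
                            schema: str, table: str) -> dict:
    """Sketch of an integration dataset that points at one table.

    The dataset only references the linked service by name, which is why
    the linked service has to exist first.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": {
                "referenceName": linked_service["name"],
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    ls = azure_sql_linked_service("ls_demo_sql", "demo-server", "demo-db")
    ds = azure_sql_table_dataset("ds_customer", ls, "dbo", "Customer")
    print(json.dumps(ls, indent=2))
    print(json.dumps(ds, indent=2))
```

The dataset carries no connection details of its own. It only points to the linked service by name, which matches the order the Connect menu asks you to follow.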
In the Develop section, with an Azure SQL table selected, we’ve got the option to create a New data flow.
Or with a file in a Data Lake
Switching back to the results list, I next open the wwi-dimcustomer.csv file in the ADLS Gen2 account. The overall detail page looks very similar to the one for the Azure SQL table, but the Develop menu contains some more actions.
For the selected entity (a file in the data lake), the Develop menu allows you to start your data exploration directly, using either a SQL script with OPENROWSET, the integrated Spark or a new data flow.
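To give an idea of what the SQL and Spark options work with, the following sketch builds an OPENROWSET query for a CSV file and the matching abfss:// path that a Spark notebook would read. The storage account name mydatalake and the container raw are assumptions for illustration, and the script that Synapse Studio actually generates may look different.

```python
def openrowset_query(account: str, container: str, path: str,
                     top: int = 100, header_row: bool = False) -> str:
    """Build a serverless SQL query that previews a CSV file in ADLS Gen2."""
    url = f"https://{account}.dfs.core.windows.net/{container}/{path.lstrip('/')}"
    options = [f"BULK '{url}'", "FORMAT = 'CSV'", "PARSER_VERSION = '2.0'"]
    if header_row:
        # Only set this if the file really starts with a header line.
        options.append("HEADER_ROW = TRUE")
    body = ",\n    ".join(options)
    return f"SELECT TOP {top} *\nFROM OPENROWSET(\n    {body}\n) AS [result]"


def abfss_uri(account: str, container: str, path: str) -> str:
    """Build the abfss:// URI that Spark uses for the same file."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Account and container names are hypothetical placeholders.
    print(openrowset_query("mydatalake", "raw", "wwi-dimcustomer.csv"))
    print(abfss_uri("mydatalake", "raw", "wwi-dimcustomer.csv"))
```

The printed query can be pasted into a SQL script in Synapse Studio, and the abfss:// path is what you would hand to spark.read in a notebook. Both point at the same file, just through different engines.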
How about the credentials / security?
Based on the documentation page, the Purview integration in Synapse requires you to have Purview access. Synapse passes through your Purview permissions.
To read more about Azure Purview roles (and their permissions in the catalog), I recommend you have a look at the documentation.
To sum it up.. Better together? yes? no? YES!
In my opinion, the Purview integration in Azure Synapse is a great step in the right direction.
- With Synapse positioned as the central analytics platform, the direct access to Data Catalog information without leaving Synapse Studio is great.
- Next, the immediate creation of linked services and integration datasets, or the start of a data exploration, right from the context menu makes it really easy to integrate data into Azure Synapse.
Stay curious, catalog your data artefacts AND enjoy your day! 😉