DataGrillen 2022 – Data Governance Session

Last week was DataGrillen 2022 time. An in-person event and it was fun. Big kudos again to the organizers – Ben and William for another edition of a relaxed and perfectly organized event!

I had the pleasure to talk about one of my favorite topics – Data Governance with Microsoft Purview. As the event was a little bit focused on BBQ (Grillen in german) I thought – Why not change the topic a little bit and focus on the real important things of life: How to find the best food (beef) in your organization.

Title slide of my presentation

All in all, the session went well. I enjoyed giving the presentation and according to the session feedback, the attendees did like it too! Thanks again for attending my session (and providing feedback)

The slides are available here – please download them and if you have any questions about Data Governance, Microsoft Purview or you want to start you Data Governance journey and need help: do not hesitate to contact me!

Finally, after more than 2 years, I had a chance to met my mentee Nikola Ilic for the first time “in real life”. We’ve spent hours and hours in virtual meetings, but finally in Lingen we met.

Some pictures from the event:

Happy data cataloguing,

Wolfgang

Advertisement
Posted in Azure, Azure Purview, Conference, Data Community, Data Governance | Leave a comment

(Dynamic) Data Lineage in Microsoft Purview

One of the most important things to better know a data estate is to investigate into the data lineage topic.

In a non scientific definition, data lineage defines …

  • Where does my data come from?
  • Where does my data go to?
  • and What happens on the way?

In this blog post, you’ll see how data lineage from Azure Data Factory and Synapse pipelines are pushed into the Microsoft Purview. There is also a new (at least I found out about it some days ago) functionality that brings metadata-driven, dynamic pipeline runs into the lineage information in your Purview data map.

TL;TR – There is a video for all of that

Data Lineage in Microsoft Purview

In Microsoft Purview, the lineage view allows you to get to know more about the lineage of a certain data asset. Within the chain of your data supply chain, there are different shapes that Purview puts together.

Example of Purview data lineage

First, there are data stores like Azure Data Lake Storage accounts or Azure SQL Databases that store information and within the Purview context, contain data assets. The Microsoft Purview data map pulls this information from the sources during scan processes.

Second, there are transformation steps that connect data assets. In the example above there are two transformations shown (both are Azure Data Factory (ADF) pipelines – to be more specific ADF Copy activities). In the context of Purview, lineage information is pushed from ADF into the Purview data map.

How to get ADF/Synapse lineage into Microsoft Purview?

If you have not connected your Azure Data Factory (ADF) / Synapse workspace with your Purview account, you might start with the registration of a new source in Purview. BUT – there is no ADF source available in the Purview data governance portal.

No ADF source in Purview

In order to connect ADF with Purview, you need to start in ADF. Within the Manage menu, there is a Microsoft Purview section. In there, just connect your ADF instance with an existing Azure Purview account.

Already connected ADF with a Microsoft Purview account

This connection comes with two (main) options – search your Purview data catalog from ADF and the lineage push from ADF into Purview.

Search your Purview data catalog directly from ADF

Are there any additional requirements that lineage is pushed from ADF into Purview?

No. Simply speaking no. You only need an ADF Pipeline containing a copy activity. And – you need to run that pipeline at least once.

Every execution of a pipeline (in an ADF instance connected to Purview) pushes lineage information into the Purview data map. You can check that by a drill-down into the activity run history. Every copy activity execution gets a new icon (lineage status) that indicates the lineage push into Purview. In some cases, the lineage push does not work because of some limitations / requirements for the copy activity (see more in the documentation – https://docs.microsoft.com/en-us/azure/data-factory/tutorial-push-lineage-to-purview)

But what about Dynamic / Meta-data driven pipelines in ADF and Synapse?

In many of our projects, we do not develop ADF pipelines that are “hard-coded” – i.e. that are configured to copy data from a fixed-defined source into a fixed-defined destination. What we do instead is to use metadata driven pipelines.

Within these metadata driven pipelines, usually a lookup activity is used to get the list of objects to load and a ForEach activity loops over the items to load. I will not go into details of metadata driven pipelines (there are tons of blog posts out in the thing called internet).

a simple metadata driven ADF pipeline
ADF copy activity using a data set with parameters. The values are set with the current iteration-item (@item())

There was one problem with this kind of pipelines and their lineage push into Purview: it simply did not work. Dynamic Copy activities were not supported and the lineage of these pipelines did not appear in the Purview data map.

But fortunately this changed a few days ago. I have not seen any public announcement of that feature but for me (and my colleagues) this is huge gamechanger in the integration of ADF/synapse and Microsoft Purview.

Let’s run the example pipeline from the screen shot above and view the monitoring information of that run.

Monitoring log of an execution of the dynamic pipeline

What we can see is, that there are three executions of the Copy activity and all of them push the lineage information (indicated by the icon in the column Lineage status) into the Purview data map.

Lineage view in Microsoft Purview

Let’s head over to Microsoft Purview, search for the pipeline and … hmm… there is no lineage information available.. That’s because the pipeline itself does not expose lineage information – it’s the copy activity that does the work!

No lineage information for a ADF Pipeline in Microsoft Purview

If you open the data asset of the copy activity, we’ll be successful! Lineage is there. And with that – it’s the dynamic (metadata) lineage shown here.

Lineage in Microsoft Purview – dynamic, metadata lineage is here!

That’s it for the first look at the dynamic data lineage support in Microsoft Purview. I have not tested it in more details, but as a first conclusion I am really happy that dynamic copy activities are finally supported!

Posted in Azure, Azure Purview, Azure Synapse Analytics, Microsoft Purview | Leave a comment

Azure Purview is now part of Microsoft Purview

I was ready for a nice relaxing evening today, when an email appeared in my inbox “Azure Purview is now Microsoft Purview!”

Initially I thought… yeah.. “just another Microsoft product name renaming” .. but when I read through it more in depth I found out, that this is NOT just a renaming.

The email I found in my inbox

Microsoft Purview = Azure Purview + Microsoft 365 Compliance portfolio

Microsoft Purview is the (new) name for a comprehensive set of products to govern, protect and manage your entire data estate. From tracking your data sources and their dependencies to managing data compliance regulations, data loss and data risks.

According to https://www.microsoft.com/en-us/security/business/microsoft-purview, Microsoft Purview helps you to

  • Understand and govern data
  • Safeguard data, wherever it lives and
  • Improves risk and compliance posture
source: https://www.microsoft.com/en-us/security/business/microsoft-purview

You can also watch the intro-video of the Microsoft Purview team:

Can you tell me your name please?

Microsoft Purview is no new service or product, it combines already existing services under one single umbrella. For a better reference I reference the new-name-cheatsheet from https://www.microsoft.com/security/blog/2022/04/19/the-future-of-compliance-and-data-governance-is-here-introducing-microsoft-purview/

The new names of the pieces that put together Microsoft Purview.
Source: https://www.microsoft.com/security/blog/2022/04/19/the-future-of-compliance-and-data-governance-is-here-introducing-microsoft-purview/

Not TL;TR – if you need more information

What’s next?

Well, this announcement needs some time on my side to get a better overview and deeper insight about the compliance part of what is now called Microsoft Purview. Stay tuned for some more blog posts and maybe some videos about it.

Love your data, #DoDataBetter and join the #TeamDataGovernance,

Wolfgang

Posted in Azure Purview, Cloud, Data Governance, Microsoft Purview | 2 Comments

Azure Purview Announcements March 2022

Hi members of #TeamDatagovernance!

It’s SQLBits week and I have the feeling that the Azure Purview team released some new features I want to summarize and give you some pointers to the announcement posts and documentation:

Dynamic SQL Lineage

For me, Data lineage is one of those fascinating techniques to better understand your data estate and get a better knowledge how systems are connected and what data flows are there in your data landscape.

Lineage was there in Azure Purview since the beginning (Azure Data Factory, SSIS lineage, Power BI) but this week another very important part of data lineage was put into public preview: Dynamic Lineage Extraction from Azure SQL Databases.

There are different ways how data lineage can be extracted from systems – one of them is static code analysis. The static approach includes all the CREATE PROCEDURE / CREATE VIEW statements and summarizes them into a lineage graph. This approach is powerful but there is more going on than the initially defined DDL statements like the execution of dynamic SQL statements.

The approach that Azure Purview implements is the dynamic lineage extraction (announcement post) that incorporates all the actions that happen in the database.

static versus runtime/dynmic analysis (source: https://azure.microsoft.com/en-us/blog/introducing-dynamic-lineage-extraction-from-azure-sql-databases-in-azure-purview/)

I’ve recorded a short video (youtube link) for a first, quick look on that feature.

Turn on lineage extraction in scan options for Azure SQL databases
Data lineage including a Stored proc

If you want to read more about the feature, learn what pre-requisites have to be met -> https://docs.microsoft.com/en-us/azure/purview/register-scan-azure-sql-database?tabs=sql-authentication#lineagepreview

Azure Purview now understand SAP Business Warehouse (BW)

There are new built-in sources added every now and then.. and today, SAP Business Warehouse (BW) was added to the list of supported sources:

https://techcommunity.microsoft.com/t5/azure-purview-blog/azure-purview-adds-support-for-sap-business-warehouse/ba-p/3253404

SAP BW and Purview documentation: https://docs.microsoft.com/en-us/azure/purview/register-scan-sap-bw

Workflow support for Data Access Request (Preview)

To know your data sources, their connections and so on is a beginning for your data journey, but access securement is a huge and very important topic.

The Purview team released a new block of functionality – Workflow support for managing data access in Azure Purview

I did not have any time to dive into that feature, but you will get an update (blog post, video) in the next weeks.

In the meantime, please have a look at the announcement blog posts from the team:

Happy data cataloguing and data scanning,

Wolfgang

Posted in Azure, Azure Purview, Data Governance, Microsoft Purview | 1 Comment

SQLBits 2022 – It’s quite a #DataGovernance theme for me…

17 Days to go… Starting on Tuesday, 8th March London will be the Microsoft Data capital for a week. It’s SQLBits (https://arcade.sqlbits.com/) time again and this year – if everything goes well – it will be an in-person conference experience for me!

The conference itself is a hybrid event – so we can meet in-person or virtually!

I am really looking forward to a week full of data platform sessions, discussions and news, new features and content!

This years SQLBits will be different for me – It will be a full packed week – all in all I am there for

  • 1 Traing Day about Data Governance – From Theory to Hands-On with Azure Purview on Tuesday
  • 1 surprises on Friday
  • 1 20 minutes Session (Friday): “Why you should care about Data Community and how the community cards about you
  • 1 50 minutes Session (Friday): “Data Governance with Azure Purview – Ask the Experts” Together with Victoria (t), Erwin (t) and Richard (t) we will answer and discuss your Data Governance / Azure Purview questions. Submit your questions here: https://forms.office.com/r/dTP38LnmsJ
  • and last but not lease – 1 50 minutes session on Saturday together with my colleague and friend Bernhard Engleder (t) we’ll talk about Data Governance with Azure Purview – Theory, Customer Insights and Demos. We will share insights from our customer initiatives and how we at cubido guide our customers into the Data Governance journey.

I will be there for the full conference – feel free to stop me and say Hi! I am one of the people with a mask on.. ๐Ÿ™‚

More information (schedule) – https://arcade.sqlbits.com/

Registration – https://register.sqlbits.com/event/36db4730-583c-49c1-adcf-f76432bb6580/summary

Stay curious, and more important – stay healthy!

Wolfgang

Posted in Azure, Azure Purview, Conference, Data Community, Data Governance, Microsoft Purview | Leave a comment

A new Azure Purview permission level – Policy authors

I’ve spotted a new Azure Purview permission level – Policy Authors. This new permission level is connected to the new Data Access Policy Management in Purview (as of today 2021-12-30 in preview; https://docs.microsoft.com/en-us/azure/purview/how-to-access-policies-storage)

One may ask: Why Wolfgang, why no blog post about that feature? Well, the answer is easy: it’s in preview and the preview limitations are strict – the enforcement side of the policies are limited certain Azure regions. And therefore, I am planning to try it (and blog about it) when it will be available in more regions.

Posted in Azure, Azure Purview, Data Governance, Microsoft Purview | 1 Comment

Add a root collection admin to your Azure Purview account (not in Purview Studio)

Some time ago, the permissions scheme in Azure Purview changed and the collection level was introduced as a level of permission. (documentation – Access control in Azure Purview)

With that, the root collection and its assigned administrators get more and more important. The permissions in Azure Purview are configured in Purview Studio (Collections -> select a collection -> Role assignments).

Azure Purview role assignments in Purview studio

What I found out today, is that you can add a Purview Root Collection Admin directly in the Azure Portal. Open you Purview account, and select the Root collection permission menu entry.

Purview Root Collection Admin

Select one member account and add it to the list of collection administrator.

Successfully added an additional root collection admin

Happy permission management,

Wolfgang

Posted in Azure, Azure Purview, Data Governance, Microsoft Purview | Leave a comment

Azure Synapse Analytics – Success by Design Documents

Today, I stumbled upon a very interesting link – the Azure Synapse Analytics – Success by Design site (follow this link).

If you need guidance, best practices links, POC playbooks, links to blogs & videos, tools, .. THIS is the site you need to bookmark.

One section I really like and definitely will have a closer look at is the Implementation Success. It guides you through the process of setting up the ground for a successful POC project. Even after going through several Synapse POCs & projects, I already spotted some additional things to consider.

The content is provided and maintained by the Azure Synapse Customer Success Engineering (CSE) group.

Additional links:

Posted in Azure, Azure Synapse Analytics, Best Practices | 1 Comment

SQLBits 2022 – My Training Day about Data Governance and Azure Purview

I am happy. More than happy that I can share the news now…

Next March, in London, I am allowed to conduct a full day training at SQLBits 2022. About one of my highlight topics in the past months – Azure Purview and Data Governance.

The list of Training days is already published, registration will open early next year. If you are not sure if the content of the workshop titled Data Governance in a Nutshell โ€“ From Theory to Hands-On with Azure Purview is suitable for you, here are some details:

Data is important for every organization. To know your data estate (systems), their dependencies, the responsible people and processes takes your company’s data literacy to the next level.
This topic is called Data Governance, which includes two main streams: a) the organizational processes and policies and the b) the technical tools to support the organizational processes.

Many of us think, that Data Governance is a technical problem to solve, but I would start the Data Governance day with an explanation what DG is. We will talk about the people involved, the processes needed, policies that should be in place and the technology so support the processes.
The overall Data Governance vision and strategy will also be discussed.

After the DG theory part, we will dive into a Data Governance tool – Azure Purview. In the Hands-On lab we will connect to data sources, configure scans, see how Purview classifies data, assign and work with a business glossary, work with & analyze data lineage and see the Purview data catalog in action.
In the last part of the day, we will see Purview integration in action – the Apache ATLAS API will be presented as well the integration into Azure Data Factory and Azure Synapse Analytics.

At the end of the day, attendees should have a feeling about what Data Governance is, the organizational requirements as well as a first look into Azure Purview and how a technical answer to these questions can look like.

https://sqlbits.com/information/event22/Data_Governance_in_a_Nutshell__From_Theory_to_Hands-On_with_Azure_Purview/trainingdetails

In short: the day will guide you through the foundation patterns of Data Governance, the way how Azure Purview can solve (some) of the challenges for Data Governance and get you an idea how you and your organization can start/move further on with Data Governance.

The full list of training days can be found there: https://sqlbits.com/information/trainingday

Posted in Azure, Azure Purview, Data Governance | Leave a comment

Complete Analytics Fabricย – Azure Synapse and Power Query (Power BI) better together

As a follow up of todays PASSDataCommunitySummit keynote, in which Arun Ulag mentioned a new phrase – the Complete Analytics Fabric, I would like to write more about a new option to really get a complete picture of your Azure Synapse Analytics artifacts in your Synapse workspace in one got – in one Power BI dataset – with one connection option only. Sounds interesting? Yes.. it is a nice way..

The new Azure Synapse Analytics Workspace Connector for Power Query

Announced at Ignite 2021 (source: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/supercharge-bi-insights-with-the-new-azure-synapse-analytics/ba-p/2929352) , the new Azure Synapse Analytics Workspace Connector allows you, as a Power Query / Power BI user to directly connect to all the data artefacts in a Synapse workspace. Prior to this new connector, you needed to connect to the SQL Pools and/or Spark database separately.

The November Release of Power BI Desktop includes a preview version of this new connector (Nov 2021 release blog post)

Let’s see how this connector works and which options you get by using it. In my demo example, I am using Power BI Desktop and as a first action, use the Get Data dialog to connect to a Synapse Workspace.

There are (as of today) two options to connect to Synapse Analytics – the “old” Azure Synapse Analytics (SQL DW) connector and the new Synapse Workspace connector. Select the new one and move on to the next step..

The new Azure Synapse Analytics workspace connector

Which is a warning that this is a development connector and third-party software is used to connect. I’ll continue …

The warning when using the connector the first time

Up next – Authenticate using your AAD user that has access to a Synapse Analytics workspace.

The next dialog lists all the Synapse workspace your user has access to

The list of Synapse workspace the authenticated user has access to

Whenever I expand one of the workspace nodes, the available data artifacts (databases) are listed. In my demo workspace, the following objects are available. The question that comes up now is: What are all these objects? SQL Dedicated Pools? Serverless databases? Spark databases?

The data artifacts in my Synapse Workspace

If we compare the list in Power BI Desktop and the objects in Synapse Studio, it will get clearer which objects are now available in the Power Query connection dialog.

Which objects are available in the Power Query connector?

As of today, Spark databases (now called Lake databases in Synapse Studio), SQL Serverless databases and Dedicated Pools are available for a connection.

The Synapse Analytics workspace connector now allows you to select tables (objects) from different sources and them in one connection and data access action.

Select different tables and use them in Power BI

In my example, Power BI did not ask if I want an Import or Direct Query connection – Import mode was chosen for me.

In my opinion, this new connector really makes our life as Synapse and/or Power BI developer easier – only one connector to use them all. Let’s see if the list of supported artifacts is extended (like it is mentioned in the announcement blog post) – this really would be very nice: SQL Pools, Spark databases and data lakes, and even linked services connected in one go.

Happy data set creation!

Posted in Azure, Azure Synapse Analytics, Business Intelligence, PowerBI | Leave a comment