In the world of data analysis and business intelligence, preparing your data is widely seen as the most time-consuming task of any BI project. Some estimates attribute up to 80% of a total project budget to the ETL (extract, transform and load) work.
In the past, these ETL tasks were mainly driven by IT professionals who produced a huge amount of data transformation logic. In the Microsoft ETL universe, SQL Server Integration Services (SSIS) packages/projects and T-SQL logic were the main choices for meeting these requirements.
During the last few years, the trend has pointed in one direction: self-service BI! And self-service BI is mainly driven by business analysts and key users from the departments, often without solid knowledge of data modeling and data preparation. Microsoft addressed this challenging task by introducing Power Pivot (the BI modeling tool integrated into Excel) – a completely new story, with a new data modeling approach and a new formula language: DAX. The self-service trend for reporting and data modeling was definitely addressed with Power Pivot, but for the ETL part no real solution was provided.
“Until now” (quote from the “Dataflows in Power BI” whitepaper by Amir Netz)
Introducing Power BI dataflows
This is where Power BI dataflows enter the stage (and at least now they have a name that has not changed within the last few months – if you want to hear more about the story of this naming process, ping me, I will happily talk about it.. :-))
Power BI dataflows provide the next step: re-using Power BI data preparation logic. In the past, data preparation implemented for a Power BI dataset was limited to that dataset – there was no way of reusing logic from one dataset in another dataset!
Power BI dataflows are targeted to overcome these data preparation limitations and will become first-class citizens in the Power BI information hierarchy (figure taken from the whitepaper).
Power BI information hierarchy (source: Dataflows in Power BI whitepaper)
Dataflows read data from source systems, use Power Query to implement the ETL logic and store the results of these transformation steps in entities (you can also call them tables).
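To give you an idea of what the definition of such an entity looks like, here is a minimal Power Query (M) sketch – the source URL and the column names (OrderDate, Amount) are made up for illustration:

```
let
    // Hypothetical source – replace with your own data source
    Source = Csv.Document(Web.Contents("https://example.com/sales.csv")),
    // Use the first row as column headers
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Assign proper data types to the (made-up) columns
    ChangedTypes = Table.TransformColumnTypes(PromotedHeaders,
        {{"OrderDate", type date}, {"Amount", type number}}),
    // Keep only rows with a positive amount
    FilteredRows = Table.SelectRows(ChangedTypes, each [Amount] > 0)
in
    FilteredRows
```

In a dataflow, the result of such a query is persisted as an entity, so the cleansing logic lives in one place instead of being copied into every dataset.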
In the whitepaper, Amir mentions five core principles that describe the main pillars of Power BI dataflows:
- Intuitive authoring: dataflows are created using Power Query – the tool that millions of Excel and Power BI users are already familiar with.
- Auto orchestration: dataflow authors do not have to think about the right load order; dataflows are supposed to handle it the right way (let the future prove whether this holds).
- Big data in a data lake: dataflows are designed to work with huge amounts of data. Results of the transformations are stored in Azure Data Lake Storage Gen2.
- Common Data Model (CDM): dataflows support the Common Data Model – you can use standardized entities, extend them or create your own entities to store dataflow results. See my blog post for an explanation of the CDM!
- Native integration into the Power BI system AND Azure data services:
  - Dataflows are a native artifact in the Power BI ecosystem, like datasets, reports and dashboards (see figure above).
  - Announced in December 2018, dataflows can now be even more deeply integrated with other Azure data services. You can choose to store dataflow data in your Power BI environment OR you can select your own Azure Data Lake Storage. With the second approach, you can exchange data and work together with other data analysts/data scientists (see the sketch of such a storage folder after this list)!
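How does that sharing work under the hood? Each dataflow is stored in the lake as a so-called CDM folder: a model.json file that describes the entities plus CSV files that hold the entity data. A simplified sketch of such a folder – workspace, dataflow and entity names are made up, and the exact file layout may differ (model.json points to the actual data file locations):

```
powerbi/                  # ADLS Gen2 filesystem used by Power BI
  Sales Analytics/        # one folder per workspace
    SalesDataflow/        # one CDM folder per dataflow
      model.json          # metadata: entities, attributes, data locations
      Sales/              # entity data, written as CSV files on refresh
        Sales.csv
```

Because this is an open, documented format, other Azure data services (and your data scientist colleagues) can read from and write to the same CDM folders.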
Is it a Premium-only feature? NO – it’s Pro AND Premium!
Power BI dataflows are available for both Pro and Premium users. Nevertheless, there are differences in the feature set – as usual, the more enhanced and fancy features are only available in Premium capacities (e.g. incremental refresh, referenced entities, …).
The following table is again taken from the whitepaper (checked again as of 2018-12-21):
It is… 🙂
If you want to read more about it, here are some links:
Be (data) prepared and let your data flow!