The future of Microsoft Analytics is here – Welcome Microsoft Fabric!

Look back into the past of Microsoft Analytics

Do you remember the good old times when it was “just” the SQL Server DB engine, Integration Services, Analysis Services and Reporting Services, and your data warehouse solution was done?

And then – we moved into the cloud with a huge variety of services to build analytics solutions – Azure Data Factory, Azure Data Lake Storage, Azure SQL Database and Power BI. These puzzle pieces could be combined in different ways, and when we look back at those cost estimation discussions with customers... that wasn’t an easy task.

And then in 2019, the preview of Azure Synapse Analytics was released. I really liked the idea of an integrated analytics solution combining different ways of storing data (data lake and data warehouse) plus options for (almost) everybody to work with and transform data (SQL-based engines, Spark, ADX). But again, the management and especially the overall pricing were not easy to estimate. Every piece of Azure Synapse Analytics was billed individually and every compute engine used a different format for storing data: Spark could use data lake storage and/or dedicated SQL pools, Data Explorer used its own database format and the SQL engines used their own storage model.

Just some thoughts about an ideal analytics solution…  

  • What if you could start an analytics solution without worrying about costs of the different pieces in that context (compute and storage aside)?
  • What if you could log into a single portal and work on all your data tasks without switching tools?
  • What if you could re-use your knowledge about current Azure Data services like Azure Data Factory, SQL Serverless, SQL Dedicated pools, Spark and Azure Data Lake without the need to learn “yet another analytics tool”?
  • What if this analytical solution is provided to you as a Software-as-a-Service product where you do not have to worry about instantiating analytics runtimes, provisioning storage services, … !?
  • In the past, even within an integrated analytics solution, data was copied from one place to another. Think about a Data Lake as archiving/staging layer, a SQL Dedicated Pool as data warehouse and Power BI as semantic and reporting layer. Between each of those stages, data was duplicated.
  • One of the reasons why data was duplicated between the different compute and analytic engines was their different (internal) storage formats. What if an analytics solution had one format for all its different analytic workloads? And imagine a world where results produced by engine #1 can be accessed and used directly by engine #2 or #3.
  • Last but not least, what if an analytics solution provides you end-to-end security, data lineage and compliance under a single umbrella?

The future of Microsoft Analytics is here – Introducing Microsoft Fabric!

Announced at MS Build 2023 (announcement post), Fabric brings the Microsoft analytics stack and products to the next level. The service was released in public preview at the same event.

“Microsoft Fabric is the most important milestone in the history of Microsoft Data since the introduction of SQL Server”

Satya Nadella at MS Build Keynote 2023

Starting from the past with a huge set of different tools to build your solutions, Fabric brings them all under a single umbrella. The overall idea and purpose is to provide “end-to-end analytics from the data lake to the business user”.

If you are more into videos – I’ve recorded a short Fabric summary video -> https://youtu.be/FEzQnJFUvx4

Let’s dive a little bit deeper into Microsoft Fabric’s main pillars.

  • Data Storage – OneLake to store all your information in one place and using – by default – one open standard storage format.
  • Compute workloads for different personas to work on every aspect of analytics – from data integration, data engineering, machine learning up to reporting.
  • Structure your data in workspaces and share these data pieces across them in your organization.
  • Re-use already well-known and well-established Microsoft data services without the need for learning new tools and integrate them into a single Software-as-a-Service analytics platform.

Data Storage in Fabric – OneLake

OneLake is THE storage location in the Fabric scope. With the SaaS approach, OneLake is a single unified logical data lake for your whole organization. You do not have to worry about instantiating and configuring storage accounts – it’s done in the background for you. Even if your organization is distributed around the globe, OneLake helps in this case and provides storage accounts in different regions around the world. On top of those technical details, OneLake combines them into one single logical data lake.

Data is stored using open standard Delta format (parquet files plus transaction logs) which allows every tool that is able to read and write Delta format to consume the data within Fabric. It’s not only the data lake based workloads that store their data in Delta format, it’s also data warehousing and even Power BI that directly interacts with the Delta based data in OneLake.
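
To make “parquet files plus transaction logs” a bit more tangible, here is a minimal sketch that splits a file listing of a Delta table into data files and log commits. The table name and file names are made up for illustration; the `_delta_log` folder with zero-padded JSON commit files reflects my understanding of the open Delta Lake layout, not something specific to Fabric.

```python
from pathlib import PurePosixPath


def split_delta_listing(paths):
    """Separate a Delta table's file listing into data files and log commits."""
    data_files, commits = [], []
    for raw in paths:
        path = PurePosixPath(raw)
        if "_delta_log" in path.parts:
            # Only the JSON commit files are counted; checkpoints are ignored here.
            if path.suffix == ".json":
                commits.append(raw)
        elif path.suffix == ".parquet":
            data_files.append(raw)
    return sorted(data_files), sorted(commits)


def latest_version(commits):
    """Return the highest committed table version, or None for an empty log."""
    versions = [int(PurePosixPath(commit).stem) for commit in commits]
    return max(versions) if versions else None


# Hypothetical listing of a table called "sales".
listing = [
    "sales/_delta_log/00000000000000000000.json",
    "sales/_delta_log/00000000000000000001.json",
    "sales/part-00000-aaaa.snappy.parquet",
    "sales/part-00001-bbbb.snappy.parquet",
]

data_files, commits = split_delta_listing(listing)
print(len(data_files), "data files, latest version", latest_version(commits))
```

Any engine that understands this layout can work out which parquet files belong to the current table version, which is what makes the shared format usable across workloads.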

When talking about open access to data, OneLake is based on Azure Data Lake Storage Gen2 technology, and therefore all the API calls and tools you already have in place will continue to work with OneLake too.
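
As a rough illustration of the ADLS Gen2 compatibility, the sketch below builds an `abfss://` path of the kind existing Data Lake tools accept. The host name `onelake.dfs.fabric.microsoft.com` and the `<item>.<itemtype>` folder convention are assumptions based on my reading of the OneLake documentation, and the workspace and lakehouse names are purely hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # assumed OneLake DFS endpoint


def onelake_abfss_uri(workspace, item, item_type, relative_path=""):
    """Build an ADLS Gen2 style URI that points into a OneLake item."""
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    relative_path = relative_path.strip("/")
    return f"{base}/{relative_path}" if relative_path else base


# Hypothetical workspace and lakehouse names.
print(onelake_abfss_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Tables/orders"))
```

In this picture the workspace takes the place of the storage container, so a tool that already speaks `abfss://` only needs a different path.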

Structure your OneLake with Workspaces

Workspaces are the concept for structuring your OneLake – every workspace is assigned to a Fabric capacity and is – as of now – the level at which security and access are defined.

Workspaces themselves are not closed data silos. With Shortcuts you can provide – like symbolic links in your file system – connections to other data pieces in OneLake itself. This clears the way for one copy of the data and no duplication in your analytics solution. Shortcuts can even be extended to services outside of Microsoft’s data universe, such as Amazon storage.

OneLake in Microsoft Fabric – THE central storage based on open standard, Delta (parquet) format. Use Workspaces to structure your OneLake and Shortcuts to extend to other storage systems (internal or external)

Compute Workloads in Microsoft Fabric

Based on the one place to (not only logically) store your data in your organization – the OneLake – there are different tasks to perform on your data.

First, you need

  • to integrate and collect your data,
  • transform and engineer your data (“massage your data” as some of those Guys from a cube named it 😉)
  • and maybe then build a data warehouse on your data
  • sometimes – you learn more about your data by using Data Science.

In some projects, your data flows in real-time and needs to be analyzed in real-time too.

Besides that, no data pool is complete without proper reporting and business intelligence.

In Fabric, all those different workloads are based on OneLake data. No different ways of organizing data, no need to transfer data from one system to another. Every workload can directly read the output of another workload.

Workloads in Microsoft Fabric (as of Build 2023); some are available in public preview, some are coming soon.

So what workloads are there in Microsoft Fabric?

Microsoft Fabric – Home site
  • Data Integration is based on the well-known Azure Data Factory technology and the Data Flows you may know from Power Query online. Both are there to read, transform and bring data into the Fabric OneLake (workspace).
Start your data integration journey in Microsoft Fabric
  • Data Engineering is based on the same Spark engine and technology you are already familiar with. One of the main concepts in Fabric is to build a lakehouse for all your organizational data. And the data in your lakehouse is not hidden and limited to Spark only. Because of OneLake and the Delta format used by every workload, the transformed data tables can be used by all the other workloads. By default, the generated Delta tables in OneLake are even discovered & registered to provide an integrated file-to-table experience.
  • Data Warehousing. Think about Azure Synapse SQL Dedicated Pools on fire. Using Delta tables as storage format and a serverless engine with auto-scale to do the compute work. In addition, use T-SQL to work with your data in the way you are already familiar with.
Use SQL to build your own Datawarehouse. Distributed performance based on OneLake data
  • Data Science. Spark, MLflow, Cognitive Services and Notebooks are already your friends. Good – because these technologies will continue to exist in Fabric too and be the foundation of your Data Science work. Expect a deep integration (because of OneLake and the overall Fabric umbrella) but also some exciting extensions for a better, enhanced and more productive way to do Data Science.
Work in notebooks as you know them, but deeply integrated with all other parts of Microsoft Fabric.
  • Real Time Analytics. Real-time analytics has become important in many scenarios in the enterprise world, such as Log Analytics, cybersecurity, predictive maintenance, supply chain optimization, customer experience, energy management, inventory management, quality control, environmental monitoring, fleet management, health, and safety. The streaming, real-time part is handled by the Azure Data Explorer (ADX/Kusto) technology you may already know. But this time, it is directly integrated and based on the same storage format. Data in your real-time context is automagically mirrored to your central lakehouse – no data copy is needed here.
Real-time analytics directly integrated into Microsoft Fabric
  • Business Intelligence. In Fabric, a semantic layer (aka Power BI dataset) on your lakehouse is generated by default. Yes – by default and automagically. On top of that, this dataset does not use Import or DirectQuery mode; it uses a new connection mode – the Power BI Direct Lake mode.
    • Direct Lake mode – This connection mode combines the best parts of Import and DirectQuery mode – fast dataset performance and up-to-date information. In Direct Lake mode, Power BI reads data directly from OneLake whenever that part of the dataset is requested. To return data from OneLake fast, the Delta tables are optimized (V-Order) but remain 100% compatible with the open standard Delta format (and can thereby be consumed by all other applications).
Power BI as you already know it (but look closely) – all integrated and based on OneLake data

No workload without compute power (capacities)…

As Microsoft Fabric is a SaaS product, you do not need to instantiate and administer the different workload components as you had to – for example – in Azure Synapse or in a “plain” data solution based on Azure services.

But you need one thing to start with: a Fabric capacity that powers the universal compute.

A workspace in Fabric needs a capacity assignment because storage and compute are separated in this environment. Therefore no data transformation, analysis or warehousing is possible without a capacity assigned.

To my knowledge, there will be capacities at different performance levels, and they will be available on a subscription basis.

You need Fabric capacities to power your analytical processes.

To sum it up…

Phew.. I think we all need some more time to digest these announcements and the options that the new Microsoft Fabric stack brings with it.

With the public preview announced at Build, we all get the chance to try Microsoft Fabric on our own workloads (head over to https://aka.ms/try-fabric) and learn how to interact with it and especially how to design solutions based on best practices. Keep in mind that it is a first public preview version, so expect some hiccups, learning curves and maybe some (small) steps back – but with our combined feedback we can shape the analytics platform of the future.

I am more than excited about the new possibilities and look forward to starting some projects based on Fabric in the near future.

Some Links

MS Build Content

Introducing Microsoft Fabric:

Posted in Azure, Azure Synapse Analytics, Conference, Data Lake, InformationSharing, Microsoft Fabric, PowerBI | 1 Comment

How to enable Microsoft Fabric in your (Power BI) tenant

During Microsoft Build 2023, Microsoft Fabric was announced in public preview. If you want to try Fabric in your tenant, you need to enable the Fabric features in your Power BI admin portal.

  • Head over to Power BI service (https://app.powerbi.com) and open the Admin portal (you need Power BI admin permissions to see that menu entry)
  • By default, Microsoft Fabric is disabled (as of now – if you do not change the setting, it will be set to ON after July 1st 2023).
  • In order to use Fabric, we need to enable that setting.
  • In my case, it only needed a refresh of the browser window – and the Fabric UI loaded.. really.. it still looks the same… hmm

There are some hints in the UI – the URL has a parameter at the end (?experience=power-bi), and in the lower left corner you see the Power BI icon. This icon is the way to switch between the different Fabric workloads.
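
If you want to check that hint programmatically (for example in a small browser automation script), a minimal sketch could look like this. Only the `experience` parameter and its value `power-bi` come from the post; the `/home` path in the sample URL is an assumption.

```python
from urllib.parse import parse_qs, urlparse


def fabric_experience(url):
    """Return the value of the 'experience' query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("experience")
    return values[0] if values else None


# The path "/home" is a placeholder; only the query parameter matters here.
print(fabric_experience("https://app.powerbi.com/home?experience=power-bi"))
```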

Et voila… we are done and we’ve enabled Microsoft Fabric in our tenant. Easy, right?

Now we only have to learn more about Microsoft Fabric in general, the building blocks, the technologies involved and licensing & capacities. Be patient, the internet will bring us many blog posts, articles and videos about Fabric in the near future.. I am sure.. 🙂

Build a happy data fabric,

Wolfgang

Posted in Microsoft Fabric, PowerBI | Leave a comment

Microsoft Build 2023 – Book of News including Microsoft Fabric announcement

As with every big conference, Microsoft also published a Book of News for the Build 2023 conference

Book of News: https://news.microsoft.com/build-2023-book-of-news/

Even though Build is mainly focused on a developer audience, this year Data and AI got their places in the announcement list.

My personal highlight is the public preview of Microsoft Fabric – a unified end-to-end SaaS analytics solution. Satya Nadella called it the most important milestone in the history of Microsoft data since SQL Server. I think this somehow emphasizes the importance and potential of this data suite.

Expect more blog posts & videos about Microsoft Fabric soon.

Posted in Azure, Azure Synapse Analytics, DataNews, Microsoft Fabric, PowerBI | Leave a comment

Microsoft Purview and the Apache ATLAS API

This week, I had the pleasure of presenting to the Hybrid Virtual Group on how to Extend and Customize your Microsoft Purview account with the Apache ATLAS API.

I really enjoyed presenting again for this group, and thanks to everyone (the organizers and the attendees). There were great questions afterwards and some of you reached out to find more information about the topic. As promised, I will share the slides of my presentation here.

Please download it here and if you have any additional questions, feel free to reach out!

Govern your data and #treatYourDataBetter

Wolfgang

Posted in Azure Purview, Data Governance, Microsoft Purview | Leave a comment

(Data/Power BI) Governance Sessions at SQLBits 2023

This week is SQLBits 2023 week. I will start my journey tomorrow and in preparation I had a look at the schedule to find some interesting sessions in the Data / Power BI Governance area.

There is a huge list of sessions and even training days in the context of data governance planned at this year’s conference.

Training Days

  • Building an end-to-end and open solution to monitor and govern your entire data estate (Dave Ruijter & Marc Lelijveld)
  • Govern your Power BI environment (Hylke Peek)

General Sessions

Thursday

  • Power BI governance, disaster recovery and auditing (Alex Whittles)
  • Power BI Governance quick start (Asgeir Gunnarsson)
  • Managing access to data sources in your data estate with Microsoft Purview (Erwin de Kreuk)
  • Integrate Data Quality into your processes (Tillmann Eitelberg, Oliver Engels)

Friday

  • Unity Catalogue and Purview: Data Governance Bedfellows (Barny Self)
  • Govern your self-service integration process (Tillmann Eitelberg, Oliver Engels)
  • Business Benefits of Good Governance (Victoria Holt)
  • Maximize the business value of data with Microsoft Purview Data Governance (Blesson John, Gaurav Malhotra)
  • Microsoft Purview Data Governance: Updates & Roadmap (Gaurav Malhotra)
  • Build your trust in your data. Why data governance is the key to success (Johan Ludvig Brattas, Marthe Moengen)
  • Meet the PG: Microsoft Purview (Gaurav Malhotra)

Saturday

  • Extended Governance of Power BI with Microsoft Purview (Oliver Engels, Gabi Münster)
  • The Power BI secret sauce: Security and governance (Kasper de Jonge)

I hope to attend as many of those sessions as possible. If you are attending SQLBits too, I am happy to connect and chat! Just stop me..

Treat your data better and happy conference,

Wolfgang

Posted in Azure Purview, Conference, Data Governance, InformationSharing, Microsoft Purview | Leave a comment

One Way to Try Microsoft Purview (Data Governance) for Free – BUT…

One of the most common complaints about Microsoft Purview Data Governance is the lack of a development/trial option. The pricing of Microsoft Purview is based on different usage types – including Data Map Population, Data Map Enrichment and Data Map Consumption.

Quite a price for “just” a demo/test environment

Erwin (t | b) blogged about the different pieces that form the total pricing (blog post: Updated Microsoft Purview pricing). Erwin also includes a pricing example: at the time of his writing, the minimum requirement was 1 CU (= Capacity Unit) for the base data map itself (not including any scans or report generation), costing 284.7 €/month. Add the costs for scanning, insights generation and advanced resource sets and the total could easily add up. That was not suitable for a small demo environment.

But something changed, you can now try Purview (almost) for free

It was the blog post describing a “Low-cost solution for managing access to SQL health, ..” that changed things. It mentions something that was completely new to me: Microsoft Purview Data Map Consumption is free of charge for a metadata storage size below 1 MB. I checked the Purview pricing page and yes – an included quantity of 1 MB is mentioned there (quote: “The first 1 MB of Data Map meta data storage is free for all customers”).

And my reaction was – Nice, very nice.. I can try and create Microsoft Purview instances for free and test new features..

BUT: I wanted to be sure and check how much metadata (sources, scan results, data assets, classifications) can fit into 1 MB.

Let’s try it – create a new Purview account

I immediately went over to the Azure Portal and created a new Microsoft Purview account – let’s name it purviewSizeTest.

Then I fed the data catalog with some demo data. I went for AdventureWorks as it is a small demo database including some tables. If you do not know how to get the AdventureWorks sample data into an Azure SQL Database, Koen (t) wrote a nice how-to post (https://www.mssqltips.com/sqlservertip/7565/adventureworks-database-installation-azure-sql-database/).

I created two sub-collections in my data map and registered the Azure SQL Database (including a database with AdventureWorks) in my data map.

Next, I defined a scan referencing the system default scan rule set for Azure SQL (I configured the scan without lineage extraction) and ran the scan once.

The scan itself ran for ~7 minutes and discovered and ingested 26 data assets into the data map.

How to check the Microsoft Purview Data map size

Now the big question was – how much meta data was produced by these two sub-collections, the Azure SQL source, the scan plus the 26 data assets discovered?

You can check your Data Map Storage Size in the Azure properties (Overview) of your Purview account. In my case, the storage size was not reflected and updated immediately – I think it took about 6 hours until the storage size was displayed here. And guess what, the storage size was below 1 MB! 😉

So it is free, right… !? hmm.. is it really free?

After the Storage size check, I needed to check the pricing. And guess what.. the test was (almost) free. Why only almost and not free as I mentioned above?

Well, the data map consumption is free for this small set of meta data, but scans still cost you some € / $. In my case, the scan of my AdventureWorks database took 1.08 vCore hours and cost me ~0.68 € (pricing is 0.598 € per vCore hour).
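
The scan charge can be re-derived from the two numbers quoted above. This is only a back-of-the-envelope sketch: the function name is made up, and it assumes the charge is simply vCore hours multiplied by the hourly rate, which may not match how Azure meters and rounds usage.

```python
from decimal import Decimal

# Rate quoted in the post (EUR per vCore hour, pricing as of 2023-02-09).
PRICE_PER_VCORE_HOUR_EUR = Decimal("0.598")


def scan_cost_eur(vcore_hours, price=PRICE_PER_VCORE_HOUR_EUR):
    """Estimate the scan cost as vCore hours times the hourly rate, rounded to cents."""
    return (Decimal(str(vcore_hours)) * price).quantize(Decimal("0.01"))


print(scan_cost_eur(1.08))  # 0.65
```

The plain product comes out at about 0.65 €, slightly below the ~0.68 € that showed up in my costs. Billing granularity could be one reason for the gap, but that is a guess.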

And even after a week of testing, there is no additional pricing entry in my Azure costs for that specific Purview account.

Sounds nice, BUT... my findings are:

  • You do not get a completely free Microsoft Purview account – only Data Map Consumption for a small (really small) set of meta data is free (the first 1 MB is free).
  • Data Map Population (Scans, ingestion, classification) and Data Map Enrichment (Insights Generation & Advanced Resource Sets) are NOT free.
  • Purview Applications could also cost you some money (most of them are still in preview right now and free).

=> Be aware: you do not get a completely free Microsoft Purview instance (there are initial costs for the scan), but you can create an environment containing an Azure SQL source + AdventureWorks database to demo Microsoft’s Purview features with no running costs.

Pricing example based on the current Microsoft Purview pricing as of 2023-02-09.

Posted in Azure Purview, InformationSharing, Microsoft Purview | 1 Comment

Data Community Austria Day 2023 – We’re Still Alive!

Usually, January is the month when we traditionally hold our #SQLSatVienna / #DataSatVienna or the #DataCommunityAustria Day. It’s been a long time (a year) since we got together for a big data event in Austria.

Although the covid situation in Austria and around the world has changed compared to the years 2021 and 2022, we decided NOT to have an in-person event in January 2023. But we did not want to leave the #DataCommunity in Austria alone, waiting for the next event to come (or not).

So we got together, asked some of our speaker friends around the world and organized another edition of the #DataCommunityAustriaDay! It will happen on Friday, 13th January 2023, starting at 9am CET.

  • 12 sessions
  • 2 tracks
  • 12 speakers
  • Fun, come together, learn and share

You can find more information at our UG webpage: https://sqlusergroupaustria.wordpress.com/2023/01/05/data-community-austria-day-2023/

Register here: https://www.eventbrite.at/e/data-community-austria-day-2023-tickets-505907903157

Posted in Azure, Conference, Data Community, PowerBI | Leave a comment

Microsoft Purview in the Year 2022 – A Recap Tour

A lot happened in the Microsoft Data Governance area, especially in the area of Microsoft Purview Data Governance.

Let’s go back in history and wrap up the announcements that have been made in the Microsoft (Azure) Purview area. My main source for this summary is the Security, Compliance and Identity Blog.

TL;DR

There is a video summarizing the news

Remember: At the beginning of the year 2022, Microsoft Purview was still named Azure Purview.

January 2022

February 2022

March 2022

April 2022

May 2022

July 2022

October 2022

November 2022

December 2022

Quite an impressive list – What was your favorite feature of Microsoft Purview in 2022?

Happy data cataloguing & treat your data better!

Wolfgang

Posted in Azure Purview, Data Governance, Microsoft Purview | 1 Comment

Ignite 2022 – Azure Data Platform Update

It’s the week of Ignite 2022 and, as in previous years, many new features and updates have been announced. It’s hard to keep up with all the tweets, blog posts and similar things, so I will fall back on the good old tradition of the Ignite Book of News.

https://aka.ms/ignite-book-of-news

This page lists all the announcements in one place, so head over there and browse through the list. If you do not want to read, you can have a look at the 8 minute summary video I recorded to talk about the Azure Data news.

Have fun and #treatYourDataBetter

Wolfgang

Posted in Azure, Azure Purview, Azure Synapse Analytics, Business Applications, Business Intelligence, Cloud, Conference, DataNews, Microsoft Purview, PowerBI | 1 Comment

Microsoft Purview and Data Governance at PASS Data Community Summit

Yesterday the session catalog for the hybrid PASS Data Community Summit 2022 was released. I am now allowed to talk about my sessions that got selected for this event. It looks like there is a little bit of Data Governance and Microsoft Purview focus 😉

  • Ask us Anything – We (Victoria, Erwin, Richard and myself) will be on stage and answer your questions about Data Governance and Microsoft Purview. If you have any questions you want us to answer, please submit them here: https://forms.office.com/r/dTP38LnmsJ
  • When the Standard UI is not Sufficient – Microsoft Purview & Apache ATLAS API. Many use cases in Microsoft Purview are covered by the web UI, but sometimes you need to do more (e.g. add custom lineage/properties or query the content of your data catalog). In those cases, the Apache ATLAS API can be used.
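
To give a flavour of what “custom lineage” means on the API side, here is a minimal sketch that only builds the request body for a lineage process entity and does not call any service. The endpoint pattern, the `Process` type with `inputs`/`outputs` attributes and all names and GUIDs below are assumptions based on my understanding of the Apache Atlas v2 entity API as exposed by Purview; check them against the current documentation before use.

```python
import json


def atlas_entity_url(account_name):
    """Assumed Purview endpoint for creating or updating Atlas entities."""
    return f"https://{account_name}.purview.azure.com/catalog/api/atlas/v2/entity"


def build_lineage_process(name, qualified_name, input_guids, output_guids):
    """Build an Atlas 'Process' entity that links input assets to output assets."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": [{"guid": guid} for guid in input_guids],
                "outputs": [{"guid": guid} for guid in output_guids],
            },
        }
    }


# Hypothetical account name, process name and asset GUIDs.
payload = build_lineage_process(
    name="nightly_sales_load",
    qualified_name="custom://etl/nightly_sales_load",
    input_guids=["<guid-of-source-table>"],
    output_guids=["<guid-of-target-table>"],
)
print(atlas_entity_url("my-purview-account"))
print(json.dumps(payload, indent=2))
```

Sending this body with an authenticated POST request would be the next step; authentication is deliberately left out of the sketch.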

Registration for the event is open and the early bird price ends on July 13th -> https://passdatacommunitysummit.com/

Govern your data and let’s meet at PASS Data Community Summit!

Wolfgang

Posted in Azure Purview, Conference, Data Community, Data Governance, InformationSharing, Microsoft Purview | Leave a comment