Connectors in Power Query (reference link)

Sometimes the internet has some real treasures hidden, and I found one this weekend.

In many workshops my customers ask me: “Wolfgang, is the connector to system XYZ supported in Power BI datasets AND dataflows?” In some cases I can answer right away; in others an internet search helps me find the answer.

But now – I found the overview page for Power Query connectors, listed by source system AND service support. And this blog-post is for future wolfi to remember this page! 🙂

List of all Power Query connectors | Microsoft Docs

What I really like is the possibility to get the right information at a glance. Search for the connector – et voilà. You’ll know whether it is supported in

  • Excel
  • Power BI (datasets)
  • Power BI (dataflows)
  • Power Apps (dataflows)
  • Customer Insights (dataflows)
  • or Analysis Services
screenshot from the documentation page taken from https://docs.microsoft.com/en-us/power-query/connectors/

Posted in Azure, Business Intelligence, Cloud, InformationSharing, PowerBI

Data News – 2021-01-18

Welcome to 2021! Herzlich Willkommen in 2021!

Welcome to another episode of my #DataNews series. In the past week I found some interesting stuff I think you should have a look at.

After the end-of-year break, let’s start into the data-year 2021!

Last Friday was the day of the last SQLPASS SQLSaturday – #SQLSatVienna. As in previous years, I was part of the organizing team, and I think it was a successful day after all. SQLPass Global closed its doors on Friday, January 15th, which was actually the day of our event. We had to switch the attendee registration process to a new site (Eventbrite) and provide a new platform for the virtual sessions. We chose Microsoft Teams and it worked well. We had six tracks with six sessions each => 36 sessions over the course of the day. We had attendees and speakers from around the globe – the maximum attendee count in a single time slot was 270!! We are more than happy with the outcome.

We are planning more events throughout the year. To get news updates, come join #DataCommunityAT (Austria) and register for the newsletter to stay up-to-date: http://eepurl.com/hmkz49

DataSaturdays – The future of data events is already here

#SQLSatVienna 2021 was the last of its kind – the last SQLPass SQLSaturday in history. But the future has already begun. A group of community enthusiasts founded the DataSaturdays initiative. There are already 7 events registered – have a look at the site and register your event and/or submit your sessions!

> Data Saturdays <

What happened in the data world?

In Austria, we are currently in full lockdown because of Covid-19 and therefore I spend many evenings reading. I’ve collected some interesting resources for you…

(Virtual) Events in the next weeks

Reading is fun, but (virtual) conferences are too…

Have a good start into the week!

Wolfgang

Posted in Azure Synapse Analytics, Cloud, Conference, DataNews, InformationSharing, PowerBI, SQLPass

Continuous Integration for the BI Team session at ExpertsLive Austria 2021

It’s my first conference in 2021 – and it was planned as a local one, but as things changed, the ExpertsLive Austria 2021 conference went virtual too.

I am speaking today about Continuous Integration for the BI Team:

In many BI projects, the manual process of development and especially deployment of artefacts has led to sleepless nights, emergency hotfixes, grey hair and many hours of stress.
Especially in BI development, the usage of database projects, unit testing, version control, continuous integration and deployment is an underrepresented area.
Join me in this demo-heavy session where I will guide you through the not-so-frightful jungle of continuous integration using SQL Server database projects, SSIS, tSQLt, Azure DevOps and more.
After this session you’ll have guidelines and some ideas on how easy it is to turn your manual BI development and deployment into a powerful, automated no-brainer that just works!

If you want the slides, here they are..

If you have any questions, feel free to reach out to me!

Posted in Azure, Business Intelligence, Conference, PowerBI, SQL Server

Azure Synapse Analytics end-to-end: From your database and sensors to a Power BI dashboard

In today’s blog post I would like to build an end-to-end solution that combines data coming from different sources and stored in different formats into a single Power BI data model, using Azure Synapse Analytics.

The Basics to start with

For those of you who do not know Azure Synapse Analytics, here is a short explanation – taken from the official product documentation page: Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale.

So what does this mean? Well, in theory it should not really matter where your data comes from, in which format it arrives and how you would like to transform and query it – Azure Synapse Analytics should solve this for you.. 🙂 In theory… There is still a little bit of work to do on your side, but let’s have a look at how we can deal with it.

Over time, two major streams for data in enterprises emerged: 1) traditional data warehousing for dealing with well-structured data and reporting requirements and 2) big data analysis to deal with (huge) amounts of less structured data.

Use case 1) involves business users across the enterprise working with the data warehouse data, relying on proven security, dependable performance and data stored in a relational structure. The data in use case 2) is often used by data scientists for exploration, big data analysis and machine learning, based on semi-structured data in a data lake.

In the past, different (Azure) services were needed to deal with these different kinds of analytics requirements, storage technologies and performance requirements.

With the announcement of Azure Synapse Analytics, the separate services are now combined under a single bracket – a so-called Azure Synapse Workspace.

source: Analytics in a Day slides

Within an Azure Synapse Analytics Workspace,…

  • .. you can integrate data using Synapse Pipelines (if you are already familiar with Azure Data Factory, it will be an easy start for you)
  • .. the central data storage is based on an Azure Data Lake Storage Gen2 account (needed for metadata storage)
  • .. you can use different kinds of analytics runtimes to work with your data
    • SQL-based runtimes using either a serverless approach or a provisioned flavor (named dedicated SQL pools and based on the Azure SQL Data Warehouse technology)
    • Apache Spark runtimes to use the distributed architecture, notebooks and the language of your choice
  • .. security is integrated from the ground up using Azure Active Directory
  • .. Synapse Studio is the single-stop development user interface to create artifacts and work with all objects in the workspace
  • .. you can configure multiple linked services and linked data storages to extend the data landscape
  • .. you can configure connected services (like Power BI or Azure Machine Learning) which – in the example case of Power BI – allow the direct creation of Power BI reports within Synapse Studio

And now: Action for an end-to-end “Data through Azure Synapse to Power BI” demo

In my example scenario I would like to analyze sensor data originating from different machines in different locations of our company. The sensor values are gathered directly at the machines and sent through Azure Stream Analytics into a directory in our data lake (this process is not covered in this example).

I am not going into detail on how to create a new Synapse workspace, how to create new dedicated SQL pools or how to manage the connections to linked services, etc.

Data Exploration

To get a first feeling for our data, we can use Azure Synapse Studio and its Data hub to start our data lake exploration. Our data collection process stores its output in CSV files – one for every machine. An example input directory would look like this:

Sensor data in our data lake

Synapse Studio allows us to preview the content of CSV files – use the context menu and the Preview action. For a first look, Synapse Studio opens a dialog with a preview of the file content:

Data file preview in Synapse Studio

But we want to work with the data and query the file content. To achieve this, we can use the context menu again and select New SQL Script -> SELECT TOP 100 ROWS. This action opens a new SQL script window containing a script to query the CSV file from our data lake storage.

The magic in this script is based around the keyword OPENROWSET – this function reads the CSV content and returns the data in a relational structure. In the first screenshot the relational structure is not really visible – only one column, C1, is returned by the statement. Some additional parameters and tweaks are needed to get a more usable structure.

Also, this script is executed against the serverless SQL pool (named Built-in). The source for this query is a single file – SensorValues_M_01.csv in the data lake.
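
Just for reference, the generated script looks roughly like the following – a minimal sketch, the account name and path are placeholders from my demo and your generated script will differ slightly:

-- sketch of the generated "SELECT TOP 100 ROWS" script (account name and path are placeholders)
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M_01.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [result]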

An enhanced version of the SQL script deals with some of the points mentioned above:

SELECT
    *
FROM
    OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION='2.0'
        , FIELDTERMINATOR = ';'
    ) 
    WITH (
        MeasureDT       DATETIME2(0),
        MachineName     VARCHAR(20),
        SensorKey       VARCHAR(20),
        SensorValue     DECIMAL(18,2)
    )
    AS [result]

In this version of the SQL script I’ve changed the name of the source file to use a wildcard – i.e. to read all the files in the /staging/sensors/ directory following the specified file name pattern.

Second, the option FIELDTERMINATOR =’;’ was added to OPENROWSET -> this returns the four columns from the CSV file.

Third, by using the WITH clause I declared a relational structure for the statement output – defining column names and datatypes.

Data Import – Sensor Measurements

Serverless SQL pools in Azure Synapse do not store data themselves; they are metadata containers for external table and view definitions.
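
To keep such a query around, the OPENROWSET statement from above can be wrapped in a view in the serverless pool – a minimal sketch, assuming a user database and a staging schema already exist in the serverless pool:

-- sketch: a view in the serverless SQL pool - only metadata is stored, the data stays in the data lake
CREATE VIEW staging.SensorValuesLake
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';'
    )
    WITH (
        MeasureDT       DATETIME2(0),
        MachineName     VARCHAR(20),
        SensorKey       VARCHAR(20),
        SensorValue     DECIMAL(18,2)
    ) AS [result];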

For the next step of our example – storing the sensor values in a relational store – we need to switch to a dedicated SQL pool. For the import we have at least two choices. The first: use Synapse Pipelines to load the data from the data lake.

The second: use the bulk load feature that Synapse Studio brings out of the box. Back in the Data hub, I select one of the data files, and in the context menu the action New SQL script -> Bulk load is the door opener to the data integration step. After the wizard-like steps are finished, a COPY SQL script is generated.

Select the Bulk load option to generate a COPY statement

First, select the input files / folder – i.e. the content you want to import. I was not able to select an input folder – Synapse did not like this selection and gave me errors. So I decided to go for a single file (the _M_02.csv file).

Select the source files/folder

Next, configure the input settings – in my case the field terminator ‘;’

configure the input file format

The sink of our COPY statement is a dedicated SQL pool and a new destination table – staging.SensorValues.

and specify the target information (database and table)
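
The target table in the dedicated SQL pool could look roughly like this – a sketch, the column names and the ROUND_ROBIN / HEAP settings are assumptions on my side:

-- sketch: a possible definition of the destination table in the dedicated SQL pool
CREATE TABLE staging.SensorValues
(
    MeasureDT       DATETIME2(0),
    MachineID       VARCHAR(20),
    SensorID        VARCHAR(20),
    SensorValue     DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);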

Just to be sure, have a look at the column mapping.

and have a look at the column mapping

I’ve changed the generated SQL script to provide some more meaningful column names and to read in not only one file but the content of the folder (using a wildcard).

the SQL script to COPY data into the dedicated SQL pool
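
The adapted statement looks roughly like the following – a sketch: the wildcard path matches my demo, and the exact options (and a CREDENTIAL clause, if your storage account requires one) depend on your setup:

-- sketch of the adapted COPY statement (wildcard file name, options are assumptions)
COPY INTO staging.SensorValues (MeasureDT, MachineID, SensorID, SensorValue)
FROM 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0A'
);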

Our current dataset looks as follows:

initial sensor data in our SQL Pool

Data Import – Masterdata

In addition to our sensor measurements, I want to include some masterdata information into our data warehouse database. The masterdata list for our machines and detailed information for our sensor-measurement keys are stored in an Azure SQL database.

Masterdata in an Azure SQL database

For the data import, a Synapse pipeline using two Copy data activities was created. For each of the sources, an integration dataset was created pointing to the corresponding database table. The sinks of the copy activities both point to the dedicated SQL pool.

To test the data import, use the Add trigger -> Trigger Now option in the menu.

Trigger a pipeline run

And use the Monitor hub to monitor the pipeline run.

Monitoring of the pipeline run

Data Modelling (in the SQL Pool)

Up next – combine the data islands (sensor measurements and masterdata) into a logical layer to serve as input for the Power BI data model.

Using Synapse Studio and the Develop hub, I’ve created views to build the logical layer.

CREATE VIEW [reporting].[SensorValues]
AS SELECT 
    [MeasureDT]
    ,[MachineID]
    ,[SensorID]
    ,[SensorValue]
    , MachineID + '_' + SensorID    MachineID_SensorID
 FROM [staging].[SensorValues];
GO

CREATE VIEW [reporting].[Sensors]
AS SELECT 
    [MachineID]
    ,[SensorID]
    ,[SensorName]
    , MachineID + '_' + SensorID    MachineID_SensorID
 FROM [masterdata].[Sensors];
GO

CREATE VIEW [reporting].[Machines]
AS SELECT 
    [MachineID]
    ,[MachineName]
    ,[Location]
 FROM [masterdata].[Machine];
GO
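
As a quick sanity check, the three views can be combined in a test query – a sketch, the join columns simply follow the view definitions above:

-- sketch: join the reporting views to verify the logical layer
SELECT TOP 10
    m.MachineName,
    m.[Location],
    s.SensorName,
    v.MeasureDT,
    v.SensorValue
FROM [reporting].[SensorValues] AS v
JOIN [reporting].[Sensors]      AS s ON v.MachineID_SensorID = s.MachineID_SensorID
JOIN [reporting].[Machines]     AS m ON s.MachineID = m.MachineID
ORDER BY v.MeasureDT DESC;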

And the result looks like:

Create a Power BI Data Model

As the next step, a Power BI data model is needed for our visualizations. As I mentioned in the beginning, with the use of connected services the boundaries of a Synapse workspace can be extended, and external Azure / Office 365 services can be integrated into Synapse Studio.

In order to do so, switch to the Manage hub and add a new linked service (type Power BI). Select an existing Power BI workspace, name it and that’s it!

New Linked Service of type Power BI

As of today, only one Power BI workspace can be linked to a Synapse workspace. I hope this limitation will be lifted in the future..

Only one linked Power BI workspace is allowed as of today

In the Develop hub, the Power BI workspace appears in the Power BI section: existing datasets and reports are listed here. It does not matter whether these items were created in Power BI directly or in Synapse Studio.

Power BI workspace integration in Synapse Studio

Open the datasets list and add a new one.

Download the pbids file to open it in Power BI Desktop

Open the downloaded PBIDS (Power BI Datasource file) in Power BI Desktop. The connection to the dedicated SQL Pool is automatically established. Select the three views we created in one of the previous steps and load them into the Power BI model.

Select the views and load them into the Data Model

For this demo, we will use Import mode. If you want to see a demo that uses DirectQuery against a data lake directly, please have a look at my blog post here.

Use Import mode

Update the relationships between the tables and the data model looks like this:

The Data Model

I’ve created one sample report page:

Demo Power BI report

Save the PBIX file and publish it to the linked Power BI workspace.

Publish process

After the publish is finished, switch back to Synapse Studio and use the Close and refresh button.

There are two options now – use the published dataset to create a new report or open the report directly from Synapse Studio:

Power BI integration in Synapse Studio

The Power BI editing experience is fully integrated into Synapse Studio – change the report as you like. Be aware that all the changes you perform on one of the sides (Power BI service or Power BI integration in Synapse Studio) will appear on the other side too 😉

With that, we are at the end of our end-to-end Synapse example. To recap,…

  • we’ve explored data files in the data lake directly in Synapse Studio using the preview option and the SQL serverless OPENROWSET command
  • we’ve used the bulk load wizard to generate a COPY statement to load sensor data into a dedicated SQL Pool
  • we’ve used Synapse pipelines to load masterdata from an Azure SQL database into a Synapse dedicated SQL pool
  • we’ve created views in the dedicated SQL pool as a logical layer
  • we’ve created a Power BI data model based on the views
  • we’ve used the Synapse Power BI integration to open and edit the report

Posted in Azure, Azure Synapse Analytics, Business Intelligence, Data Lake, InformationSharing, PowerBI

Productivity Tip: Shorten duration for events (in Outlook web) automatically

During the last months, my calendar has been full of online meetings which were very often planned one after the other. If it's a half-day block that results from these meetings, that's fine.. but sometimes I just want a break. And with meetings scheduled for a full 30 minutes or 1 hour, this was not really possible.

By accident I found a setting in Outlook web that will help me in 2021 to get some breaks between my meeting marathons.

The setting is called Shorten duration for all events and it leads to shorter meetings – you can specify that meetings should be shortened by a certain number of minutes.

I hope this tip helps you to get some (planned) breaks between your meetings.

The setting can be found in the Outlook settings -> Calendar -> Events and invitations.

Start productive, be productive!

Posted in Best Practices, InformationSharing

Create a new Power BI report and dataset (simply in the service, without Power BI Desktop)

In December, the Power BI team released a new feature to create reports in Power BI service.

Which does not sound new, right? Since the beginning of Power BI, you could create a new report in the service.. but now, the report can be created without the need for an existing dataset upfront.

Let’s see it in action

There are two options (at least I found two) for how you can use the new feature: one is in the menu on the left side (+ Create), the other one is the button marked yellow in the screenshot.

With both options, you get to the following screen:

You can choose to either 1) use a published dataset for your report or 2) start from scratch with data from the clipboard.

Option 1 – Select a published dataset

Option 1 – select a published dataset

Option 2 – use data from the clipboard

For my demo, I use the output of a SQL query which I copied to the clipboard.

My source data is coming from SSMS
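
The query behind that output could look roughly like this – a sketch, the table and column names are assumptions based on the columns that show up in the generated dataset later on:

-- sketch: the kind of query whose result set I copied to the clipboard (schema/table names assumed)
SELECT
    o.OrderID,
    o.Quantity,
    o.Amount,
    o.OrderDate,
    a.ArticleName,
    c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Articles  AS a ON o.ArticleID  = a.ArticleID
JOIN dbo.Customers AS c ON o.CustomerID = c.CustomerID;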

In Power BI (Power Query Online), I pasted the data into the dialog.

New data dialog – data is pasted into it

Before we can use the data, it needs some refinement. In this case, I use the first row as headers, change the column data types and select another name for the created table. What you could also do is add additional columns (and data) to the data table – but I will skip that step here.

The final data table

The Create button starts the creation of a dataset (in the current workspace), plus a new Power BI report is created. The Power BI team calls it Quick Edit mode, which means you can add more columns that should be analyzed in the report and change the default summarization of the fields.

The generated report contains blocks for every field selected (a KPI visual and breakdowns for the other dimensions) and an overview table at the bottom of the page.

Quick Edit or Full Edit Mode?

The menu bar

The menu itself allows us to save the report or save a copy of the report. Plus, we can start to analyze the dataset using Analyze in Excel.

QuickEdit: If you only use the Save option, the report stays in QuickEdit mode, which means every time you open it afterwards, you get directly into QuickEdit mode (select new measures and/or dimensions that are automagically added onto the report page).

Attention: If you select the Edit action, you’ll get into the full edit mode and the Summarize pane is not available any more. This is a one-way decision (and as of today, no return to QuickEdit mode is possible).

To be honest, there is a way to get back to QuickEdit mode if you mistakenly switch to full edit mode. Do NOT save the report at this time and leave the edit mode. Next time you open the report, the Summarize pane is still there.

Power BI even asks for your permission to switch the edit mode.

Do you really want to switch to edit mode? (No QuickEdit anymore)

Afterwards, the report is available for full editing – you can change all the properties of the visuals and even remove or add parts to the report page.

Full edit mode

Behind the scenes

Every time you use this feature, a new dataset is created in the workspace where you started the report creation. If you saved your report, the report itself also appears in the object list in your workspace.

Dataset details – Refresh not possible

If you want to have a closer look at the dataset, the settings page does not really reveal more details. We cannot schedule a data refresh… because this dataset seems to have no data source connections. Hmm!?

No refresh for the dataset

Download the pbix

Next, I downloaded the pbix file based on the dataset and had a look at the Power Query transformations. The Advanced Editor reveals the “magic” of how the table / data model is created. Power BI / Power Query Online transformed the data from the clipboard into a compressed string, and this string is afterwards used as the source of the data table.

Advanced Editor for the “clipboard based dataset”
let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAWNTAwM9AwMgw8jA0FLXwBCIgBynzOxUIOVcWlySn5tapOCfl6oUqxOtZATVZWiAqs0Yu7aQ8nywNmOgoDlItQmKLiOIruDcwzuKilKTig7vSEHRnVGUCrHWBGqtBRY7ceqG2m0KFDQ2ABPIuk3w6wZ7OBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [OrderID = _t, Quantity = _t, Amount = _t, OrderDate = _t, ArticleName = _t, CustomerName = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"OrderID", Int64.Type}, {"Quantity", Int64.Type}, {"Amount", type number}, {"OrderDate", type date}, {"ArticleName", type text}, {"CustomerName", type text}})
in
  #"Changed column type"

How can I change the data?

What if I want to change the data? I started with some Google searching and tried to modify the compressed string. But that was not what I wanted to do on a Saturday evening. So I just hit the gear symbol in the Power Query Applied Steps section. 🙂 – et voilà…

How to modify the source object

If you need to, you can change the data in this dialog and also add or delete some columns. When done, save the pbix file and upload it to the Power BI service.

Posted in Business Intelligence, Cloud, PowerBI

Speaking at ExpertsLive.at about Continuous Integration for the BI Developer

The new year is only one day old and I am happy to announce that I will speak at the 2021 edition of the Experts Live Austria conference on Friday, 8th January 2021.

It will be an online event and I will speak about the Journey to Continuous Integration for the BI developer. My talk will feature Azure DevOps, SQL Server, SSIS, SSAS, Power BI and how easy it is to get some automation into your BI projects.

Experts Live Austria 2021 (sessionize.com)

Posted in Azure, Best Practices, Business Intelligence, Conference, PowerBI

DirectQuery for Power BI Datasets – What does Chaining mean?

In my last blog post I introduced the new concept of DirectQuery for Power BI datasets. This feature allows you to extend and modify a (remote) published Power BI dataset with the help of a local model.

Conceptual architecture for DirectQuery for Power BI datasets (including the local model)

The local model does not contain a copy of the remote dataset but a reference to it. You, as a Power BI developer, are able to extend the referenced model with new data sources (like the Excel file I used in my previous example) and/or extend the model with new measures, columns and so on. For the new data model, relationships between the two data islands can be created.

Dataset chaining

This approach is not limited to one hop: a Power BI dataset referencing another Power BI dataset using DirectQuery can itself be used as a source for yet another Power BI dataset.

Let’s imagine the following scenario:

  • Dataset #1: Sales containing Sales information plus describing dimensions (Stock Item, Date, City, Customer)
Dataset #1 – Sales data
  • Dataset #2: Sales with Employees references the Sales dataset plus enhances the model with an Employee list (Excel file)
Dataset #2 – Sales with Employees (referenced remote model plus Excel source)
  • Dataset #3 – Sales with Employees and Purchases – DirectQuery to Dataset#2 (Sales with Employees) plus DirectQuery to another Power BI dataset Purchases
Dataset #3 – DirectQuery to Dataset#2 plus DirectQuery to Purchases dataset

The overall picture of dataset chaining looks like the following:

Dataset chaining with DirectQuery references

Maximum chaining in Preview is three hops

In the current preview (December 2020), the chaining length is limited to three (blog post). If you try to build a longer chain, it results in errors. Not in Power BI Desktop – I could create a Dataset #4 file without a problem, but after publishing to the Power BI service, the report / dataset is not usable. As of today, there is no error message available. Maybe the Power BI team adds a meaningful one in the future!?

Dataset #4 – Maximum chain length in preview reached.

Follow the chain – Data Lineage

Following my passion for data engineering and ETL processes, the data lineage feature in the Power BI service and Azure Purview are important information sources for my day-to-day work.

Let’s have a look at the Data Lineage support for DirectQuery datasets. And the result is, well – not the best-looking one. To be honest, it’s also mentioned on the documentation page that there is still work to do for better Data Lineage integration.

source for featured image: https://www.pexels.com/photo/sea-nature-sunset-water-119562/

Posted in Business Intelligence, InformationSharing, PowerBI

A New Era for Self-Service BI: Direct Query for Power BI DataSets

Do you remember the times, when Business Intelligence meant to only have one single point of truth within your organization? And this single source of truth was your enterprise semantic model?

How often did you want to extend this central model with your own data? And how often did you extend the central data by downloading it into an Excel file and merging it with your new data?

Have you counted the number of times you copied and extended a Power BI data model to support your new needs and include your extended data?

These times are now gone – because the Power BI team introduced the concept of DirectQuery for Power BI datasets! yeah! yeah!?!?!?

But – I hear some of you sayin’ – this isn’t new – we could already live-connect or DirectQuery or whatever you call it – to a Power BI dataset (published to the Power BI service)! So this is not really new…

Live Connect

This is not really true, because in the past you could only create a live connection to an already published Power BI dataset. And with the live connection, you could only consume the existing dataset. No additional data sources could be added because in this scenario, Power BI (Desktop) served as a visualization tool only. The semantic model remained in the Power BI service.

Even the tooltip for the Power BI datasets connection told us that we are going to create a live connection.

Create a live connection to a Power BI Dataset (from Power BI Desktop)

After the connection was made, Power BI Desktop (prior to the December 2020 release) looked like the following:

Power BI Desktop with a live connection to a Power BI dataset

In the status bar (1), you’ll find the information about the connection type (live connection), the name of the dataset (Sales) and the workspace name (Blog Demos). The tables from the data model are displayed in the Fields list (2). Because of the live connection, all options to add other data sources are unavailable (3). In this case, Power BI Desktop can only be used as a visualization tool – building reports based on the dataset.

There are still some options to enhance the remote model – you can add measures, but that’s it: no calculated columns, no hierarchies, no changes to the model itself.

The options available when live-connected to a remote Power BI dataset.

With the December 2020 release of Power BI Desktop, this approach changed. You are now able to change a live connection to a Power BI dataset (or an Azure Analysis Services connection) to DirectQuery mode, which allows us to enhance the remote model with new columns, tables and additional data sources, and to create relationships between the data sources.

Let’s dive deeper into this and look at the story together with a sample.

Let’s try it – Basic configuration

The DirectQuery support (for Power BI datasets) was introduced in December 2020 as a preview feature, which means you need to activate the option in Power BI Desktop for it to be available (File -> Options -> Preview features).

Power BI Desktop – Activate Preview Feature for DirectQuery

And now – Let’s create a composite model v2

After we’ve activated the option (and restarted Power BI Desktop), we are able to create a new report based on a published Power BI dataset. Again, select the dataset you want to work with, and after the connection is established, Power BI Desktop (December 2020 and newer) looks like the following:

Power BI Desktop – December 2020 with the option to switch to DirectQuery

Please have a look at the menu bar – the icons to add more data sources are not greyed out plus there is a hint in the status bar: Make changes to this model.

And this hyperlink is the starting point into the future of Self-Service Business Intelligence! Power BI Desktop informs you that a local model is required for DirectQuery to work.

Power BI Desktop warning before the DirectQuery switch

If you ask yourself what a local model is for and how it fits into the overall structure, have a look at the description of the Power BI roadmap item (read the entry here).

What I really like in that picture is the graphical representation of the data model extension. We have one (remote) Enterprise Semantic Model that gets extended by two local models. And these two models allow us to combine data from the two / three sources.

After you confirm the dialog, Power BI Desktop changes the internal data model structure and switches the storage mode to DirectQuery (see the changed status bar).

In my example, my remote Power BI dataset is based on the Wide World Importers database and contains Sales information. In the Model View of Power BI Desktop, you can have a look at the structure of the remote data model.

The remote model

With DirectQuery mode enabled, we have all the options to extend, change and modify our data model. What we can do, for example, is add a column to calculate the line total (= Quantity * Unit Price).

Context menu options for DirectQuery models
Add a new column in a DirectQuery model

If you want to have a look at the calculation results, you’ll notice that there is no Data view available in Power BI Desktop for this model. The data itself resides in the remote model, and the local model is just here to add some more semantics and structure. If you want to test the calculation, add a matrix to the report page.

Testing the new column.. it seems to work

Let’s add some (Excel) Data

One piece of information is missing in our Sales data model – the list of Sales Employees is not included in the model. But – we have a list of our employees in an Excel file.

Power BI Desktop – Add an Excel data source

When you’ve done the required transformation steps, Power BI Desktop provides you with a warning that data from one source can potentially flow into another data source (system). But why?

This is because Power BI needs some values from one part of your local data model (i.e. the salesperson name) to use as a filter in a query against the Sales dataset.

Warning that data from one source can flow into another one

The local data model looks like the following screenshot after we’ve added the employees list. What is missing is the connection between our two data islands. What we need to do is build a bridge (= relationship) between the Sales and the Employees table.

The local model (still disconnected data islands)

Easy as usual, just drag and drop the column (Salesperson Key) from one table (Sales) onto the target column (Employee Key in the Employees table). Et voilà – the relationship between the remote model and our local (imported) Excel data is established.

Local model – now it’s connected

Architecture

What happened behind the scenes? When we switched the model from live-connect to DirectQuery, Power BI Desktop added a local model to the created pbix-file.

As depicted in the next diagram, the local model only contains a reference to the remote model (Sales) – no data is copied; it’s “just” a reference to the remote model.

The local model (Sales with Employees) is our composite model bringing together the Sales information with the Excel employees list.

Power BI composite models architecture

Test the model

To test the new model, I’ve added a new matrix to display

  • the LineTotal (=calculated column in the local model)
  • for every employee (coming from the Excel file to the local model)
  • for every calendar year (an attribute from the remote data model).
Power BI Desktop – test the composite model in a matrix visual

For us as Power BI users, this simple connection / relationship between the two data islands looks like a small feature, but I know from the Power BI team that this was not a small but a huge engineering effort. And the Power BI team DID IT!

That’s it for today! I hope I could give you a small overview of the options that came into Power BI with the DirectQuery / composite model v2 implementation.

Be aware – it’s still in preview and there are things that are not yet available in this preview version. Have a look at the blog post and the documentation for further information:

Source for the featured image: https://www.pexels.com/photo/selective-focus-photography-cement-2219024/

Posted in Business Applications, Cloud, InformationSharing, PowerBI

Data News – 2020-12-21

Welcome to another episode of my #DataNews series. It’s almost Christmas and the data world does not really stand still – let’s have a look at what happened in the past data week:

The Future of PASS

PASS – the Professional Association for SQL Server – well, is ceasing all operations due to the financial impact of COVID-19.

After several years of fewer and fewer attendees at the global PASS Summit, this year was a completely different story as the PASS Summit went virtual. The attendance numbers were not that great, and some financial obligations for the upcoming years (conference centers, hotels, …) left the Board of Directors no other choice.

Read the full statement here on the PASS website: The Future of PASS

I still look back at a lot of PASS events around the globe – my first PASS Summit, 2012 in Seattle, started my involvement in the PASS community. In 2015, I came back to the PASS Summit – that year as a speaker. In the following years, I had the chance to speak at numerous events around the globe, meeting great community people, learning new stuff and, last but not least, changing my whole life and career since then. I think PASS, and especially the community that formed around PASS across the globe, made me a different person.

SQLSaturday Vienna 2021 – or should we call it DataEvent 2021 Vienna?

Together with PASS HQ ceasing operations, all other PASS assets and events will stop working as we were used to. The PASS Board announced that all services should keep working until January 15, 2021. This is great – we can download the recordings of PASS Summit 2020 until that date.

BUT: We – PASS Austria (now Data Community Austria) – are organizing our annual SQLSaturday Vienna on that date. The trademark SQLSaturday belongs to PASS too, and we are “franchising” it for the event.

  • It was planned to use GoToWebinar licences provided by PASS HQ – which I honestly think will not be the case anymore.
  • The event is completely based on sqlsaturday.com – session submissions, schedule building, attendee registrations, email communication.

With these changes on short notice, we are faced with changing everything around our data event. BUT: SQLSaturday Vienna 2021 is still on track!

We are working on PLAN B for our event – we’ll inform you on all our channels (webpage, email, twitter, facebook). I tried to update the SQLSaturday site but it seems it does not like me anymore.

If you are interested in the online event – please still register on the SQLSaturday Vienna 2021 page – we’ll inform all registered email accounts when we’re done with the new platform!

PASS Austria – The Future

We will continue our operations and engagement with the community. PASS Austria is a local, financially and legally independent user group!

We are also working on a PLAN B for PASS Austria, as our member database is hosted on PASS systems and we can only send newsletters to our members but cannot transfer member information out of the current platform due to data privacy (GDPR).

Read more about our announcement: Future of PASS Austria | (wordpress.com)

I really wanted to have some more positive data news at the end of the year, but the end of PASS Global is a huge one and it will impact a lot of people, user groups and communities around the world!

Have a nice and successful week,

Wolfgang

PS: I think I will take a break from #WolfisDataNews over the Christmas season! Expect the next news episode in January 2021!

Posted in Azure Synapse Analytics, Cloud, Conference, DataNews, InformationSharing, PowerBI, Presentations, SQLPass