Azure Synapse Analytics end-to-end: From your database and sensors to a Power BI dashboard

In today’s blog post I would like to build an end-to-end solution that combines data coming from different sources and stored in different formats into a single Power BI data model using Azure Synapse Analytics.

The Basics to start with

For those of you who do not know Azure Synapse Analytics, a short explanation – taken from the official product documentation page: Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale.

So what does this mean? Well, in theory it should not really matter where your data comes from, in which format it arrives, or how you want to transform and query it – Azure Synapse Analytics should handle all of this for you. 🙂 In theory… In practice it still takes a bit of work on your side, so let’s have a look at how we can deal with it.

Over time, two major streams of data work have emerged in enterprises: 1) traditional data warehousing, dealing with well-structured data and reporting requirements, and 2) big data analytics, dealing with (huge) amounts of less structured data.

In use case 1), business users from the enterprise work with the data warehouse, relying on proven security, dependable performance, and data stored in a relational structure. The data in use case 2) is often used by data scientists for exploration, big data analysis, and machine learning on semi-structured data in a data lake.

In the past, different (Azure) services were needed to deal with these different kinds of analytic requirements, storage technologies, and performance characteristics.

With the announcement of Azure Synapse Analytics, the separate services are now brought together under a single umbrella – a so-called Azure Synapse workspace.

source: Analytics in a Day slides

Within an Azure Synapse Analytics Workspace,…

  • .. you can integrate data using Synapse Pipelines (if you are already familiar with Azure Data Factory, this will feel very familiar)
  • .. the central data storage is based on an Azure Data Lake Storage Gen2 account (needed for metadata storage)
  • .. you can use different kinds of analytic runtimes to work with your data
    • SQL-based runtimes, using either a serverless approach or a provisioned flavor (named dedicated SQL pools and based on the Azure SQL Data Warehouse technology)
    • Apache Spark runtimes, offering a distributed architecture, notebooks, and the language of your choice
  • .. security is integrated from the ground up using Azure Active Directory
  • .. Synapse Studio is the single-stop development user interface to create artifacts and work with all objects in the workspace
  • .. you can configure multiple linked services and linked data storages to extend the data landscape
  • .. you can configure connected services (like Power BI or Azure Machine Learning) which – in the case of Power BI – allow the direct creation of Power BI reports within Synapse Studio

And now: Action for an end-to-end “Data through Azure Synapse to Power BI” demo

In my example scenario I would like to analyze sensor data originating from different machines in different locations of our company. The sensor values are gathered directly at the machines and sent through Azure Stream Analytics into a directory in our data lake (this process is not covered in this example).

I am not going into detail on how to create a new Synapse workspace, how to create new dedicated SQL pools, or how to manage the connections to linked services, etc.

Data Exploration

To get a first feeling for our data, we can use the Data hub in Azure Synapse Studio to start our data lake exploration. The output of our data collection process stores the data in CSV files – one for every machine. An example input directory looks like this:

Sensor data in our data lake

Synapse Studio allows us to preview the content of CSV files – use the context menu and the Preview action. For a first look, Synapse Studio opens a dialog with a preview of the file content:

Data file preview in Synapse Studio

But we want to work with the data and query the file content. To achieve this, we can use the context menu again and select New SQL Script -> SELECT TOP 100 ROWS. This action opens a new SQL script window containing a script to query the CSV file from our data lake storage.

The magic in this script is based around the keyword OPENROWSET – this function reads the CSV content and returns the data in a relational structure. In the first screenshot the relational structure is not really visible yet – only a single column C1 is returned by the statement. Some additional parameters and tweaks are needed to get a more usable structure.

Also, this script is executed using the serverless SQL pool (named Built-in). The source for this query is a single file – SensorValues_M_01.csv in the data lake.
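
For reference, the generated script looks roughly like this (a sketch – the exact auto-generated text may differ slightly):

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M_01.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [result]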

An enhanced version of the SQL script deals with some of the points mentioned above:

SELECT
    *
FROM
    OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';'
    )
    WITH (
        MeasureDT       DATETIME2(0),
        MachineName     VARCHAR(20),
        SensorKey       VARCHAR(20),
        SensorValue     DECIMAL(18,2)
    )
    AS [result]

In this version of the SQL script I’ve changed the name of the source file to use a wildcard – i.e. to read all the files in the /staging/sensors/ directory following the specified file name pattern.

Second, the option FIELDTERMINATOR = ';' was added to OPENROWSET – with it, the statement returns the four columns from the CSV file.

Third, by using the WITH clause I declared a relational structure for the statement output – defining column names and datatypes.

Data Import – Sensor Measurements

Serverless SQL pools in Azure Synapse do not store data themselves; they are metadata containers for external table and view definitions.
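
For example, the exploration query from above can be wrapped into a view in a serverless SQL database, so the data lake files can be queried like a table. A minimal sketch (the database and view name are my own choices):

-- run in a user database on the serverless pool (not in master)
CREATE VIEW dbo.SensorValuesLake
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ';'
    )
    WITH (
        MeasureDT       DATETIME2(0),
        MachineName     VARCHAR(20),
        SensorKey       VARCHAR(20),
        SensorValue     DECIMAL(18,2)
    ) AS [result];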

For the next step of our example – storing the sensor values in a relational store – we need to switch to a dedicated SQL pool. For the import we have at least two choices. The first: use Synapse Pipelines to load the data from the data lake.

The second: use the bulk load feature that Synapse Studio brings out of the box. Back in the Data hub, I select one of the data files, and in the context menu the action New SQL script -> Bulk load opens the door to the data integration step. After the wizard-like steps are finished, a COPY SQL script is generated.

Select the Bulk load option to generate a COPY statement

First, select the input files / folder – i.e. the content you want to import. I was not able to select an input folder – Synapse did not like this selection and gave me errors – so I decided to go for a single file (the _M_02.csv file).

Select the source files/folder

Next, configure the input settings – in my case the field terminator ';'.

configure the input file format

The sink of our COPY statement is a dedicated SQL pool and a new destination table – staging.SensorValues.

and specify the target information (database and table)
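
If you let the wizard create the destination table, it generates a table along these lines – a sketch with assumed data types (the column names match the views created later):

CREATE TABLE [staging].[SensorValues]
(
    MeasureDT    DATETIME2(0)  NOT NULL,
    MachineID    VARCHAR(20)   NOT NULL,
    SensorID     VARCHAR(20)   NOT NULL,
    SensorValue  DECIMAL(18,2) NULL
)
WITH
(
    -- round-robin distribution is a reasonable default for a staging table
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);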

Just to be sure, have a look at the column mapping.

and have a look at the column mapping

I’ve changed the generated SQL script to provide more meaningful column names and to read not only one file but the content of the whole folder (using a wildcard).

the SQL script to COPY data into the dedicated SQL pool
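
My edited statement looks roughly like this sketch (the storage URL is a placeholder; authentication options are omitted, assuming the signed-in user’s Azure AD identity has access to the storage account):

COPY INTO [staging].[SensorValues] (MeasureDT, MachineID, SensorID, SensorValue)
FROM 'https://<ourdatalake>.dfs.core.windows.net/data/staging/sensors/SensorValues_M*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '0x0A'
);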

Our current dataset looks as follows:

initial sensor data in our SQL Pool
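
A quick sanity check after the load – for example, counting the measurements per machine:

SELECT MachineID, COUNT(*) AS MeasurementCount
FROM [staging].[SensorValues]
GROUP BY MachineID
ORDER BY MachineID;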

Data Import – Masterdata

In addition to our sensor measurements, I want to include some masterdata in our data warehouse database. The masterdata list for our machines and the detailed information for our sensor-measurement keys are stored in an Azure SQL database.

Masterdata in an Azure SQL database

For the data import, a Synapse pipeline using two copy data activities was created. For each of the sources, an integration dataset was created pointing to the corresponding database table. The sinks of both copy activities point to the dedicated SQL pool.
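
The target tables in the dedicated SQL pool could look like this sketch (data types are my assumption; small masterdata tables are good candidates for the REPLICATE distribution):

CREATE TABLE [masterdata].[Machine]
(
    MachineID    VARCHAR(20) NOT NULL,
    MachineName  VARCHAR(50) NOT NULL,
    [Location]   VARCHAR(50) NULL
)
WITH (DISTRIBUTION = REPLICATE, HEAP);

CREATE TABLE [masterdata].[Sensors]
(
    MachineID   VARCHAR(20) NOT NULL,
    SensorID    VARCHAR(20) NOT NULL,
    SensorName  VARCHAR(50) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE, HEAP);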

To test the data import, use the Add trigger -> Trigger Now option in the menu.

Trigger a pipeline run

And use the Monitor hub to check the pipeline execution run.

Monitoring of the pipeline run

Data Modelling (in the SQL Pool)

Up next – combine the data islands (sensor measurements and masterdata) into a logical layer to serve as input for the Power BI data model.

Using the Develop hub in Synapse Studio, I’ve created views to build the logical layer.

CREATE VIEW [reporting].[SensorValues]
AS SELECT
    [MeasureDT],
    [MachineID],
    [SensorID],
    [SensorValue],
    MachineID + '_' + SensorID AS MachineID_SensorID
FROM [staging].[SensorValues];
GO

CREATE VIEW [reporting].[Sensors]
AS SELECT
    [MachineID],
    [SensorID],
    [SensorName],
    MachineID + '_' + SensorID AS MachineID_SensorID
FROM [masterdata].[Sensors];
GO

CREATE VIEW [reporting].[Machines]
AS SELECT
    [MachineID],
    [MachineName],
    [Location]
FROM [masterdata].[Machine];
GO

And the result looks like:
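
To sanity-check the logical layer, a quick (hypothetical) query joining the three views:

SELECT
    m.[Location],
    m.[MachineName],
    s.[SensorName],
    AVG(v.[SensorValue]) AS AvgSensorValue
FROM [reporting].[SensorValues] AS v
JOIN [reporting].[Sensors]      AS s ON v.[MachineID_SensorID] = s.[MachineID_SensorID]
JOIN [reporting].[Machines]     AS m ON s.[MachineID] = m.[MachineID]
GROUP BY m.[Location], m.[MachineName], s.[SensorName];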

Create a Power BI Data Model

As the next step, a Power BI data model is needed for our visualizations. As I mentioned in the beginning, connected services extend the boundaries of a Synapse workspace – external Azure / Office 365 services can be integrated into Synapse Studio.

In order to do so, switch to the Manage hub and add a new linked service (type Power BI). Select an existing Power BI workspace, name the linked service, and that’s it!

New Linked Service of type Power BI

As of today, only one Power BI workspace can be linked to a Synapse workspace. I hope this limitation will be lifted in the future.

Only one linked Power BI workspace is allowed as of today

At the Develop hub, the Power BI workspace appears in the Power BI section: existing datasets and reports are listed here. It does not matter whether these items were created in Power BI directly or in Synapse Studio.

Power BI workspace integration in Synapse Studio

Open the datasets list and add a new one.

Download the pbids file to open it in Power BI Desktop

Open the downloaded PBIDS (Power BI data source) file in Power BI Desktop. The connection to the dedicated SQL pool is established automatically. Select the three views we created in one of the previous steps and load them into the Power BI model.

Select the views and load them into the Data Model

For this demo, we will use Import mode. If you want to see a demo using DirectQuery against a data lake directly, please have a look at my blog post here.

Use Import mode

After updating the relationships between the tables, the data model looks like this:

The Data Model

I’ve created one sample report page:

Demo Power BI report

Save the PBIX file and publish it to the linked Power BI workspace.

Publish process

After the publish is finished, switch back to Synapse Studio and use the Close and refresh button.

There are two options now – use the published dataset to create a new report or open the report directly from Synapse Studio:

Power BI integration in Synapse Studio

The Power BI editing experience is fully integrated into Synapse Studio – change the report as you like. Be aware that all the changes you perform on either side (the Power BI service or the Power BI integration in Synapse Studio) will appear on the other side too 😉

With that, we are at the end of our end-to-end Synapse example. To recap,…

  • we’ve explored data files in the data lake directly in Synapse Studio using the preview option and the serverless SQL OPENROWSET function
  • we’ve used the bulk load wizard to generate a COPY statement to load sensor data into a dedicated SQL pool
  • we’ve used Synapse pipelines to load masterdata from an Azure SQL database into a Synapse dedicated SQL pool
  • we’ve created views in the dedicated SQL pool as a logical layer
  • we’ve created a Power BI data model based on the views
  • we’ve used the Synapse Power BI integration to open and edit the report
