Modern Data Platform Map and Video

Last update: Dec 4, 2018

Modern Data Platform Map represents reference organizational layout of most important data pillars and services and corresponding groups of specialists in enterprises.

In the following video I make a quick overview of Microsoft Data Platform. I will provide more details in subsequent posts and videos. Please post your questions, suggestions and feedback below.

You may also check for details following data pillars and products:

Business Intelligence Solutions Decision Tree

In this article we will cover most important Business Intelligence components based on Microsoft Data Platform. One week ago there were announcements on Power BI Premium and Power BI Report Server which will require some clarification, so I decided to create another decision tree describing available Microsoft analytical modeling and visualization tools, and covering Power BI related components in more detailed way.

For the purposes of this article we will define Business Intelligence in a narrow way, as top and middle layers of BI stack, so it will include Analytical Modeling, Data Visualization, and Collaboration. We will also cover Sites and Apps integration as important part of BI functionality.

  1. Analytical Modeling solutions allow to load data from different data sources, combine data in one model and create calculations.
  2. Data Visualization and Collaboration solutions allow users to create, change, manage and share reports and dashboards built on top of analytical models or data sources.
  3. Sites and Apps Integration solutions allow to create applications of top of data sources, embed analytical resorts into applications and web sites, and create data driven workflows.

Here is the decision tree, which maps these areas to specific solutions. Below I will provide some comments on each of them.

Analytical Modeling

  • Azure Analysis Services is Azure PaaS offering built on the proven analytics engine in Microsoft SQL Server Analysis Services. Azure Analysis Services provides enterprise-grade tabular data modeling in the cloud.
  • SQL Server Analysis Services (SSAS) is a part of SQL Server which contains engines for multidimensional (OLAP) and tabular analytical models, and for data mining.
  • SQLBI DAX Studio is a tool to write, execute, and analyze DAX queries in Power BI Designer, Power Pivot for Excel, and Analysis Services Tabular. It includes an Object Browser, query editing and execution, formula and measure editing, syntax highlighting and formatting, integrated tracing and query execution breakdowns.
  • Microsoft Excel is a spreadsheet application with cell-based calculations. It includes Pivot Tables, Pivot Charts and Power View for data visualization; Power Query for data transformation; Power Pivot to create in-memory tabular models and calculations. Excel is a component of Microsoft Office applications package, and is also available in Office 365 subscriptions.
  • Power BI Desktop is a visual data exploration tool for data analysis and reports creation. It allows to load multiple data sources, establish data structure, transform, create analytical tabular model, visualize and explore data in interactive way, and also publish to Power BI Service.

Visualization and collaboration

  • Power BI is a set of tools for self-service and traditional business intelligence, which uses tabular analytical models, allows to build interactive reports and dashboards, and features mobile reports, collaboration and application embedding.
  • Power BI Mobile is a set of free Windows, iOS, and Android applications allowing to view and explore personalized dashboards and reports created in Power BI Service. Also it allows users to be up-to-date with data-driven alerts.
  • Power BI Service (or powerbi.com), is a SaaS part of Power BI offering allowing to create interactive reports, build dashboards, create reports & datasets, update data with real-time, automatic and scheduled refreshes, share dashboards easily with other people in your organization, ask questions of data with Natural Language Query, stay connected to data all the time with mobile applications.
    • Power BI Free is a free version of Power BI Service intended for report authoring (personal use). Currently this service is in transition to have the same functionality as Power BI Pro, but with limited sharing and collaboration features. (This will be effective June 1st)
    • Power BI Pro is a professional version of Power BI Services intended for report authoring, sharing and collaboration. Power BI Pro is payed per user, per month.
    • Power BI Premium is dedicated capacity for large-scale BI deployments, with enhanced performance and larger data volumes, without requiring to purchase per-user licenses. Power BI Premium builds on the existing Power BI portfolio with a capacity-based licensing model that increases flexibility for how users access, share and distribute content. Power BI Premium is payed per node, per month.
  • Power BI Report Server is an on-premises server that allows the deployment and distribution of interactive Power BI reports – and traditional paginated reports – completely within the boundaries of the organization’s firewall. Power BI Report Server is available as part of Power BI Premium or with SQL EE SA.
  • SQL Server Reporting Services is a solution for creating, publishing, managing reports, and delivering reports to users in web browser, on mobile device, or as an email. Types of supported reports: “traditional” paginated reports, mobile reports (AKA DataZen), and Power BI reports (through Power BI Server of Power BI Service).

Sites and Apps development

  • Power BI Embedded is PaaS offering in Azure, which provides interactive data visualizations in customer-facing apps without the time and expense of having to build it from the ground up. In future it will be converged with the Power BI Service to deliver one API surface, a consistent set of capabilities and access to the latest features.
  • Microsoft Flow is a component of Office 365 which represents user friendly and intuitive way of creating automated workflows between applications and services to generate notifications, synchronize files, collect data, and produce other actions.
  • Microsoft PowerApps is a component of Office 365 with user friendly and intuitive interface allowing to build applications without writing code, connect to data sources and create new data, publish and use created apps on web and mobile devices. Power Apps allows business experts in the organization to create the apps they need to support their business requirements with drag and drop simplicity.
  • SharePoint in Office 365 allows you to integrate Power BI interactive reports into SharePoint web pages.
  • SharePoint Server on-premises solution also includes BI-related functionality of SharePoint Server. It includes integration with SQL Server Reporting Services, and also creating PerformancePoint dashboards.

Reference materials:

Webcast: Enabling student success with cloud computing

In this webcast you will learn how to:

  • Access data analytics tools to enable real-time and predictive analytics
  • Improve student success through measurable results
  • Make the future become less about student grades and more about measuring and customizing education to the needs of the individual student

We will also cover following examples and case studies:

  • Cleveland Metropolitan Case Study
  • Predicting student dropout risks, increasing graduation rates with cloud analytics in Tacoma Public Schools
  • Predicting Student Success using Azure Machine Learning in Northeast Wisconsin Technical College (Proof of Concept)
  • Restart Academy of Missouri (Envisioning Demo by Neal Analytics)
  • Education Data Management showcase (Power BI model by Dell)

To access the webcast, you will need to fill small registration form.

Technologies: Azure Machine Learning and Power BI.

Reference materials:

Cortana Intelligence Suite: Big Data and Advanced Analytics

In this post we will discuss reference architecture for Big Data and Advanced Analytics using Cortana Intelligence Suite. The architecture can be relevant for organizations looking to fully manage big data and advanced analytics to transform all enterprise information into intelligent action. This will allow to take action ahead of your competitors by going beyond looking in the rearview mirror to predicting what’s next.

In general, in such solutions you use relational and semi-structured data from business and custom applications, and also semi-structured or unstructured data from sensors, devices, web sites, social networks and other sources.

Big Data flow

Big Data flow includes following steps:

  • Ingestions of data, which can be based on bulk mode or event-based/real-time.
  • Processing data to prepare for storage.
  • Storing data in relational or unstructured storage.
  • Processing data for analytics like data aggregation, complex calculations, predictive or statistical modeling etc.
  • Visualizing data and data discovery using BI tools or custom applications.

big-data-flow

Big Data Reference Architecture

Big Data Reference architecture represents most important components and data flows, allowing to do following.

  • Track Azure data (Azure Website generating web logs) and store in ADLS
  • Track real-time data from IOT Suite: collect data from IOT Suite in permanent store (ADLS)
  • Run Machine Learning through R Server for HDInsight to find patterns in data
  • Show results in BI tools (Power BI)

big-data-ra

There are lot of different options to store data, process data and for machine learning. You may use Big Data and Machine Learning decision trees as a first help to choose most relevant components for your solution. (I will also write about information management components like Azure Data Factory, Azure Data Catalog, Sqoop, Pig, Oozie etc. in one of next posts).

Example of Big Data Solution

To show you simple example of Big Data architecture we will use following artificial scenario.

  • AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers.
  • They are looking to pilot a web-app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter in the flight information and produce a prediction as to if the departing flight will encounter a 15 minute or longer delay, taking into account the weather forecasted for the departure hour.
  • Data platform team prefers to use open source technologies for data processing tasks.
  • Developers will need an easy way to create prediction experiments.

Here is example of architecture allowing to solve the scenario described above. Selected components of Cortana Intelligence Suite are highlighted.

cis-example

Demonstration of described solution is available in MTC Studio webcast: 2016-12-08 | Cortana Intelligence Suite: Big Data and Advanced Analytics.

Additional materials

Cortana Intelligence Suite End-to-End Training

I am very excited to share information about excellent end-to-end hands-on labs training on Cortana Intelligence Suite. This training covers Azure Machine Learning, Azure Data Factory, HDInsight Spark, Power BI, and Intelligent Apps.

cis-ete

The course was developed by MTC Architect Todd Kitta. All training materials are available in his GitHub repository. If you need to provide this training to your team of data platform specialists, please contact Microsoft representative to initiate the training, or write your comment here.

Alternatively, you may register for Cortana Intelligence Suite End to End live event. (December 6, 2016, 9am – 4pm PST)

Course Outline

  • Building a Machine Learning Model and Operationalizing. (This part takes 90 minutes, so if you are not data scientist, feel free to deploy the experiment from the template).
  • Setting Up Azure Data Factory
  • Developing a Data Factory Pipeline for Data Movement
  • Operationalizing Machine Learning Scoring with Azure Machine Learning and Data Factory
  • Summarizing Data Using HDInsight Spark
  • Visualizing Spark Data in Power BI
  • Deploying an Intelligent Web App
  • Wrap-up and Cleanup of Azure Resources

Requirements

  • Microsoft Azure Subscription should be pay-as-you-go, MSDN, or Enterprise Agreement. If you are using your company’s Azure subscription and your company requires that you be connected to your corporate network (through a VPN or otherwise), we recommend that you use a Trial or MSDN subscription for this workshop. This is due to the fact that you will be connecting to your subscription inside of a VM that is not connected to your corporate network.
  • Setup is required before performing the steps in these exercises. Please see the setup instructions before going any further.
  • Please keep in mind that HDInsight cluster and VM you provision as setup for this workshop will incur charges, so provision these resources closest to the workshop date as possible. Preferably the afternoon/night before the workshop.

August 2016 Dashboard in a Day Workshop

In this post you can find materials and useful information on Power BI Dashboard in a Day training which will be provided August 23 in Microsoft Technology Center New York.

Please download archive with instructions, source files and Power BI reports.

Prerequisites and setup steps

  • Internet connectivity: You must be connected to the internet
  • At minimum, a computer with 2-cores and 4GB RAM running one of the following version of Windows: Windows 7,  Windows 8, (64-bit preferred), Windows 8.1 or Windows 10 or Windows Server 2008 R2 or Windows Server 2012/R2
  • Microsoft Power BI Desktop requires Internet Explorer 9 or greater
  • Verify if you have 32bit or 64bit operating system to decide if you need to install the 32bit or 64bit applications.
    • Search for computer on your PC, right click properties for your compute
    • You will be able to identify if your operating system is 64 or 32 bit based on “system type” as shown below
  • Download and install Power BI Desktop: Download and install Microsoft Power BI Desktop from http://www.microsoft.com/en-us/download/details.aspx?id=45331.
  • Signup for Power BI: Go to http://aka.ms/pbidiadtraining and sign up for Power BI with a business email address.
  • If you have an existing account, please go to http://app.powerbi.com and Sign in using your Power BI account.

Agenda

Morning

  • 08:30 AM – 09:00 AM – Introduction to Power BI
  • 09:00 AM – 11:00 AM – Power BI Desktop – Content
  • 11:15 AM – 11:30 AM – Power BI Service and overview
  • 11:30 AM – 12:30 PM – Power BI service I – Content
  • 12:30 PM – 01:00 PM – Lunch

Afternoon

  • 01:00 PM – 01:15 PM – Power BI Service II and overview
  • 01:15 PM – 02:15 PM – Power BI Service II – Content
  • 02:15 PM – 04:30 PM – Bring your own data and build dashboards (optional)

Vehicle Health & Driving Pattern Analysis using Cortana Analytics with Power BI

Last changes: January 13, 2016

In the following scenario of advanced analytics we will show how car dealers, insurances and automobile manufacturers can use Cortana Analytics including Power BI to gain real-time and predictive insights on vehicle health and driving pattern behavior.

The solution can be applied to following business use cases:

  • Usage-based insurance
  • Vehicle diagnostic
  • Engine emission control
  • Engine performance remapping
  • Eco-driving
  • Roadside assistance calls
  • Fleet management

auto-scenarios-by-microsoft

Starting December 1, 2015 the solution called Vehicle Telemetry Analytics template is available at Cortana Analytics Gallery. Here is quick promotional video:

In the following video and text below you will see some details on solution architecture which includes following technologies: Event Hub, Azure Stream Analytics, Azure Machine Learning, Azure Data Factory, HDInsight, Azure Storage, Azure SQL DW, and Power BI.

Let’s look on data flow and solution components.

cc-arch

The Event Hub is used to ingest huge amount of events from the vehicles into Azure for real-time and batch analytics.

The Stream Analytics job is performing real-time data ingestion into the long term storage for batch analytics and data preparation for real-time predictive insights.

Below you can see description of three queries processed in the Stream Analytics for following purposes. (All three queries are enriched with detailed data on each vehicle from Blob Storage).

Query #1 performs join with reference data from Azure Blob Storage and accumulates the resultant data into a different container in the Blob Storage for rich batch analytics.

Query #2 publishes the data as-is to the output Event Hub so that it can be consumed by the RealtimeDashboard app that invokes machine learning request/response end-point for real-time anomaly detection and pushes the results to the PowerBI live dashboard.

Query #3 performs aggregations on the data within a 3 sec tumbling window and publishes it to an Azure SQL instance that got provisioned as part of the deployment.

Data Factory is used for

  • Orchestration, monitoring and management of the batch analytics pipeline
  • Transformation of the data in an on-demand HDInisght cluster for rich insights on Driving Behavior Pattern and Vehicle Health Trending
  • Data movement across the various data stores

cc-datafactory

All data in source datasets are processed using Hive queries where we describe data structures based on CSV files. Additionally we define new tables and calculate aggregations using INSERT request.

cc-hive

In this solution, we are targeting the following batch insights:

  • Aggressive driving behavior (Identifies the trend of the models, locations, driving conditions, and time of the year to gain insights on aggressive driving pattern allowing Contoso Motors to use it for marketing campaigns, driving new personalized features and usage based insurance.)
  • Fuel efficient driving behavior (Identifies the trend of the models, locations, driving conditions, and time of the year to gain insights on fuel efficient driving pattern allowing Contoso Motors to use it for marketing campaigns, driving new features and proactive reporting to the drivers for cost effective and environment friendly driving habits.)
  • Recall models (Identifies models requiring recalls by anomaly detection trend and correlation with driving habits)

An anomaly detection Azure Machine Learning model is used in this demo to detect safety issues for vehicle recall and identifying vehicles requiring maintenance. This model is published in an existing subscription and the web service endpoint is leveraged both in request/response and batch mode for operationalization in the real-time and the batch processing.

CC-demo-AML

Aggregated data from Blob Storage is moved to Azure Data Warehouse for historical storage.

Power BI dashboards contain historical data from Azure DW and real-time data from the Azure Stream Analytics and the Event Hub.

CC-demo-powerbi

Special thanks to authors of the demo scenario: Anand Subbaraj, Sanjay Soni, Christoph Schuler, Santosh Waghmare, Shashank Khedikar, and Sam Istephan.