Cortana Intelligence Suite: Big Data and Advanced Analytics

In this post we discuss a reference architecture for Big Data and Advanced Analytics using Cortana Intelligence Suite. The architecture is relevant for organizations that want to manage big data and advanced analytics end to end and turn enterprise information into intelligent action. This lets you act ahead of your competitors by going beyond looking in the rearview mirror to predicting what’s next.

In general, such solutions use relational and semi-structured data from business and custom applications, as well as semi-structured or unstructured data from sensors, devices, websites, social networks, and other sources.

Big Data flow

The Big Data flow includes the following steps:

  • Ingesting data, either in bulk mode or in event-based (real-time) mode.
  • Processing data to prepare it for storage.
  • Storing data in relational or unstructured storage.
  • Processing data for analytics, such as data aggregation, complex calculations, and predictive or statistical modeling.
  • Visualizing data and data discovery using BI tools or custom applications.
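
The five steps above can be sketched as a chain of small functions. This is only an illustration in plain Python, not a Cortana Intelligence Suite API: the function names, the `device,temperature` record format, and the in-memory list that stands in for storage are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def ingest(raw_lines):
    """Ingestion: yield records one at a time (a bulk list or a live stream both work)."""
    for line in raw_lines:
        yield line


def prepare(lines):
    """Pre-storage processing: parse 'device,temperature' rows and drop malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        device, value = parts
        try:
            yield {"device": device, "temperature": float(value)}
        except ValueError:
            continue


def store(records):
    """Storage: an in-memory list stands in for relational or unstructured storage."""
    return list(records)


def analyze(stored):
    """Analytics: aggregate the average temperature per device."""
    by_device = defaultdict(list)
    for rec in stored:
        by_device[rec["device"]].append(rec["temperature"])
    return {device: mean(values) for device, values in by_device.items()}


def visualize(summary):
    """Visualization: build a plain-text report in place of a BI tool."""
    return [f"{device}: {avg:.1f}" for device, avg in sorted(summary.items())]


# Hypothetical sample input, including two rows that should be rejected.
raw = ["sensor-a,20.0", "sensor-b,30.5", "bad row", "sensor-a,22.0", "sensor-b,oops"]
report = visualize(analyze(store(prepare(ingest(raw)))))
```

Because `ingest` is a generator, the same chain accepts either a finite batch or an iterator that keeps producing events, which mirrors the bulk versus real-time distinction in the first step.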

big-data-flow

Big Data Reference Architecture

The Big Data reference architecture shows the most important components and data flows. It lets you do the following:

  • Track Azure data (for example, an Azure website generating web logs) and store it in Azure Data Lake Store (ADLS)
  • Track real-time data from IoT Suite: collect the data in a permanent store (ADLS)
  • Run machine learning through R Server for HDInsight to find patterns in the data
  • Show the results in BI tools (Power BI)

big-data-ra

There are many options for storing data, processing data, and machine learning. You can use the Big Data and Machine Learning decision trees as a first aid in choosing the most relevant components for your solution. (I will also write about information management components such as Azure Data Factory, Azure Data Catalog, Sqoop, Pig, and Oozie in a future post.)

Example of a Big Data Solution

To show a simple example of a Big Data architecture, we will use the following fictional scenario.

  • AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers.
  • They want to pilot a web app that their internal customer service agents can use to give travelers additional useful information during the flight booking process. Agents should be able to enter the flight information and get a prediction of whether the departing flight will be delayed by 15 minutes or longer, taking into account the weather forecast for the departure hour.
  • The data platform team prefers to use open source technologies for data processing tasks.
  • Developers need an easy way to create prediction experiments.
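
The prediction target in this scenario is a binary label: did the flight depart 15 minutes or more late? A minimal sketch of how historical flights might be labeled and joined with the weather for the departure hour is shown below. The field names (`origin`, `departure_hour`, `departure_delay_minutes`, `wind_speed`) and the dictionary-based weather lookup are hypothetical choices for the example, not the schema used in the actual solution.

```python
from dataclasses import dataclass

# Threshold taken from the scenario: a delay of 15 minutes or longer.
DELAY_THRESHOLD_MINUTES = 15


@dataclass(frozen=True)
class Flight:
    origin: str                     # departure airport code (hypothetical field)
    departure_hour: int             # scheduled departure hour, 0-23 (hypothetical field)
    departure_delay_minutes: float  # observed delay for a historical flight


def is_delayed(flight: Flight) -> bool:
    """Label a flight as delayed when it left 15 or more minutes late."""
    return flight.departure_delay_minutes >= DELAY_THRESHOLD_MINUTES


def build_training_rows(flights, weather_by_airport_hour):
    """Join each flight with the weather for its airport and departure hour.

    Flights without a matching weather record are skipped.
    """
    rows = []
    for flight in flights:
        weather = weather_by_airport_hour.get((flight.origin, flight.departure_hour))
        if weather is None:
            continue
        rows.append({
            "origin": flight.origin,
            "departure_hour": flight.departure_hour,
            "wind_speed": weather["wind_speed"],
            "delayed": int(is_delayed(flight)),
        })
    return rows


# Hypothetical sample data: the JFK flight has no weather record and is dropped.
flights = [Flight("SEA", 9, 14.0), Flight("SEA", 10, 15.0), Flight("JFK", 9, 40.0)]
weather = {("SEA", 9): {"wind_speed": 5.0}, ("SEA", 10): {"wind_speed": 22.0}}
training_rows = build_training_rows(flights, weather)
```

Rows of this shape are the kind of input a classification experiment would train on. At booking time the same features would come from the forecast rather than from observed weather.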

Here is an example of an architecture that addresses the scenario described above. The selected components of Cortana Intelligence Suite are highlighted.

cis-example

A demonstration of the described solution is available in the MTC Studio webcast: 2016-12-08 | Cortana Intelligence Suite: Big Data and Advanced Analytics.

Additional materials

Cortana Intelligence Suite End-to-End Training

I am very excited to share information about an excellent end-to-end, hands-on lab training on Cortana Intelligence Suite. The training covers Azure Machine Learning, Azure Data Factory, HDInsight Spark, Power BI, and Intelligent Apps.

cis-ete

The course was developed by MTC Architect Todd Kitta. All training materials are available in his GitHub repository. If you need to provide this training to your team of data platform specialists, please contact your Microsoft representative to initiate the training, or leave a comment here.

Alternatively, you can register for the Cortana Intelligence Suite End to End live event (December 6, 2016, 9am – 4pm PST).

Course Outline

  • Building a Machine Learning Model and Operationalizing. (This part takes 90 minutes, so if you are not a data scientist, feel free to deploy the experiment from the template.)
  • Setting Up Azure Data Factory
  • Developing a Data Factory Pipeline for Data Movement
  • Operationalizing Machine Learning Scoring with Azure Machine Learning and Data Factory
  • Summarizing Data Using HDInsight Spark
  • Visualizing Spark Data in Power BI
  • Deploying an Intelligent Web App
  • Wrap-up and Cleanup of Azure Resources

Requirements

  • Your Microsoft Azure subscription should be pay-as-you-go, MSDN, or Enterprise Agreement. If you are using your company’s Azure subscription and your company requires that you be connected to your corporate network (through a VPN or otherwise), we recommend that you use a Trial or MSDN subscription for this workshop. This is because you will be connecting to your subscription from inside a VM that is not connected to your corporate network.
  • Setup is required before performing the steps in these exercises. Please see the setup instructions before going any further.
  • Please keep in mind that the HDInsight cluster and VM you provision during setup for this workshop will incur charges, so provision these resources as close to the workshop date as possible, preferably the afternoon or night before the workshop.