Azure Machine Learning Hands-on Labs

Last update: Oct 17, 2017

In this post I will provide information on the Azure Machine Learning (ML) Hands-on Labs training for developers, which we will be delivering in New York and other technology centers. After this training you will know how to create an Azure Machine Learning experiment, select the best ML model, convert the training experiment into a predictive experiment, and create an application that uses the model.

The training consists of the following labs:

  1. Predict Individual’s Income >50K (Estimated: 1 hour).
  2. Convert a training experiment into a predictive experiment in Azure ML by Mostafa Elzoghbi (Estimated: 30 minutes).
  3. Consume an Azure ML web service using Visual Studio 2015 by Mostafa Elzoghbi (Estimated: 30 minutes).
  4. Flight delay prediction by Todd Kitta (Estimated: 3 hours). Start from Task 2. This model can be reused later in the separate Cortana Intelligence Suite End-to-End Training.
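
Lab 3 calls the deployed model from a Visual Studio 2015 application. The minimal Python sketch below shows what such a client sends, using only the standard library. It assumes the request-response JSON format of classic Azure ML Studio web services: an "Inputs" object keyed by input name, containing "ColumnNames" and "Values", plus "GlobalParameters", with the API key sent as a bearer token. The endpoint URL, the API key, the input name input1, and the column subset are placeholders. Check them against the API help page generated for your own web service.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service's API help page.
SERVICE_URL = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
               "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<your-api-key>"


def build_scoring_request(url, api_key, column_names, rows, input_name="input1"):
    """Build (but do not send) a request-response scoring call.

    Assumes the classic Azure ML Studio JSON shape: one named input holding
    column names plus rows of values, and an empty GlobalParameters object.
    """
    payload = {
        "Inputs": {
            input_name: {
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # API key sent as a bearer token
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical subset of the census ("income >50K") columns from lab 1.
    columns = ["age", "education", "occupation", "hours-per-week"]
    rows = [["39", "Bachelors", "Adm-clerical", "40"]]
    request = build_scoring_request(SERVICE_URL, API_KEY, columns, rows)
    print(request.get_method(), request.full_url)
    # Sending it requires a real endpoint and key:
    # with urllib.request.urlopen(request) as response:
    #     print(json.loads(response.read()))
```

The function only builds the request, so you can inspect it offline. The commented-out urlopen call is the step that needs a live service.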

If you need more detailed instructions for self-paced training, you may also use the Hands-on Labs from the following edX courses (videos with theory and quizzes are included).

  1. DAT203.1x Data Science Essentials
  2. DAT203.2x Principles of Machine Learning
  3. DAT203.3x Applied Machine Learning

Prerequisites

Please prepare the following:

  • Activate your Azure account and bring your Microsoft account credentials. Don’t have a Microsoft account? Sign up now.
  • If you do not have a Microsoft Azure account, activate a free 30-day trial Microsoft Azure account, or, if you subscribe to MSDN, activate your free Azure MSDN subscriber benefits.
  • The preferred OS is Windows 10.
  • Make sure that Visual Studio 2015 Community, Pro, or Enterprise and Office 2013 or later are installed. (Optional; alternatively, you may use the Windows Data Science Virtual Machine in Azure.)
  • Create an Azure ML workspace for free by signing up here.

Additional resources:

  1. Azure Machine Learning (ML)
  2. Cortana Intelligence Suite: Big Data and Advanced Analytics
  3. Big Data Presentation Deck
  4. Azure ML Data Camp Deck
  5. Detailed Azure ML Hands-on-Labs

Next Steps:

  1. Cortana Intelligence Suite End-to-End Training (uses the Flight Delay Prediction model in an Azure-based solution).
  2. Data Science with Microsoft R Hands-on Labs (different ways of using the R language).

Webcast: Predictive Data Warehouse with Datameer

In the following webcast, we talk with Andrew Brust, Senior Director of Market Strategy and Intelligence at Datameer.

We will learn about the Hadoop ecosystem and PaaS options in Azure, the difference between a data lake and a data warehouse, and the added value of unstructured data streams. We will discuss the Hadoop learning curve for professionals with an OLTP database and BI background, and how Datameer can help create big data solutions and future-proof them against change.

Technologies: HDInsight, Stream Analytics, Azure Data Lake Store and Analytics, Azure Machine Learning and Power BI.

To access the webcast, you will need to fill out a short registration form.

Webcast: Data warehouse migration to Azure with Hortonworks

A modern enterprise data warehouse (EDW) should be able to manage both structured and unstructured data to realize the full value of the data. Security, consistency, and credibility of data are also very important. Data warehouse and big data solutions from Microsoft provide a trusted infrastructure that can handle all types of data and scale from terabytes to petabytes, with real-time performance.

In this webcast with Mark Lochbihler (Director of Partner Engineering, Hortonworks), we discuss modern EDWs and migration to the Microsoft Cloud (Azure). We will learn about the process, tools, and reference architectures for data warehouse migration.

To access the webcast, you will need to fill out a short registration form.

Additional resources:

Empowering Insurance Risk Modeling

In today’s global environment, volatile financial markets and natural catastrophes have created a fast-moving risk landscape in both life and non-life insurance. In addition, many insurers must comply with regulatory regimes that require them to show they can cope with the risks they face.

Using Azure’s virtually limitless capacity and infrastructure resources, insurance organizations can run their workloads faster and more frequently than on-premises. Cloud compute makes it possible to reach larger peaks at higher frequencies with lower TCO, and to access the compute power needed for even the most complex models (G-series virtual machines). Azure meets a broad set of international compliance standards for risk modeling solutions in insurance.

risk-in-ms-cloud
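
Risk models of this kind are often Monte Carlo simulations. That is why extra compute translates directly into more scenarios or more frequent runs. The Python sketch below is illustrative only and is not a production actuarial model. It assumes a hypothetical book of business with Poisson claim counts and lognormal claim sizes, with all parameters invented. It simulates annual aggregate losses in independent, seeded chunks and reports an empirical tail quantile (value at risk) at a level such as the 99.5% used by Solvency II. The chunks share no state, so they could be distributed across cores or VMs; here they simply run in a loop.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson-distributed claim count (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_chunk(seed, n_years, freq=2.0, sev_mu=10.0, sev_sigma=1.5):
    """Simulate n_years of annual aggregate losses (hypothetical parameters)."""
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    losses = []
    for _ in range(n_years):
        n_claims = poisson(rng, freq)
        losses.append(sum(rng.lognormvariate(sev_mu, sev_sigma) for _ in range(n_claims)))
    return losses


def value_at_risk(losses, level=0.995):
    """Empirical `level` quantile of the simulated annual loss distribution."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def run(n_chunks=8, years_per_chunk=10_000, level=0.995):
    # Chunks share no state, so each could run on a separate core or VM;
    # here they run sequentially and their results are pooled.
    all_losses = []
    for seed in range(n_chunks):
        all_losses.extend(simulate_chunk(seed, years_per_chunk))
    return value_at_risk(all_losses, level)


if __name__ == "__main__":
    print(f"99.5% one-year VaR: {run():,.0f}")
```

Doubling the number of chunks doubles the simulated years. This is the kind of workload that benefits from scaling out to cloud compute.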

In this MTC Studio recording we discuss Insurance Risk Modeling scenarios with Jonathan Silverman, Director of Business Development for Financial Services, Microsoft. We will discuss Azure and hybrid architectures for risk modeling, case studies, partner solutions and regulatory compliance of Microsoft Azure.

To access the webcast, you will need to fill out the registration form.

risk-modeling-recording

Additional materials:

Risk Modeling Partner Applications:

Risk Modeling Case Studies:

Cortana Intelligence Suite: Big Data and Advanced Analytics

In this post we will discuss a reference architecture for Big Data and Advanced Analytics using Cortana Intelligence Suite. The architecture is relevant for organizations looking to fully manage big data and advanced analytics to transform all enterprise information into intelligent action. This allows you to act ahead of your competitors by going beyond looking in the rearview mirror to predicting what’s next.

In general, in such solutions you use relational and semi-structured data from business and custom applications, and also semi-structured or unstructured data from sensors, devices, web sites, social networks and other sources.

Big Data flow

The Big Data flow includes the following steps:

  • Ingestion of data, either in bulk mode or event-based/real-time.
  • Processing data to prepare for storage.
  • Storing data in relational or unstructured storage.
  • Processing data for analytics like data aggregation, complex calculations, predictive or statistical modeling etc.
  • Visualizing data and data discovery using BI tools or custom applications.

big-data-flow
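
To make the five steps concrete, here is a toy, single-process Python sketch in which each step is a plain function. It is an analogy, not an Azure implementation. The sample web-log rows and events are hypothetical. The in-memory partitions stand in for relational or unstructured storage such as ADLS, and the text chart stands in for a BI tool such as Power BI.

```python
import csv
import io
from collections import defaultdict

# Hypothetical web-log data arriving in bulk (a CSV extract, with one bad row) ...
BULK_EXTRACT = """timestamp,page,status
2016-12-08T10:00:01,/home,200
2016-12-08T10:00:05,/flights,200
2016-12-08T10:01:12,/flights,500
bad-row
"""

# ... and as individual real-time events.
EVENTS = [
    {"timestamp": "2016-12-08T10:01:30", "page": "/home", "status": "200"},
    {"timestamp": "2016-12-08T10:02:02", "page": "/flights", "status": "200"},
]


def ingest(bulk_text, events):
    """Step 1: ingest bulk and event-based data into one stream of raw records."""
    yield from csv.DictReader(io.StringIO(bulk_text))
    yield from events


def process(records):
    """Step 2: validate and normalize records before storing them."""
    for rec in records:
        if not rec.get("page") or not (rec.get("status") or "").isdigit():
            continue  # drop malformed rows
        yield {"minute": rec["timestamp"][:16], "page": rec["page"], "status": int(rec["status"])}


def store(records):
    """Step 3: persist records partitioned by minute (stand-in for ADLS or a database)."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec["minute"]].append(rec)
    return partitions


def analyze(partitions):
    """Step 4: aggregate requests and server errors per page."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0})
    for rows in partitions.values():
        for rec in rows:
            stats[rec["page"]]["requests"] += 1
            stats[rec["page"]]["errors"] += int(rec["status"] >= 500)
    return dict(stats)


def visualize(stats):
    """Step 5: a text bar chart standing in for a BI report."""
    for page, s in sorted(stats.items()):
        print(f"{page:<10} {'#' * s['requests']} (errors: {s['errors']})")


if __name__ == "__main__":
    visualize(analyze(store(process(ingest(BULK_EXTRACT, EVENTS)))))
```

Because each step consumes the previous step's output, any single stage can be swapped for a managed service without changing the others.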

Big Data Reference Architecture

The Big Data reference architecture shows the most important components and data flows, allowing you to do the following:

  • Track Azure data (for example, an Azure website generating web logs) and store it in Azure Data Lake Store (ADLS)
  • Track real-time data from IoT Suite and collect it in a permanent store (ADLS)
  • Run Machine Learning through R Server for HDInsight to find patterns in data
  • Show results in BI tools (Power BI)

big-data-ra

There are a lot of different options for storing data, processing data, and machine learning. As a first step, you may use the Big Data and Machine Learning decision trees to choose the most relevant components for your solution. (I will also write about information management components like Azure Data Factory, Azure Data Catalog, Sqoop, Pig, Oozie, etc. in one of the next posts.)

Example of Big Data Solution

To show a simple example of a Big Data architecture, we will use the following fictitious scenario.

  • AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers.
  • They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter the flight information and get a prediction of whether the departing flight will encounter a delay of 15 minutes or longer, taking into account the weather forecast for the departure hour.
  • The data platform team prefers to use open source technologies for data processing tasks.
  • Developers will need an easy way to create prediction experiments.
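
The model in this scenario is a binary classifier that answers one question: will the departure be delayed by 15 minutes or more? The minimal Python sketch below shows how training rows might be prepared. It joins each historical flight with the weather for its origin airport and scheduled departure hour, then derives the label. The field names, weather features, and sample data are hypothetical; the Flight Delay Prediction lab defines its own datasets and schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Flight:
    origin: str          # departure airport code
    date: str            # YYYY-MM-DD
    dep_hour: int        # scheduled departure hour, 0-23
    dep_delay_min: int   # actual departure delay in minutes (negative = early)


@dataclass
class Weather:
    wind_speed: float      # hypothetical feature
    precipitation: float   # hypothetical feature


DELAY_THRESHOLD_MIN = 15  # "15 minutes or longer" from the scenario


def build_training_rows(flights: List[Flight],
                        weather: Dict[Tuple[str, str, int], Weather]) -> List[dict]:
    """Join each flight with the weather for its origin and departure hour,
    and attach the binary label the model will learn to predict."""
    rows = []
    for f in flights:
        w: Optional[Weather] = weather.get((f.origin, f.date, f.dep_hour))
        if w is None:
            continue  # no weather record for that hour: skip the flight
        rows.append({
            "origin": f.origin,
            "dep_hour": f.dep_hour,
            "wind_speed": w.wind_speed,
            "precipitation": w.precipitation,
            "delayed_15": int(f.dep_delay_min >= DELAY_THRESHOLD_MIN),
        })
    return rows
```

At booking time, the agents' web app would perform the same lookup against the forecast for the departure hour. It would then send the features, without the label, to the deployed model.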

Here is an example of an architecture that addresses the scenario described above. Selected components of Cortana Intelligence Suite are highlighted.

cis-example

A demonstration of the described solution is available in the MTC Studio webcast: 2016-12-08 | Cortana Intelligence Suite: Big Data and Advanced Analytics.

Additional materials

Data Science with Microsoft R Hands-on Labs

In this post I will provide a list of the most important publicly available Data Science with Microsoft R Hands-on Labs, which we use in the MTC New York for Microsoft R workshops.

Before starting the labs below, it’s a good idea to have a general understanding of predictive and classification statistics, and a basic understanding of machine learning and the open-source R language. (For this you may use DAT204x Introduction to R for Data Science, DAT209x Programming in R for Data Science, and other courses from the Microsoft Data Science specialization.)

Microsoft R Hands-on Labs

  1. Exploring SQL Server 2016 R Services and Microsoft R Client with R Tools for Visual Studio. (3 hours; manual is available, all necessary tools and files are included; uses New York Taxi dataset; when you see “Times Squire” in the code, change it to “New York” and save)
  2. MTC Microsoft R training by Jarek Kazmierczak. (1-2 hours; contains source file and R scripts)
  3. edX: DAT213x Analyzing Big Data with Microsoft R Server by Seth Mottaghinejad. (16 hours; contains videos, scripts; you may also earn Microsoft certificate; uses New York Taxi dataset; please let me know if you experience any issues with ggplot2 and ggrepel).
  4. Flight delay prediction with Azure ML (90 minutes; exercise 1 from Cortana Intelligence Suite End-to-End Training by Todd Kitta)
  5. Text Mining with R with Azure ML by Seayoung Rhee. (1 hour)
  6. edX: DAT203.1x Data Science Essentials
  7. edX: DAT203.2x Principles of Machine Learning
  8. edX: DAT203.3x Applied Machine Learning
  9. HDInsight Spark MLlib (placeholder)
  10. Cognitive Toolkit (CNTK) Deep Dive and Hands-on (tutorial; video).

Here is one of the screenshots from the first (highly recommended) lab, based on the New York Taxi dataset.

sqlrserviceslabnyc

Prerequisites to use Data Science Virtual Machine

The Data Science Virtual Machine has all of the tools you will need to work with the materials. You will need a Microsoft Azure subscription for this.

  1. To get a Microsoft Azure subscription, you can sign up for a free account here or use your MSDN subscription.
  2. To create the Data Science Virtual Machine in Azure, sign in to the Azure Portal and create the virtual machine (New -> Search for “data science” -> select “Data Science Virtual Machine” -> Create).
  3. Optionally, you may test your Microsoft R code on top of an HDInsight Spark cluster created in the Azure Portal.

Prerequisites to use your local machine

If you would like to work with some of the tools locally, please install the following components.

  1. Visual Studio (the free Community Edition is acceptable; version 2015 is preferable).
  2. Install R Tools for Visual Studio.
  3. Optionally you may use RStudio.
  4. Optionally you may install SQL Server Developer Edition for SQL Server related content.

Additional materials

Materials from Mission Critical Performance Workshop

Today in the MTC New York I delivered the workshop “Always On: Mission Critical Performance”, dedicated to some of the new features of SQL Server 2016. (And this time SQL Server AlwaysOn technology actually was covered, but it was only a fraction of the whole content 😉)

Here you can find presentation decks from this workshop:

  1. SQL Server 2016 Evolution
  2. SQL Server 2016 Performance (Here I additionally included slides on in-memory OLTP and ColumnStore from SQL Server 2014)
  3. SQL Server 2016 Security and Compliance
  4. SQL Server 2016 Availability
  5. SQL Server 2016 Scalability
  6. SQL Server 2016 Cloud Service (Bonus topic)

Additional materials are available on the official site of SQL Server 2016.

You may also try the Virtual Labs (please filter by “SQL Server 2016”).

evolution

 

Cortana Intelligence Suite End-to-End Training

I am very excited to share information about an excellent end-to-end hands-on lab training on Cortana Intelligence Suite. This training covers Azure Machine Learning, Azure Data Factory, HDInsight Spark, Power BI, and Intelligent Apps.

cis-ete

The course was developed by MTC Architect Todd Kitta. All training materials are available in his GitHub repository. If you need to provide this training to your team of data platform specialists, please contact your Microsoft representative to initiate the training, or leave a comment here.

Alternatively, you may register for the Cortana Intelligence Suite End-to-End live event (December 6, 2016, 9am – 4pm PST).

Course Outline

  • Building a Machine Learning Model and Operationalizing (this part takes 90 minutes, so if you are not a data scientist, feel free to deploy the experiment from the template).
  • Setting Up Azure Data Factory
  • Developing a Data Factory Pipeline for Data Movement
  • Operationalizing Machine Learning Scoring with Azure Machine Learning and Data Factory
  • Summarizing Data Using HDInsight Spark
  • Visualizing Spark Data in Power BI
  • Deploying an Intelligent Web App
  • Wrap-up and Cleanup of Azure Resources

Requirements

  • Your Microsoft Azure subscription should be pay-as-you-go, MSDN, or Enterprise Agreement. If you are using your company’s Azure subscription and your company requires that you be connected to your corporate network (through a VPN or otherwise), we recommend that you use a Trial or MSDN subscription for this workshop. This is because you will be connecting to your subscription from inside a VM that is not connected to your corporate network.
  • Setup is required before performing the steps in these exercises. Please see the setup instructions before going any further.
  • Please keep in mind that the HDInsight cluster and VM you provision as setup for this workshop will incur charges, so provision these resources as close to the workshop date as possible, preferably the afternoon or night before the workshop.

Materials from Modern Data Warehouse Workshop

Today in the MTC New York I delivered the workshop “Always On: Modern Data Warehouse”. (Not to be confused with SQL Server AlwaysOn technology 😉)

Here you can find the presentation decks from this workshop: Modern Data Warehouse Architecture and Data Warehouse Technology Deck. Additional materials are available on the Microsoft Modern Data Warehouse site.

reference-arch

Next data platform workshops in the MTC New York:

  • Nov 30, 2016. Always On: Dashboard in a Day (Power BI)
  • Dec 7, 2016. Always On: Mission Critical Performance

Machine Learning @ 1 million predictions per second and more

Watch recordings of the keynote and session previews from the Microsoft Machine Learning & Data Science Summit 2016 on the latest Big Data, Machine Learning, Artificial Intelligence, and Open Source techniques and technologies.

Some takeaways from the keynote:

  1. A combination of in-memory technologies and in-database analytics with R at scale in SQL Server 2016 can make 1 million fraud predictions per second.
  2. U-SQL in combination with Cognitive APIs and Azure ML can significantly enrich datasets, making it possible to analyze large volumes of images (objects of varying type and complexity) and text (subjects, key phrases, sentiment, story).
  3. In the future, Azure Data Lake Analytics will support Hive and Spark.
  4. Microsoft ResNet (a deep learning network) is built using 152 neural network layers.
  5. Azure N-series virtual machines with GPUs for deep learning are available in preview. For example, the Tesla K80 delivers 4992 CUDA cores with a dual-GPU design, up to 2.91 teraflops of double-precision and up to 8.73 teraflops of single-precision performance.

Case Studies:

  1. A student drop-out prediction service in Indian schools uses Azure ML.
  2. PROS used Azure and R in SQL Server to recommend airline prices in milliseconds. For another customer, they moved an R-based solution to SQL Server 2016 to generate renewals automatically “faster in a factor of a hundred”.
  3. Dyxia used a combination of Microsoft Band, the MS Health application, Azure IoT Hub, Stream Analytics, Power BI, Machine Learning, and other services to monitor and predict anxiety in children with autism.
  4. eSmart Systems created the Connected Drone solution, combining drones with deep learning in Azure to automate inspections of power lines.
  5. CrowdFlower uses crowdsourcing (human-in-the-loop) to handle low-confidence predictions, using human judgments to train machine learning models.
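
The human-in-the-loop pattern behind the CrowdFlower example can be sketched generically. Predictions whose confidence falls below a threshold are routed to people, and the human labels are collected as new training data. This is not CrowdFlower's implementation. The threshold of 0.8 and the toy model and labeler functions are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


def route_predictions(items: Sequence[str],
                      model: Callable[[str], Prediction],
                      ask_human: Callable[[str], str],
                      threshold: float = 0.8):
    """Accept confident model predictions and send the rest to human labelers.

    Returns (final_labels, new_training_examples). Human-labeled items are kept
    so the model can later be retrained on exactly the cases it found hard.
    """
    final: List[Tuple[str, str]] = []
    new_examples: List[Tuple[str, str]] = []
    for item in items:
        label, confidence = model(item)
        if confidence < threshold:
            label = ask_human(item)  # a crowd worker supplies the label
            new_examples.append((item, label))
        final.append((item, label))
    return final, new_examples


def toy_model(text: str) -> Prediction:
    # Hypothetical sentiment model: confident only when it sees a known word.
    if "great" in text:
        return "positive", 0.95
    if "awful" in text:
        return "negative", 0.92
    return "positive", 0.55


def toy_human(text: str) -> str:
    # Hypothetical stand-in for a crowdsourced judgment.
    return "negative" if "late" in text else "neutral"


if __name__ == "__main__":
    labels, to_retrain = route_predictions(
        ["great flight", "bag was late", "awful food"], toy_model, toy_human)
    print(labels)
    print(to_retrain)
```

Raising the threshold sends more items to people. That costs more, but it yields more training data for the cases the model finds hard.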

Below are some screenshots from the keynote.

intelligence

in-mem-r-sql

mln-predictions

war-and-peace

deep-learning

List of available recordings: