Data Science with Microsoft R Hands-on Labs

In this post I will provide a list of the most important publicly available Data Science with Microsoft R hands-on labs, which we use at MTC New York for Microsoft R workshops.

Before starting the labs below, it’s a good idea to have a general grounding in statistics for prediction and classification, and a basic understanding of machine learning and the open-source R language. (For this you may use DAT204x Introduction to R for Data Science, DAT209x Programming in R for Data Science, and other courses from the Microsoft Data Science specialization.)

Microsoft R Hands-on Labs

  1. Exploring SQL Server 2016 R Services and Microsoft R Client with R Tools for Visual Studio. (3 hours; manual is available, all necessary tools and files are included; uses New York Taxi dataset; when you see “Times Squire” in the code, change it to “New York” and save)
  2. MTC Microsoft R training by Jarek Kazmierczak. (1-2 hours; contains source file and R scripts)
  3. edX: DAT213x Analyzing Big Data with Microsoft R Server by Seth Mottaghinejad. (16 hours; contains videos and scripts; you may also earn a Microsoft certificate; uses New York Taxi dataset; please let me know if you experience any issues with ggplot2 and ggrepel).
  4. Flight delay prediction with Azure ML (90 minutes; exercise 1 from Cortana Intelligence Suite End-to-End Training by Todd Kitta)
  5. Text Mining with R with Azure ML by Seayoung Rhee. (1 hour)
  6. edX: DAT203.1x Data Science Essentials
  7. edX: DAT203.2x Principles of Machine Learning
  8. edX: DAT203.3x Applied Machine Learning
  9. HDInsight Spark MLlib (placeholder)
  10. Cognitive Toolkit (CNTK) Deep Dive and Hands-on (tutorial; video).

Here is one of the screenshots from the first (highly recommended) training, based on the New York Taxi dataset.

[Image: sqlrserviceslabnyc]

Prerequisites to use Data Science Virtual Machine

The Data Science Virtual Machine has all of the tools you will need to work with the materials. You will need a Microsoft Azure subscription for this.

  1. To get a Microsoft Azure subscription, you can sign up for a free account here or use your MSDN subscription.
  2. To create the Data Science Virtual Machine in Azure, log in to the Azure Portal and create the virtual machine. (New -> Search for “data science” -> select “Data Science Virtual Machine” -> Create).
  3. Optionally, you may test your Microsoft R code on top of an HDInsight Spark cluster created in the Azure Portal.

Prerequisites to use your local machine

If you would like to work with some of the tools locally, please install the following components.

  1. Visual Studio – the Community Edition (free) is acceptable; version 2015 is preferred.
  2. Install R Tools for Visual Studio.
  3. Optionally you may use RStudio.
  4. Optionally you may install SQL Server Developer Edition for SQL Server related content.

Additional materials

Materials from Mission Critical Performance Workshop

Today at MTC New York I delivered the workshop “Always On: Mission Critical Performance”, dedicated to some of the new features of SQL Server 2016. (This time SQL Server AlwaysOn technology actually was covered, but it was only a fraction of the whole content 😉 ).

Here you can find presentation decks from this workshop:

  1. SQL Server 2016 Evolution
  2. SQL Server 2016 Performance (Here I additionally included slides on in-memory OLTP and ColumnStore from SQL Server 2014)
  3. SQL Server 2016 Security and Compliance
  4. SQL Server 2016 Availability
  5. SQL Server 2016 Scalability
  6. SQL Server 2016 Cloud Service (Bonus topic)

Additional materials are available on the official site of SQL Server 2016.

You may also try Virtual Labs. (Please filter by “SQL Server 2016”).

[Image: evolution]

Cortana Intelligence Suite End-to-End Training

I am very excited to share information about an excellent end-to-end, hands-on lab training on Cortana Intelligence Suite. This training covers Azure Machine Learning, Azure Data Factory, HDInsight Spark, Power BI, and Intelligent Apps.

[Image: cis-ete]

The course was developed by MTC Architect Todd Kitta. All training materials are available in his GitHub repository. If you need this training delivered to your team of data platform specialists, please contact your Microsoft representative to initiate the training, or leave a comment here.

Alternatively, you may register for the Cortana Intelligence Suite End to End live event. (December 6, 2016, 9am – 4pm PST)

Course Outline

  • Building a Machine Learning Model and Operationalizing. (This part takes 90 minutes, so if you are not a data scientist, feel free to deploy the experiment from the template).
  • Setting Up Azure Data Factory
  • Developing a Data Factory Pipeline for Data Movement
  • Operationalizing Machine Learning Scoring with Azure Machine Learning and Data Factory
  • Summarizing Data Using HDInsight Spark
  • Visualizing Spark Data in Power BI
  • Deploying an Intelligent Web App
  • Wrap-up and Cleanup of Azure Resources

Requirements

  • Your Microsoft Azure subscription should be pay-as-you-go, MSDN, or Enterprise Agreement. If you are using your company’s Azure subscription and your company requires that you be connected to your corporate network (through a VPN or otherwise), we recommend that you use a Trial or MSDN subscription for this workshop, because you will be connecting to your subscription from inside a VM that is not connected to your corporate network.
  • Setup is required before performing the steps in these exercises. Please see the setup instructions before going any further.
  • Please keep in mind that the HDInsight cluster and VM you provision as setup for this workshop will incur charges, so provision these resources as close to the workshop date as possible, preferably the afternoon or night before the workshop.

Materials from Modern Data Warehouse Workshop

Today at MTC New York I delivered the workshop “Always On: Modern Data Warehouse”. (Don’t confuse it with SQL Server AlwaysOn technology 😉 ).

Here you can find presentation decks from this workshop: Modern Data Warehouse Architecture and Data Warehouse Technology Deck. Additional materials are available on the Microsoft Modern Data Warehouse site.

[Image: reference-arch]

Upcoming data platform workshops at MTC New York:

  • Nov 30, 2016. Always On: Dashboard in a Day (Power BI)
  • Dec 7, 2016. Always On: Mission Critical Performance

August 2016 Dashboard in a Day Workshop

In this post you can find materials and useful information on the Power BI Dashboard in a Day training, which will be delivered on August 23 at the Microsoft Technology Center New York.

Please download the archive with instructions, source files, and Power BI reports.

Prerequisites and setup steps

  • Internet connectivity: You must be connected to the internet
  • At minimum, a computer with 2 cores and 4 GB of RAM running one of the following versions of Windows (64-bit preferred): Windows 7, Windows 8, Windows 8.1, Windows 10, Windows Server 2008 R2, or Windows Server 2012/R2
  • Microsoft Power BI Desktop requires Internet Explorer 9 or greater
  • Check whether you have a 32-bit or 64-bit operating system to decide whether you need to install the 32-bit or 64-bit applications.
    • Search for “Computer” on your PC, right-click it, and select Properties
    • The “System type” field shows whether your operating system is 32-bit or 64-bit
  • Download and install Power BI Desktop: Download and install Microsoft Power BI Desktop from http://www.microsoft.com/en-us/download/details.aspx?id=45331.
  • Sign up for Power BI: Go to http://aka.ms/pbidiadtraining and sign up for Power BI with a business email address.
  • If you have an existing account, please go to http://app.powerbi.com and Sign in using your Power BI account.
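
If you prefer to check the core count and the 32-bit/64-bit question from a script rather than through the Properties dialog, here is a minimal sketch in Python. It is not part of the workshop materials; the function names (is_64bit_os, check_machine) are hypothetical, and it assumes that the architecture name reported by the standard library reflects your operating system. The 4 GB RAM requirement is not checked, because the Python standard library has no portable call for it.

```python
import os
import platform

# Architecture names commonly reported for 64-bit systems (assumed list, not exhaustive).
SIXTY_FOUR_BIT_NAMES = {"amd64", "x86_64", "arm64", "aarch64", "ia64"}


def is_64bit_os(machine: str) -> bool:
    """Return True when the given architecture name is a known 64-bit one."""
    return machine.lower() in SIXTY_FOUR_BIT_NAMES


def check_machine(min_cores: int = 2) -> dict:
    """Collect the core count and architecture and compare them with the minimums."""
    cores = os.cpu_count() or 0      # os.cpu_count() may return None
    machine = platform.machine()     # e.g. "AMD64" on 64-bit Windows
    return {
        "cores": cores,
        "enough_cores": cores >= min_cores,
        "machine": machine,
        "os_64bit": is_64bit_os(machine),
    }


if __name__ == "__main__":
    report = check_machine()
    print("Logical cores:", report["cores"], "(minimum 2 met:", report["enough_cores"], ")")
    print("Architecture:", report["machine"])
    print("Install the", "64-bit" if report["os_64bit"] else "32-bit", "applications")
```

Treat the result as a hint only: if the script and the “System type” field disagree, trust the “System type” field.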

Agenda

Morning

  • 08:30 AM – 09:00 AM – Introduction to Power BI
  • 09:00 AM – 11:00 AM – Power BI Desktop – Content
  • 11:15 AM – 11:30 AM – Power BI Service and overview
  • 11:30 AM – 12:30 PM – Power BI service I – Content
  • 12:30 PM – 01:00 PM – Lunch

Afternoon

  • 01:00 PM – 01:15 PM – Power BI Service II and overview
  • 01:15 PM – 02:15 PM – Power BI Service II – Content
  • 02:15 PM – 04:30 PM – Bring your own data and build dashboards (optional)