Azure Search with Knowledge-based Cognitive Capabilities

Azure Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

Within organizations, applications with search capabilities can serve external customers or internal business users. One example is Cognitive Search applied to the publicly available JFK files (see JFK Files Public Site and JFK Files Project).

Full Text Search

When you use Full Text Search, queries are executed over a user-defined index built on top of files and searchable datasets.
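As a rough illustration of querying an index, the sketch below builds a full text search request URL for the Azure Search REST API (the Search Documents operation over an index's docs collection). The service name, index name, and chosen api-version are assumptions made for illustration. Check the current REST API reference before relying on specific parameters.

```python
from urllib.parse import quote, urlencode

# Hypothetical service and index names; replace with your own.
SERVICE_NAME = "my-search-service"
INDEX_NAME = "documents"
API_VERSION = "2020-06-30"  # assumed GA REST API version; check the docs for the current one


def build_search_url(service: str, index: str, query: str, top: int = 10) -> str:
    """Build a GET URL for a simple full text query against one index."""
    params = {
        "api-version": API_VERSION,
        "search": query,   # full text query terms
        "$top": top,       # maximum number of results to return
        "$count": "true",  # also ask for the total number of matches
    }
    base = f"https://{service}.search.windows.net/indexes/{index}/docs"
    # quote_via=quote encodes spaces as %20 rather than "+"
    return f"{base}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_search_url(SERVICE_NAME, INDEX_NAME, "cloud analytics", top=5))
    # Sending the request also requires an "api-key" header with a query key,
    # for example via urllib.request.Request(url, headers={"api-key": "<key>"}).
```

The index itself (its fields and which of them are searchable or filterable) is defined separately. This request only queries it.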

Video: Azure Search Overview by Liam Cavanagh:

Cognitive search

Cognitive search can be added to create searchable information out of non-searchable content by attaching AI algorithms to an indexing pipeline. AI integration is provided through cognitive skills, enriching source documents before creating a search index.

Cognitive Skills are based on the same AI algorithms used in Cognitive Services APIs:

  1. Natural language processing skills include entity recognition, language detection, key phrase extraction, text manipulation, and sentiment detection. With these skills, unstructured text becomes structured, mapped to searchable and filterable fields in an index.
  2. Image processing skills include OCR and identification of visual features, such as facial detection, image interpretation, image recognition (famous people and landmarks) or attributes like colors or image orientation. You can create text-representations of image content, searchable using all the query capabilities of Azure Search.
  3. Custom skills are a way to insert transformations unique to application content. A custom skill executes independently, applying whatever enrichment step you require. For example, you could define field-specific custom entities, build custom classification models to differentiate business and financial contracts and documents, or add a speech recognition skill to reach deeper into audio files for relevant content.
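A custom skill is typically exposed as a web endpoint that the indexing pipeline calls with a batch of records. The sketch below assumes the custom Web API skill request/response shape: a "values" array of records, each with a "recordId" and a "data" object, where the response echoes each recordId together with its outputs, errors, and warnings. The keyword-based classifier and the output name contractType are hypothetical stand-ins for a real trained model that tells financial and business contracts apart.

```python
import json
from typing import Any

# Hypothetical keyword lists for a toy classifier; a real skill would call a trained model.
FINANCIAL_TERMS = {"loan", "interest", "credit", "payment", "invoice"}
BUSINESS_TERMS = {"partnership", "supplier", "services", "delivery", "agreement"}


def classify(text: str) -> str:
    """Label a document by counting keyword hits (illustrative only)."""
    words = {w.strip(".,;:").lower() for w in text.split()}
    financial = len(words & FINANCIAL_TERMS)
    business = len(words & BUSINESS_TERMS)
    if financial == business:
        return "unknown"
    return "financial" if financial > business else "business"


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Process a batch of records in the custom Web API skill request/response shape."""
    results = []
    for record in body.get("values", []):
        text = record.get("data", {}).get("text", "")
        results.append({
            "recordId": record["recordId"],  # echo the incoming recordId
            "data": {"contractType": classify(text)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


if __name__ == "__main__":
    request = {"values": [
        {"recordId": "1", "data": {"text": "Loan agreement with interest and payment terms."}},
        {"recordId": "2", "data": {"text": "Supplier partnership for delivery of services."}},
    ]}
    print(json.dumps(handle_skill_request(request), indent=2))
```

In a deployment, handle_skill_request would sit behind an HTTP endpoint, and the skillset would map the document text to the "text" input and contractType to an index field.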

Skills can be chained. For instance, you may want to pass the detected language to the key phrase extraction skill to improve its accuracy.
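Chaining can be expressed in a skillset definition, where one skill's output is referenced as another skill's input by its path in the enriched document. This minimal sketch builds such a definition as a Python dict. The skillset name is hypothetical. The skill type names and input/output names are those of the built-in language detection and key phrase extraction skills as I understand them, so verify them against the skillset reference for your API version. The resulting JSON would be sent to the service's skillsets REST endpoint.

```python
import json

# The language detection output (/document/language) feeds key phrase extraction.
skillset = {
    "name": "demo-skillset",  # hypothetical name
    "description": "Detect language, then extract key phrases in that language",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "languageCode", "targetName": "language"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            "context": "/document",
            "inputs": [
                {"name": "text", "source": "/document/content"},
                # Chaining: consume the enrichment produced by the previous skill
                {"name": "languageCode", "source": "/document/language"},
            ],
            "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
        },
    ],
}

print(json.dumps(skillset, indent=2))
```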

Video from Ignite: AI for Knowledge Mining by Luis Cabrera:

Azure Search Global Distribution

To reduce latency for remote users (in the case of geo-distributed workloads), it makes sense to create search services in each corresponding region, that is, closer to those users. For example, you may use multiple Azure Search indexers in different regions that point to the same data store. To route requests to multiple geo-located websites that are backed by multiple Azure Search services, you may use Azure Traffic Manager. This approach also provides high availability and load balancing.
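Traffic Manager makes its routing decision at the DNS level. The sketch below only illustrates that decision in the spirit of its performance routing method: among healthy regional endpoints, choose the one with the lowest latency for a given user, and fall back to the next one when a region fails. The region names, URLs, and latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    url: str
    healthy: bool  # result of the latest health probe


def pick_endpoint(endpoints: list[Endpoint], latency_ms: dict[str, float]) -> Endpoint:
    """Return the healthy endpoint with the lowest measured latency for this user."""
    candidates = [e for e in endpoints if e.healthy and e.region in latency_ms]
    if not candidates:
        raise RuntimeError("no healthy search endpoint available")
    return min(candidates, key=lambda e: latency_ms[e.region])


if __name__ == "__main__":
    # Hypothetical regional front ends, each backed by its own Azure Search service
    endpoints = [
        Endpoint("westeurope", "https://search-app-weu.example.com", healthy=False),
        Endpoint("eastus", "https://search-app-eus.example.com", healthy=True),
        Endpoint("southeastasia", "https://search-app-sea.example.com", healthy=True),
    ]
    # Latencies as seen by a user in Europe; westeurope is down, so eastus is chosen
    print(pick_endpoint(endpoints, {"westeurope": 20, "eastus": 90, "southeastasia": 180}).region)
```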

List of Azure Search Features

Reference

Modern Data Platform Map and Video

The Modern Data Platform Map represents a reference organizational layout of the most important data pillars and services, and of the corresponding groups of specialists in enterprises.

In the following video I give a quick overview of the Microsoft Data Platform. I will provide more details in subsequent posts and videos. Please post your questions, suggestions, and feedback below.

For details, you may also check the following data pillars and products:

Webcast: Enabling student success with cloud computing

In this webcast you will learn how to:

  • Access data analytics tools to enable real-time and predictive analytics
  • Improve student success through measurable results
  • Make the future less about student grades and more about measuring and customizing education to the needs of the individual student

We will also cover the following examples and case studies:

  • Cleveland Metropolitan Case Study
  • Predicting student dropout risks, increasing graduation rates with cloud analytics in Tacoma Public Schools
  • Predicting Student Success using Azure Machine Learning in Northeast Wisconsin Technical College (Proof of Concept)
  • Restart Academy of Missouri (Envisioning Demo by Neal Analytics)
  • Education Data Management showcase (Power BI model by Dell)

To access the webcast, you will need to fill in a short registration form.

Technologies: Azure Machine Learning and Power BI.

Reference materials:

Webcast: Predictive Data Warehouse with Datameer

In the following webcast, we talk to Andrew Brust, Senior Director of Market Strategy and Intelligence at Datameer.

We will learn about the Hadoop ecosystem and PaaS options in Azure, the difference between a Data Lake and a Data Warehouse, and the added value of unstructured data streams. We will discuss the Hadoop learning curve for professionals with an OLTP database and BI background, and how Datameer can help create big data solutions and future-proof against change.

Technologies: HDInsight, Stream Analytics, Azure Data Lake Store and Analytics, Azure Machine Learning and Power BI.

To access the webcast, you will need to fill in a short registration form.

Webcast: Data warehouse migration to Azure with Hortonworks

A modern EDW should be able to manage both structured and unstructured data to realize the full value of data. Security, consistency, and credibility of data are also very important. Data warehouse and big data solutions from Microsoft provide a trusted infrastructure that can handle all types of data and scale from terabytes to petabytes with real-time performance.

In this webcast, with the participation of Mark Lochbihler (Director of Partner Engineering, Hortonworks), we discuss modern enterprise data warehouses (EDWs) and migration to the Microsoft Cloud (Azure). We will learn about the process, tools, and reference architectures for data warehouse migration.

To access the webcast, you will need to fill in a short registration form.

Additional resources:

Cortana Intelligence Suite: Big Data and Advanced Analytics

In this post we will discuss a reference architecture for Big Data and Advanced Analytics using the Cortana Intelligence Suite. The architecture can be relevant for organizations looking to fully manage big data and advanced analytics to transform all enterprise information into intelligent action. This allows you to act ahead of your competitors by going beyond looking in the rearview mirror to predicting what’s next.

In general, such solutions use relational and semi-structured data from business and custom applications, as well as semi-structured or unstructured data from sensors, devices, websites, social networks, and other sources.

Big Data flow

Big Data flow includes the following steps:

  • Ingestion of data, either in bulk mode or event-based/real-time.
  • Processing data to prepare for storage.
  • Storing data in relational or unstructured storage.
  • Processing data for analytics like data aggregation, complex calculations, predictive or statistical modeling etc.
  • Visualizing data and data discovery using BI tools or custom applications.
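To make the five steps concrete, here is a toy, in-memory sketch in which each step is one function and the output of one step feeds the next. The web log format and the values are hypothetical. A plain list stands in for the data lake or database, and a text chart stands in for a BI tool.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw web log lines: "timestamp,page,latency_ms"
RAW_EVENTS = [
    "2016-12-08T10:00:01,/home,120",
    "2016-12-08T10:00:02,/search,340",
    "bad line",
    "2016-12-08T10:00:03,/home,80",
]


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Step 1: ingestion (a simple batch read here; could be an event stream)."""
    yield from lines


def prepare(lines: Iterable[str]) -> Iterator[dict]:
    """Step 2: parse records and drop malformed ones before storage."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        yield {"ts": parts[0], "page": parts[1], "latency_ms": int(parts[2])}


def store(records: Iterable[dict], storage: list) -> list:
    """Step 3: persist records (a list stands in for a data lake or database)."""
    storage.extend(records)
    return storage


def analyze(storage: list) -> dict[str, float]:
    """Step 4: aggregate the average latency per page."""
    totals, counts = defaultdict(int), defaultdict(int)
    for r in storage:
        totals[r["page"]] += r["latency_ms"]
        counts[r["page"]] += 1
    return {page: totals[page] / counts[page] for page in totals}


def visualize(summary: dict[str, float]) -> None:
    """Step 5: a text chart standing in for a BI tool."""
    for page, avg in sorted(summary.items()):
        print(f"{page:<10} {avg:6.1f} ms " + "#" * int(avg // 20))


if __name__ == "__main__":
    lake: list = []
    store(prepare(ingest(RAW_EVENTS)), lake)
    visualize(analyze(lake))
```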

Figure: Big Data flow

Big Data Reference Architecture

The Big Data reference architecture represents the most important components and data flows, allowing you to do the following:

  • Track Azure data (an Azure website generating web logs) and store it in Azure Data Lake Store (ADLS)
  • Track real-time data from IoT Suite: collect data from IoT Suite in a permanent store (ADLS)
  • Run Machine Learning through R Server for HDInsight to find patterns in data
  • Show results in BI tools (Power BI)

Figure: Big Data reference architecture

There are a lot of different options for storing data, processing data, and machine learning. You may use the Big Data and Machine Learning decision trees as a starting point for choosing the most relevant components for your solution. (I will also write about information management components such as Azure Data Factory, Azure Data Catalog, Sqoop, Pig, and Oozie in one of the next posts.)

Example of Big Data Solution

To show a simple example of a Big Data architecture, we will use the following fictitious scenario.

  • AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their corporate customers.
  • They are looking to pilot a web app that their internal customer service agents can use to provide the traveler with additional useful information during the flight booking process. They want their agents to be able to enter the flight information and get a prediction of whether the departing flight will encounter a delay of 15 minutes or longer, taking into account the weather forecast for the departure hour.
  • The data platform team prefers to use open source technologies for data processing tasks.
  • Developers will need an easy way to create prediction experiments.
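One way to picture the data preparation behind such a prediction is to join each historical flight with the weather forecast for its departure hour and derive a binary label for a departure delay of 15 minutes or more. The field names, airport code, and weather values below are hypothetical, and the solution described in this post may prepare its data differently. Rows like these would then be used to train a classifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flight:
    origin: str
    dep_hour: int       # scheduled departure hour, 0-23
    dep_delay_min: int  # observed departure delay in minutes (historical data only)


# Hypothetical weather forecasts keyed by (airport, departure hour)
WEATHER = {
    ("SEA", 7): {"wind_kph": 35.0, "precip_mm": 4.0},
    ("SEA", 9): {"wind_kph": 10.0, "precip_mm": 0.0},
}


def to_training_row(flight: Flight) -> Optional[dict]:
    """Join a flight with the forecast for its departure hour and add the label."""
    weather = WEATHER.get((flight.origin, flight.dep_hour))
    if weather is None:
        return None  # no forecast for that hour: skip this example
    return {
        "origin": flight.origin,
        "dep_hour": flight.dep_hour,
        **weather,
        # Binary label: 1 if the flight departed 15 or more minutes late
        "delayed_15": int(flight.dep_delay_min >= 15),
    }


if __name__ == "__main__":
    history = [Flight("SEA", 7, 22), Flight("SEA", 9, 5), Flight("SEA", 12, 0)]
    rows = [r for r in map(to_training_row, history) if r is not None]
    for row in rows:
        print(row)
```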

Here is an example of an architecture that addresses the scenario described above. Selected components of the Cortana Intelligence Suite are highlighted.

Figure: Example architecture with Cortana Intelligence Suite components

A demonstration of the described solution is available in the MTC Studio webcast: 2016-12-08 | Cortana Intelligence Suite: Big Data and Advanced Analytics.

Additional materials

Machine Learning @ 1 million predictions per second and more

Watch recordings of the keynote and session previews from the Microsoft Machine Learning & Data Science Summit 2016 on the latest Big Data, Machine Learning, Artificial Intelligence, and Open Source techniques and technologies.

Some takeaways from the keynote:

  1. A combination of in-memory technologies and in-database analytics with R at scale using SQL Server 2016 can make 1 million fraud predictions per second.
  2. U-SQL in combination with Cognitive APIs and Azure ML can significantly extend datasets, making it possible to analyze large volumes of images (different objects and complexity) and text (subjects, key phrases, sentiments, story).
  3. In the future, Azure Data Lake Analytics will support Hive and Spark.
  4. Microsoft ResNet (a deep residual network architecture for Deep Learning) is built using 152 neural network layers.
  5. Azure N-series Virtual Machines with GPUs for Deep Learning are available in preview. For example, the Tesla K80 delivers 4992 CUDA cores with a dual-GPU design, up to 2.91 Teraflops of double-precision and up to 8.73 Teraflops of single-precision performance.
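Peak GPU throughput figures like these come from simple arithmetic: number of execution units × clock rate × 2, since a fused multiply-add counts as two floating-point operations. The sketch below assumes a maximum GPU Boost clock of 875 MHz and one double-precision unit for every three single-precision cores, as on the K80's GK210 chips. Both values are assumptions to check against NVIDIA's datasheet. With them, the formula gives roughly 8.7 TFLOPS single precision and 2.9 TFLOPS double precision.

```python
# Assumed Tesla K80 figures (both GPUs combined); verify against NVIDIA's datasheet.
CUDA_CORES = 4992           # single-precision CUDA cores
DP_UNITS = CUDA_CORES // 3  # assumption: one double-precision unit per three SP cores
BOOST_CLOCK_GHZ = 0.875     # assumption: maximum GPU Boost clock
FLOPS_PER_FMA = 2           # a fused multiply-add counts as two floating-point operations


def peak_tflops(units: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in teraflops (units * GHz * 2 gives GFLOPS)."""
    return units * clock_ghz * FLOPS_PER_FMA / 1000


sp = peak_tflops(CUDA_CORES, BOOST_CLOCK_GHZ)
dp = peak_tflops(DP_UNITS, BOOST_CLOCK_GHZ)
print(f"single precision: {sp:.2f} TFLOPS, double precision: {dp:.2f} TFLOPS")
```

These are theoretical peaks. Real Deep Learning workloads reach only a fraction of them, depending on memory bandwidth and how well the code keeps the cores busy.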

Case Studies:

  1. Student Drop-Out Prediction Service in Indian schools uses Azure ML.
  2. PROS used Azure and R in SQL Server to recommend airline prices in milliseconds. For another customer, they moved an R-based solution to SQL Server 2016 to generate renewals automatically, “faster in a factor of a hundred”.
  3. Dyxia used a combination of Microsoft Band, the MS Health application, Azure IoT Hub, Stream Analytics, Power BI, Machine Learning, and other services to monitor and predict anxiety in children with autism.
  4. eSmart Systems created the Connected Drone solution, combining drones with Deep Learning in Azure to automate inspections of power lines.
  5. CrowdFlower uses crowdsourcing (human-in-the-loop) to train machine learning models on low-confidence predictions.
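The human-in-the-loop pattern mentioned in the last case study can be sketched as a confidence-based router. Confident predictions are accepted automatically. The rest are queued for human annotators, and their answers become new training examples. This is a generic illustration, not CrowdFlower's actual platform; the threshold and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off; tune per use case


@dataclass
class Router:
    auto_labeled: list = field(default_factory=list)
    human_queue: list = field(default_factory=list)

    def route(self, item_id: str, label: str, confidence: float) -> str:
        """Accept confident predictions; send the rest to human annotators."""
        if confidence >= CONFIDENCE_THRESHOLD:
            self.auto_labeled.append((item_id, label))
            return "auto"
        self.human_queue.append(item_id)
        return "human"

    def add_human_labels(self, labels: dict[str, str]) -> list:
        """Collect human answers as new training examples and clear them from the queue."""
        new_examples = [(i, labels[i]) for i in self.human_queue if i in labels]
        self.human_queue = [i for i in self.human_queue if i not in labels]
        return new_examples


if __name__ == "__main__":
    router = Router()
    print(router.route("img-1", "cat", 0.95))  # confident: accepted automatically
    print(router.route("img-2", "dog", 0.40))  # uncertain: sent to a human
    print(router.add_human_labels({"img-2": "cat"}))
```

The examples returned by add_human_labels would be added to the training set for the next model version, so the model improves precisely where it was least sure.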

Below are some screenshots from the keynote.

Screenshots: intelligence; in-memory R in SQL Server; 1 million predictions; War and Peace; deep learning.

List of available recordings: