Azure Cognitive Services

Cognitive Services are a collection of artificial intelligence REST APIs. With Cognitive Services, developers can easily add intelligent features into their applications.


Cognitive Services include:

  • Vision: From faces to feelings, allow apps to understand images and video
  • Speech: Hear and speak to users by filtering noise, identifying speakers, and understanding intent
  • Language: Process text and learn how to recognize what users want
  • Knowledge: Tap into rich knowledge amassed from the web, academia, or your own data
  • Search: Access billions of web pages, images, videos, and news with the power of Bing APIs
  • Decision: Build apps that surface recommendations for informed and efficient decision-making
  • Azure OpenAI: Apply advanced language models to a variety of use cases with the Azure OpenAI service

The collection is continuously improved: new APIs are added and existing ones are updated.

Note. In addition to the specific cognitive services mentioned below, there is also a general Cognitive Services resource that bundles multiple cognitive services to simplify administration and development.
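
Because Cognitive Services are exposed as REST APIs, a call boils down to an HTTPS request that carries the resource key in a header. The sketch below only builds such a request and does not send it. The resource name, the key placeholder, and the `/vision/v3.2/analyze` path with its `visualFeatures` query are assumptions for illustration, and `build_analyze_request` is a hypothetical helper; check the current API reference for the exact route and version.

```python
import json
import urllib.request


def build_analyze_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a description and tags."""
    # Assumed route and query string; verify against the current API reference.
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?visualFeatures=Description,Tags"
    # The image is passed by URL in a small JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # key of the Cognitive Services resource
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical resource name and placeholder key.
    request = build_analyze_request(
        "https://example-resource.cognitiveservices.azure.com",
        "<your-key>",
        "https://example.com/cat.jpg",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a real resource.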

Computer Vision Services

Computer Vision Services in Azure:

  • Computer Vision Service analyzes images and video and extracts descriptions, tags, objects, and text.
  • Custom Vision Service lets you train custom image classification and object detection models using your own images.
  • Face Service lets you build face detection and facial recognition solutions.
  • Form Recognizer Service extracts information from scanned forms and invoices.

Capabilities of Computer Vision services:

  • Image description: analyzes an image, evaluates the detected objects, and generates a human-readable phrase or sentence describing what was detected. It can also generate tags for the image.
  • Object detection: returns the type of each detected object together with its bounding box, a set of coordinates giving the top, left, width, and height of the object.
  • Brand detection: identifies commercial brands.
  • Face detection: detects and analyzes human faces in an image, including an estimated age and a bounding box rectangle for the location of each face.
  • Image categorization: categorizes images by their contents using a parent/child hierarchy.
  • Domain-specific content detection: recognizes thousands of celebrities and famous landmarks.
  • Image type detection: identifies clip art images and line drawings.
  • Color scheme detection: identifies the dominant foreground, background, and overall colors in an image.
  • Thumbnail generation: creates small versions of images.
  • Content moderation: detects images that contain adult content or depict violent, gory scenes.
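
The bounding box returned by object detection (top, left, width, height) is often easier to work with as corner coordinates, for example when drawing a rectangle over the image. Here is a minimal sketch of that conversion. The dictionary keys and the `box_from_detection` helper are hypothetical and simply follow the names used in the list above; the real response may use different field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    left: int
    top: int
    width: int
    height: int

    @property
    def area(self) -> int:
        # Area in pixels, handy for ignoring very small detections.
        return self.width * self.height

    def corners(self) -> tuple:
        # (left, top, right, bottom), the form most drawing libraries expect.
        return (self.left, self.top, self.left + self.width, self.top + self.height)


def box_from_detection(detection: dict) -> BoundingBox:
    """Read a box from a detection record (hypothetical key names)."""
    return BoundingBox(
        left=detection["left"],
        top=detection["top"],
        width=detection["width"],
        height=detection["height"],
    )


if __name__ == "__main__":
    detection = {"object": "dog", "left": 10, "top": 20, "width": 30, "height": 40}
    print(box_from_detection(detection).corners())
```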

Speech Services

Speech Services in Azure:

  • Speech-to-Text Service transcribes audio into text, either in real time or asynchronously with batch transcription.
  • Text-to-Speech Service converts input text into humanlike synthesized speech. It can use prebuilt or custom neural voices, which are humanlike voices powered by deep neural networks.
  • Speaker Recognition provides algorithms that verify and identify speakers by their unique voice characteristics.
  • Speech Translation Service enables real-time, multilingual translation of speech to your applications, tools, and devices. Use this feature for speech-to-speech and speech-to-text translation.
  • Language Identification is used to identify languages spoken in audio when compared against a list of supported languages.
  • Pronunciation Assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio.
  • Intent Recognition uses speech-to-text with conversational language understanding to derive user intents from transcribed speech and act on voice commands.

Capabilities of the Speech Service include, but are not limited to, the following:

  • Speech-to-Text Conversion: The Speech Service can convert spoken audio into text, allowing developers to transcribe audio recordings or live speech in real-time. This feature supports multiple languages and can be customized to improve accuracy for specific domains or accents.
  • Text-to-Speech Conversion: The Speech Service can also generate spoken audio from text, allowing developers to create natural-sounding voices for their applications. This feature supports multiple languages and can be customized to add emphasis or emotion to the speech.
  • Speaker Recognition: The Speech Service can identify individual speakers based on their voice characteristics, allowing developers to build applications that can authenticate users based on their voice.
  • Language Translation: The Speech Service can translate spoken audio from one language to another in real-time, allowing developers to build applications that can support multilingual conversations.
  • Keyword Spotting: The Speech Service can detect specific words or phrases in spoken audio, allowing developers to build applications that can trigger actions based on certain keywords.
  • Speech Synthesis Markup Language (SSML): The Speech Service supports SSML, a markup language that allows developers to control aspects of speech synthesis such as pronunciation, intonation, and pauses.
  • Customization: The Speech Service can be customized with specific acoustic and language models to improve accuracy for specific use cases or domains.
  • Integration: The Speech Service can be easily integrated into applications using APIs, SDKs, and pre-built connectors for popular development platforms such as .NET, Java, Python, and Node.js.
  • Deployment flexibility: Azure Cognitive Services Speech features can be deployed in the cloud or on-premises.
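
To make the SSML bullet above more concrete, the following sketch assembles a small SSML document with a pause between sentences. The `speak`, `voice`, and `break` elements come from the SSML standard; the voice name `en-US-JennyNeural`, the default pause length, and the `build_ssml` helper are assumptions for illustration. The result is only a string, which would still have to be passed to a text-to-speech call.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, pause_ms=500, voice="en-US-JennyNeural", lang="en-US"):
    """Join sentences with a pause and wrap them in <speak>/<voice> elements."""
    # <break time="..."/> inserts a pause of the given length.
    pause = f'<break time="{int(pause_ms)}ms"/>'
    # Escape &, <, and > so arbitrary text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml(["Hello.", "How can I help you today?"]))
```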

Note. Microsoft uses Speech for many scenarios, such as captioning in Teams, dictation in Office 365, and Read Aloud in the Edge browser.

Natural Language Processing Services

  • Language Service provides features for understanding and analyzing text, training language models that can understand spoken or text-based commands, and building intelligent applications.
  • Translator Service translates text between more than 60 languages.
  • Speech Service recognizes and synthesizes speech, and translates spoken languages.
  • Azure Bot Service provides a platform for conversational AI, the capability of a software “agent” to participate in a conversation. Developers can use the Bot Framework to create a bot and manage it with Azure Bot Service – integrating back-end services like Language, and connecting to channels for web chat, email, Microsoft Teams, and others.

The capabilities of the Language service are enabled by pre-trained models that can:

  • Determine the language of a document or text (for example, French or English).
  • Perform sentiment analysis on text to determine whether its sentiment is positive or negative.
  • Extract key phrases from text that might indicate its main talking points.
  • Identify and categorize entities in the text. Entities can be people, places, organizations, or even everyday items such as dates, times, quantities, and so on.
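
These pre-trained models typically work on batches of documents. As a rough sketch, the helpers below build such a batch and read sentiment labels back out of a response. The field names (`documents`, `id`, `text`, `language`, `sentiment`) are assumptions modeled on the documents-batch shape of the text analytics REST API, and both functions are hypothetical; verify the exact request and response format against the current API reference.

```python
def build_documents_payload(texts, language=None):
    """Wrap plain strings in a documents batch (assumed request shape)."""
    documents = []
    for index, text in enumerate(texts, start=1):
        # Each document gets an id so results can be matched back to inputs.
        document = {"id": str(index), "text": text}
        if language is not None:
            # Optional language hint; omit it to let the service detect the language.
            document["language"] = language
        documents.append(document)
    return {"documents": documents}


def sentiment_by_id(response):
    """Map document ids to sentiment labels (assumed response shape)."""
    return {doc["id"]: doc["sentiment"] for doc in response.get("documents", [])}


if __name__ == "__main__":
    payload = build_documents_payload(["Great service!", "The delivery was late."], language="en")
    print(payload)
```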

Other Services

  • Anomaly Detector service provides an application programming interface (API) that developers can use to create anomaly detection solutions.
  • Azure Cognitive Search is a private enterprise search solution with tools for building indexes. The indexes can be used internally only, or to make content searchable on public-facing internet assets. Before indexing, the data can be enriched by pipelines that use natural language processing and image processing skills to extract additional context.

