© 2025 Direct Cursus Technology L.L.C.

Yandex SpeechKit technology overview

Written by
Yandex Cloud
Updated on November 20, 2024

Yandex SpeechKit voice technologies cover speech recognition and speech synthesis. SpeechKit can recognize speech in real time or from pre-recorded audio files, automatically detecting the speaker's language. It can also synthesize speech from pattern phrases and long texts using SpeechKit standard voices.

SpeechKit is available through its API. Depending on the task, you can use either the gRPC or the REST interface. For more information about API implementations in Yandex Cloud, see Yandex Cloud API concepts.
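
As an illustration of the REST interface, the sketch below only builds an HTTP request for recognizing a short audio fragment and does not send it. The endpoint URL, the `folderId` and `lang` query parameters, and the `Bearer` IAM token header are assumptions that are not stated in this article, so check them against the SpeechKit API reference before use. The function name `build_recognition_request` and the sample values are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: REST endpoint for recognizing short audio.
# Verify the exact URL in the SpeechKit API reference.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognition_request(
    audio: bytes, folder_id: str, iam_token: str, lang: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    # Assumption: the folder and language are passed as query parameters.
    query = urllib.parse.urlencode({"folderId": folder_id, "lang": lang})
    return urllib.request.Request(
        url=f"{STT_URL}?{query}",
        data=audio,
        # Assumption: authentication with an IAM token in a Bearer header.
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )


# Placeholder values; replace them with real audio and credentials.
request = build_recognition_request(
    b"\x00\x01", "my-folder-id", "my-iam-token", "ru-RU"
)
print(request.get_method(), request.full_url)
```

To actually send the request, you could pass it to `urllib.request.urlopen(request)` and parse the response body as described in the API reference.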

The table below lists the most common SpeechKit use cases to help you choose the appropriate technologies and configure them to meet your needs.

Voice robot
Description: Full or partial automation of telephone communications with customers.
Recommended technologies:
  • For user input: Streaming recognition.
  • For the system's response: Speech synthesis using standard voices and Brand Voices created specially for you.
Features and settings:
  • You can get both intermediate and final recognition results
  • Controlling pronunciation with synthesized text markup
  • Pattern-based speech synthesis

Voice analysis

Operator performance quality control
Description: Transcribing and further analysis of audio recordings of dialogs between customers and call center operators or robots.
Recommended technologies:
  • To recognize pre-recorded audio files: Asynchronous recognition of audio files.
Features and settings:
  • Timestamps of the start and end of a word in the recognition results
  • Recognition result normalization
  • Deferred mode for asynchronous recognition of audio files
  • Quotas and limits in SpeechKit

Voice control in apps and smart devices

Voice assistant
Description: The user requests an action or search using voice and the service responds with an action with a voice comment or an image.
Recommended technologies:
  • For user input: Streaming recognition.
  • For the system's response: Speech synthesis using standard voices and Brand Voices.
Features and settings:
  • You can get both intermediate and final recognition results
  • Controlling pronunciation with synthesized text markup
  • Recognition result normalization

Service adaptation to people with visual impairments
Description: Voice control, voice hints and comments for visually impaired users.
Recommended technologies:
  • For user input: Streaming recognition.
  • For the system's response: Speech synthesis using standard voices and Brand Voices.
Features and settings:
  • You can get both intermediate and final recognition results
  • Controlling pronunciation with synthesized text markup

Recognizing audio recordings made during a meeting
Description: Transcribing the audio recordings after the meeting is completed.
Recommended technologies:
  • To recognize pre-recorded audio files: Asynchronous recognition of audio files.
Features and settings:
  • Deferred mode for asynchronous recognition of audio files
  • Quotas and limits in SpeechKit
  • Timestamps of the start and end of a word in the recognition results
  • Recognition result normalization

Voicing books and videos
Description: Voicing a book or video with no human speaker involved.
Recommended technologies:
  • Speech synthesis using standard voices and Brand Voices.
Features and settings:
  • Controlling pronunciation with synthesized text markup
  • Quotas and limits in SpeechKit

Recording the minutes of a meeting
Description: Transcribing the meeting minutes in real time.
Recommended technologies:
  • To recognize the participants' speech: Streaming recognition.
Features and settings:
  • You can get both intermediate and final recognition results
  • Recognition result normalization

Video subtitles
Description: Creating subtitles for recorded videos.
Recommended technologies:
  • To recognize an audio track: Asynchronous recognition of audio files.
Features and settings:
  • Deferred mode for asynchronous recognition of audio files
  • Timestamps of the start and end of a word in the recognition results
  • Recognition result normalization
  • Quotas and limits in SpeechKit

Broadcast subtitles
Description: Transcribing broadcasts in real time.
Recommended technologies:
  • To recognize the broadcast speech: Streaming recognition.
Features and settings:
  • You can get both intermediate and final recognition results
  • Recognition result normalization

Transcribing voice messages
Description: Converting short voice messages to text in messengers.
Recommended technologies:
  • To recognize audio files: Synchronous recognition.
Features and settings:
  • Recognition result settings
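
The recognition recommendations in the use cases above follow a simple pattern: live audio uses streaming recognition, pre-recorded files use asynchronous recognition, and short voice messages use synchronous recognition. The sketch below encodes that pattern as a small helper. The names `RecognitionMode` and `choose_recognition_mode` are hypothetical, and the article does not define what counts as a short recording, so the `short_audio` flag is left to the caller.

```python
from enum import Enum


class RecognitionMode(Enum):
    STREAMING = "Streaming recognition"
    ASYNCHRONOUS = "Asynchronous recognition of audio files"
    SYNCHRONOUS = "Synchronous recognition"


def choose_recognition_mode(
    real_time: bool, short_audio: bool = False
) -> RecognitionMode:
    """Pick a recognition mode the way the use cases above recommend."""
    if real_time:
        # Voice robots, assistants, live minutes, broadcast subtitles.
        return RecognitionMode.STREAMING
    if short_audio:
        # Short voice messages in messengers.
        return RecognitionMode.SYNCHRONOUS
    # Call recordings, meeting recordings, video subtitles.
    return RecognitionMode.ASYNCHRONOUS


print(choose_recognition_mode(real_time=True).value)
print(choose_recognition_mode(real_time=False).value)
print(choose_recognition_mode(real_time=False, short_audio=True).value)
```

This helper is only a reading aid for the use cases. Quotas and limits, which several use cases point to, may further constrain the choice for a given audio file.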

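For the video subtitles use case, word timestamps in the recognition results can be grouped into subtitle cues. The sketch below shows one way to turn such timestamps into SubRip (SRT) text. The `Word` structure with millisecond offsets is a hypothetical simplification rather than the actual SpeechKit response format, and the fixed number of words per cue is an arbitrary choice for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape of one recognized word; map the real
    # recognition results to this structure yourself.
    text: str
    start_ms: int
    end_ms: int


def format_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues."""
    cues = []
    for index in range(0, len(words), max_words):
        chunk = words[index:index + max_words]
        number = index // max_words + 1
        # A cue spans from the first word's start to the last word's end.
        start = format_timestamp(chunk[0].start_ms)
        end = format_timestamp(chunk[-1].end_ms)
        text = " ".join(word.text for word in chunk)
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


sample = [
    Word("hello", 0, 400),
    Word("world", 450, 900),
    Word("again", 61_000, 61_500),
]
print(words_to_srt(sample, max_words=2))
```

A production subtitle generator would more likely split cues on pauses or sentence boundaries than on a fixed word count.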