Yandex SpeechKit technology overview
Updated at November 20, 2024
Yandex SpeechKit provides speech recognition and speech synthesis technologies. SpeechKit can recognize speech either in real time or from pre-recorded audio files while automatically detecting the speaker's language. It can also vocalize pattern phrases and long texts with SpeechKit standard voices.
SpeechKit is available through an API. Depending on the task, you can use its gRPC or REST interface. For more information about API implementations in Yandex Cloud, see Yandex Cloud API concepts.
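As an illustration of the REST option, here is a minimal sketch that only builds a request and does not send it. The endpoint URL, the `lang` query parameter, and the `Api-Key` authorization scheme are assumptions based on the SpeechKit REST API v1 and should be checked against the current API reference; `build_recognition_request` is a hypothetical helper name, not part of any SDK.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the SpeechKit REST API v1 for synchronous recognition;
# verify it against the current API reference before use.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognition_request(
    audio: bytes, api_key: str, lang: str = "ru-RU"
) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries raw audio bytes.

    Hypothetical helper: the `lang` parameter and the `Api-Key` header
    scheme are assumptions, not confirmed by this overview.
    """
    query = urllib.parse.urlencode({"lang": lang})
    return urllib.request.Request(
        url=f"{STT_URL}?{query}",
        data=audio,
        method="POST",
        headers={"Authorization": f"Api-Key {api_key}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping request construction separate from sending makes the sketch easy to inspect without network access or credentials.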
The table below lists the most common SpeechKit use cases so that you can choose the appropriate technologies and configure them to meet your needs.
Use case | Description | Recommended technologies | Features and settings |
---|---|---|---|
Voice robot | Full or partial automation of telephone communications with customers. | For user input: Streaming recognition. For the system's response: Speech synthesis using standard voices and Brand Voices created specially for you. | |
Voice analysis; operator performance quality control | Transcribing and further analysis of audio recordings of dialogs between customers and call center operators or robots. | To recognize pre-recorded audio files: Asynchronous recognition of audio files. | |
Voice control in apps and smart devices; voice assistant | The user requests an action or a search by voice, and the service responds with an action accompanied by a voice comment or an image. | For user input: Streaming recognition. For the system's response: Speech synthesis using standard voices and Brand Voices. | |
Service adaptation to people with visual impairments | Voice control, voice hints, and comments for visually impaired users. | For user input: Streaming recognition. For the system's response: Speech synthesis using standard voices and Brand Voices. | |
Recognizing audio recordings made during a meeting | Transcribing the audio recordings after the meeting is completed. | To recognize pre-recorded audio files: Asynchronous recognition of audio files. | |
Voicing books and videos | Voicing a book or video with no human speaker involved. | Speech synthesis using standard voices and Brand Voices. | |
Recording the minutes of a meeting | Transcribing the meeting in real time. | To recognize the participants' speech: Streaming recognition. | |
Video subtitles | Creating subtitles for recorded videos. | To recognize an audio track: Asynchronous recognition of audio files. | |
Broadcast subtitles | Transcribing broadcasts in real time. | To recognize the broadcast speech: Streaming recognition. | |
Transcribing voice messages | Converting short voice messages to text in messengers. | To recognize audio files: Synchronous recognition. | Recognition result settings. |
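
The recognition recommendations in the table follow a simple pattern: real-time input maps to streaming recognition, pre-recorded files map to asynchronous recognition, and short voice messages map to synchronous recognition. The sketch below encodes that pattern. `RecognitionMode` and `choose_recognition_mode` are hypothetical names used for illustration only, and what counts as "short" audio is left to the caller because this overview does not state a limit.

```python
from enum import Enum


class RecognitionMode(Enum):
    """Recognition technologies named in the use case table."""

    STREAMING = "Streaming recognition"
    ASYNCHRONOUS = "Asynchronous recognition of audio files"
    SYNCHRONOUS = "Synchronous recognition"


def choose_recognition_mode(real_time: bool, short_audio: bool = False) -> RecognitionMode:
    """Pick a recognition mode following the pattern in the table.

    real_time: speech must be recognized while it is being spoken
               (voice robots, assistants, live subtitles, live minutes).
    short_audio: the input is a short recording such as a voice message.
    """
    if real_time:
        return RecognitionMode.STREAMING
    if short_audio:
        return RecognitionMode.SYNCHRONOUS
    # Pre-recorded calls, meetings, and video tracks.
    return RecognitionMode.ASYNCHRONOUS
```

For example, `choose_recognition_mode(real_time=False)` returns `RecognitionMode.ASYNCHRONOUS`, which matches the rows for meeting recordings and video subtitles.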