Tutorials for SpeechKit
- Developing a Telegram bot for text recognition in images, audio synthesis and recognition
- Using Yandex API Gateway to set up speech synthesis in Yandex SpeechKit
Speech recognition
Streaming recognition
- Audio file streaming recognition using the API v3: This example uses the Russian language and 8,000 Hz single-channel LPCM audio streamed from a file. The profanity filter is enabled in the recognition settings.
- Microphone speech streaming recognition using the API v3: This example uses the Russian language and 8,000 Hz single-channel LPCM audio. The profanity filter is enabled.
- Streaming speech recognition with auto language detection in the API v3: This example uses 8,000 Hz single-channel LPCM audio.
- Example use of streaming recognition with API v2: This example uses the Russian language and 8,000 Hz LPCM audio. The profanity filter and the intermediate results filter are enabled.
Synchronous recognition
Example of using the API v1 for synchronous recognition: This example uses the Russian language; other parameters are left at their defaults.
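As a rough illustration of the configuration described for synchronous recognition (Russian language, everything else at defaults), the request could be prepared as in the sketch below. The endpoint URL and the `lang` and `folderId` parameter names are assumptions based on the public API v1 reference and should be checked against the current documentation. The function name `build_recognition_request` and all argument values are hypothetical. The sketch only builds the request object and does not send it.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint of the API v1 synchronous recognition method.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"


def build_recognition_request(audio: bytes, folder_id: str, iam_token: str,
                              lang: str = "ru-RU") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the audio bytes.

    Only the language and folder are set explicitly; every other
    recognition parameter is left at its server-side default.
    """
    query = urlencode({"lang": lang, "folderId": folder_id})
    return urllib.request.Request(
        f"{STT_URL}?{query}",
        data=audio,  # raw audio bytes go in the request body
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; a valid IAM token and folder ID are needed for that step.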
Asynchronous recognition
- Asynchronous recognition of LPCM audio files using the API v2: This example uses the Russian language, the `general:rc` language model, and 8,000 Hz single-channel LPCM audio.
- Asynchronous recognition of OggOpus audio files using the API v2: This example uses the Russian language; other parameters are left at their defaults.
- Asynchronous WAV audio file recognition using the API v3: This example uses the `general` language model and the WAV audio format; other parameters are left at their defaults.
- Regular asynchronous recognition of audio files from Yandex Object Storage: This example uses the Russian language and the `general` language model. Speech is recognized from audio files of any supported format.
Synthesis
- Speech synthesis in the API v3: This example uses 22,050 Hz LPCM audio, a WAV container, and LUFS loudness normalization.
- Pattern-based speech synthesis using the API v3: This example uses pattern-based synthesis for the SpeechKit Brand Voice Self Service and SpeechKit Brand Voice Premium voices.
- Pattern-based speech synthesis in SpeechKit Brand Voice Call Center: This example uses pattern-based synthesis for the SpeechKit Brand Voice Call Center voices.
- Speech synthesis in WAV format using the API v1: This example uses the Russian language, 48,000 Hz LPCM audio, a WAV container, and the `filipp` voice.
- Speech synthesis in OggOpus format using the API v1: This example uses the Russian language and the `filipp` voice.
- Speech synthesis from SSML text using API v1: This example uses the Russian language and the `jane` voice.
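The three API v1 synthesis examples above differ mainly in a few request fields (voice, audio format, sampling rate, plain text versus SSML). The sketch below shows how such requests might be assembled. The endpoint URL and the field names `text`, `ssml`, `lang`, `voice`, `folderId`, `format`, and `sampleRateHertz` are assumptions based on the public API v1 reference and should be verified there. The helper name `build_synthesis_request` and the sample values are hypothetical. Nothing is sent over the network.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint of the API v1 synthesis method.
TTS_URL = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"


def build_synthesis_request(text, folder_id, iam_token, voice="filipp",
                            lang="ru-RU", audio_format="oggopus",
                            sample_rate_hertz=None, ssml=False):
    """Build (but do not send) a form-encoded POST request for synthesis."""
    # SSML markup is assumed to go in the `ssml` field instead of `text`.
    fields = {
        ("ssml" if ssml else "text"): text,
        "lang": lang,
        "voice": voice,
        "folderId": folder_id,
        "format": audio_format,
    }
    # The sampling rate is only added when given, e.g. for LPCM output.
    if sample_rate_hertz is not None:
        fields["sampleRateHertz"] = str(sample_rate_hertz)
    return urllib.request.Request(
        TTS_URL,
        data=urlencode(fields).encode("utf-8"),
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )


# Hypothetical values mirroring the three listed configurations.
lpcm_req = build_synthesis_request("Привет", "b1gexample", "token",
                                   audio_format="lpcm", sample_rate_hertz=48000)
ogg_req = build_synthesis_request("Привет", "b1gexample", "token")
ssml_req = build_synthesis_request("<speak>Привет</speak>", "b1gexample",
                                   "token", voice="jane", ssml=True)
```

For the WAV example, the LPCM response is raw audio, so a WAV header would still have to be added afterwards, for instance with Python's standard `wave` module or an external converter.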