API v2 for streaming recognition

Updated on February 10, 2025

The streaming recognition service endpoint is stt.api.cloud.yandex.net:443.

Message with recognition settings

Parameter	Type	Description
config	object	Field containing the recognition settings and the folder ID.
config.specification	object	Recognition settings.
config.specification.languageCode	string	Recognition language. See the model description for acceptable values. The default value is `ru-RU` (Russian).
config.specification.model	string	Language model to use for recognition. The better the model matches your use case, the better the recognition result. You can specify only one model per request. The acceptable values depend on the selected language. The default value is `general`.
config.specification.profanityFilter	boolean	Profanity filter. Acceptable values: `true`: exclude profanities from the recognition results; `false` (default): do not exclude profanities from the recognition results.
config.specification.partialResults	boolean	Intermediate result filter. Acceptable values: `true`: return intermediate results (parts of a recognized utterance); for intermediate results, `final` is `false`. `false` (default): return only final results (the entire recognized utterance).
config.specification.singleUtterance	boolean	Flag that disables recognition after the first utterance. Acceptable values: `true`: recognize only the first utterance, then stop recognition and wait for the user to disconnect; `false` (default): continue recognition until the end of the session.
config.specification.audioEncoding	string	Format of the submitted audio. Acceptable values: `LINEAR16_PCM`: LPCM without a WAV header; `OGG_OPUS` (default): OggOpus format.
config.specification.sampleRateHertz	integer (int64)	Sampling rate of the submitted audio. This parameter is required if `audioEncoding` is `LINEAR16_PCM`. Acceptable values: `48000` (default): 48 kHz; `16000`: 16 kHz; `8000`: 8 kHz.
config.specification.rawResults	boolean	How to write numbers: `true` for words, `false` (default) for digits.
config.folderId	string	ID of a folder you have access to. It is required for authentication with a user account (see Authentication with the SpeechKit API). Do not use this field if you make a request on behalf of a service account. The maximum string length is 50 characters.
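
The following Python sketch assembles the settings message as a plain dictionary keyed by the field names in the table and checks the constraints the table states: `sampleRateHertz` is required for `LINEAR16_PCM`, only the listed sampling rates are accepted, and the folder ID is at most 50 characters. The function name `build_config` is hypothetical, and a real client would populate the protobuf request type generated from the service definition rather than a dict.

```python
# Hypothetical helper: builds the recognition-settings message as a dict that
# mirrors the JSON-style field names in the settings table.
from typing import Optional

ALLOWED_ENCODINGS = {"LINEAR16_PCM", "OGG_OPUS"}
ALLOWED_RATES = {48000, 16000, 8000}
MAX_FOLDER_ID_LEN = 50


def build_config(
    language_code: str = "ru-RU",
    model: str = "general",
    audio_encoding: str = "OGG_OPUS",
    sample_rate_hertz: Optional[int] = None,
    partial_results: bool = False,
    single_utterance: bool = False,
    profanity_filter: bool = False,
    raw_results: bool = False,
    folder_id: Optional[str] = None,
) -> dict:
    """Return a settings message; raise ValueError on documented constraint violations."""
    if audio_encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported audioEncoding: {audio_encoding}")
    if audio_encoding == "LINEAR16_PCM" and sample_rate_hertz is None:
        raise ValueError("sampleRateHertz is required for LINEAR16_PCM")
    if sample_rate_hertz is not None and sample_rate_hertz not in ALLOWED_RATES:
        raise ValueError(f"unsupported sampleRateHertz: {sample_rate_hertz}")
    if folder_id is not None and len(folder_id) > MAX_FOLDER_ID_LEN:
        raise ValueError("folderId must be at most 50 characters")

    spec = {
        "languageCode": language_code,
        "model": model,
        "profanityFilter": profanity_filter,
        "partialResults": partial_results,
        "singleUtterance": single_utterance,
        "audioEncoding": audio_encoding,
        "rawResults": raw_results,
    }
    if sample_rate_hertz is not None:
        spec["sampleRateHertz"] = sample_rate_hertz
    config = {"specification": spec}
    if folder_id is not None:
        # Only for user-account authentication; omit it for service accounts.
        config["folderId"] = folder_id
    return {"config": config}
```

For example, `build_config(audio_encoding="LINEAR16_PCM", sample_rate_hertz=16000)` describes 16 kHz raw PCM, while omitting the sampling rate for `LINEAR16_PCM` raises an error before anything is sent.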

Experimental additional recognition settings

Streaming recognition models support additional experimental settings. Pass them to the gRPC procedure as call metadata.

Parameter	Type	Description
`x-normalize-partials`	boolean	Flag that returns intermediate recognition results (parts of a recognized utterance) in normalized form: numbers as digits, profanity filter enabled, and so on. Acceptable values: `true`: return a normalized result; `false` (default): return a non-normalized result.
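
A minimal sketch of building the metadata for a streaming call. It assumes the IAM token is sent in an `authorization` header as `Bearer <token>`, which is an assumption here (see Authentication with the SpeechKit API). gRPC metadata values are strings, so the flag is sent as `"true"`. The helper name `build_metadata` is hypothetical.

```python
# Hypothetical helper: gRPC call metadata as (key, value) string pairs.
# The authorization header format is an assumption, not taken from this page.
from typing import List, Tuple


def build_metadata(iam_token: str, normalize_partials: bool = False) -> List[Tuple[str, str]]:
    metadata = [("authorization", f"Bearer {iam_token}")]
    if normalize_partials:
        # Metadata keys are lowercase and values are strings.
        metadata.append(("x-normalize-partials", "true"))
    return metadata
```

With the Python `grpcio` package, a list like this can be passed as the `metadata` argument of the streaming call on the generated stub.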

Audio message

Parameter	Type	Description
`audio_content`	bytes	Audio fragment represented as an array of bytes. The audio must match the format specified in the message with recognition settings.
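
The sketch below produces the client-side request stream: the settings message first, then the audio split into audio messages. It assumes `LINEAR16_PCM` audio is 16-bit mono, so one second at 16 kHz is 16000 × 2 = 32000 bytes and a 100 ms fragment is 3200 bytes. The 100 ms fragment length and the function names are hypothetical choices, and messages are dicts for illustration only.

```python
# Sketch of the request stream: settings message, then audio_content fragments.
from typing import Iterator

BYTES_PER_SAMPLE = 2  # assumption: LINEAR16_PCM is 16-bit mono


def pcm_bytes_per_second(sample_rate_hertz: int) -> int:
    return sample_rate_hertz * BYTES_PER_SAMPLE


def request_stream(config_message: dict, audio: bytes, sample_rate_hertz: int,
                   chunk_ms: int = 100) -> Iterator[dict]:
    """Yield the settings message, then audio fragments of about chunk_ms each."""
    chunk_size = pcm_bytes_per_second(sample_rate_hertz) * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sampling rate")
    yield config_message
    for start in range(0, len(audio), chunk_size):
        # The last fragment may be shorter than chunk_size.
        yield {"audio_content": audio[start:start + chunk_size]}
```

Because this is a generator, audio read from a live source can be sent as it arrives rather than buffered in full.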

Message with recognition results

If a speech fragment is recognized successfully, you receive a message containing a list of recognition results (`chunks[]`). Each result contains the following fields:

- `alternatives[]`: List of recognized text alternatives. Each alternative contains the following fields:
  - `text`: Recognized text.
  - `confidence`: Not currently supported. Do not use this field.
- `final`: Flag indicating that this recognition result is final and will not change. If the value is `false`, the result is intermediate and may change as subsequent speech fragments are recognized.
- `endOfUtterance`: Flag indicating that this result contains the end of the utterance. If the value is `true`, the next result you receive starts a new utterance.
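
The following sketch consumes responses modeled as dicts with the fields listed above. It prints intermediate results without keeping them, since they may still change, and commits final results, using `endOfUtterance` as the utterance boundary. It assumes the first alternative is the one to use and that successive final chunks within one utterance carry new, non-overlapping text; this page states neither point. The function name `transcribe` is hypothetical.

```python
# Sketch: group final recognition results into utterances.
from typing import Iterable, List


def transcribe(responses: Iterable[dict]) -> List[str]:
    utterances: List[str] = []
    current: List[str] = []
    for response in responses:
        for chunk in response.get("chunks", []):
            alternatives = chunk.get("alternatives") or [{"text": ""}]
            text = alternatives[0]["text"]  # assumption: first alternative is preferred
            if not chunk.get("final", False):
                print(f"partial: {text}")  # intermediate, may still change
                continue
            current.append(text)
            if chunk.get("endOfUtterance", False):
                utterances.append(" ".join(current))
                current = []
    if current:  # final text not yet closed by endOfUtterance
        utterances.append(" ".join(current))
    return utterances
```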

Note

If you set `singleUtterance` to `true`, only one utterance per session is recognized. After the message in which `endOfUtterance` is `true`, the server stops recognizing subsequent utterances and waits for you to terminate the session.
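
With `singleUtterance` set to `true`, a client can stop reading as soon as it sees the end of the first utterance and then close the session itself, since the server waits for it to do so. A minimal sketch using the same dict representation of responses; the function name is hypothetical, and how the session is actually closed (for example, by ending the request stream) depends on your gRPC client.

```python
# Sketch for a singleUtterance session: return the first completed utterance.
from typing import Iterable, Optional


def first_utterance(responses: Iterable[dict]) -> Optional[str]:
    for response in responses:
        for chunk in response.get("chunks", []):
            if chunk.get("final") and chunk.get("endOfUtterance"):
                alternatives = chunk.get("alternatives") or [{"text": ""}]
                # Stop iterating here; the caller then closes the session.
                return alternatives[0]["text"]
    return None
```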

Error codes returned by the server

To see how gRPC statuses correspond to HTTP codes, see google.rpc.Code.

List of possible gRPC errors returned by the service:

Code	Status	Description
3	`INVALID_ARGUMENT`	Incorrect request parameters specified. Detailed information is provided in the `details` field.
8	`RESOURCE_EXHAUSTED`	Client exceeded a quota.
16	`UNAUTHENTICATED`	The operation requires authentication. Check the IAM token and the folder ID that you provided.
13	`INTERNAL`	Internal server error. This error means that the operation cannot be performed due to a server-side technical problem, e.g., due to insufficient computing resources.
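
Of these errors, `INVALID_ARGUMENT` and `UNAUTHENTICATED` point to something the client must fix, while `RESOURCE_EXHAUSTED` and `INTERNAL` may clear up after a pause. The hypothetical retry policy below encodes that split using the standard numeric gRPC codes (`RESOURCE_EXHAUSTED` is 8) and adds exponential backoff with full jitter. This page does not prescribe any retry behavior, so treat the policy and its parameters as assumptions.

```python
# Hypothetical retry policy keyed on numeric gRPC status codes.
import random

INVALID_ARGUMENT, RESOURCE_EXHAUSTED, INTERNAL, UNAUTHENTICATED = 3, 8, 13, 16

RETRYABLE = {RESOURCE_EXHAUSTED, INTERNAL}


def should_retry(code: int) -> bool:
    """Bad parameters and auth failures need a client-side fix, not a retry."""
    return code in RETRYABLE


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Capping the delay keeps a long outage from producing unbounded waits, and the jitter spreads out retries from many clients that hit the same quota at once.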

Use cases

Example of streaming recognition with API v2.
