API v2 for streaming recognition
The streaming recognition service is located at: `stt.api.cloud.yandex.net:443`
Message with recognition settings
Parameter | Description |
---|---|
`config` | **object** Field with the recognition settings and folder ID. |
`config.specification` | **object** Recognition settings. |
`config.specification.languageCode` | **string** Recognition language. See the model description for acceptable values. The default value is `ru-RU`, Russian. |
`config.specification.model` | **string** Language model to use for recognition. The more accurately you choose the model, the better the recognition result. You can specify only one model per request. The acceptable values depend on the selected language. The default value is `general`. |
`config.specification.profanityFilter` | **boolean** Profanity filter. Acceptable values: `true` (exclude profanity from recognition results), `false` (default). |
`config.specification.partialResults` | **boolean** Intermediate results filter. Acceptable values: `true` (return intermediate results), `false` (default, return final results only). |
`config.specification.singleUtterance` | **boolean** Flag that disables recognition after the first utterance. Acceptable values: `true` (recognize only one utterance per session), `false` (default). |
`config.specification.audioEncoding` | **string** Format of the submitted audio. Acceptable values: `LINEAR16_PCM`, `OGG_OPUS`. |
`config.specification.sampleRateHertz` | **integer (int64)** Sampling rate of the submitted audio. This parameter is required if `format` is `LINEAR16_PCM`. Acceptable values: `48000` (default), `16000`, `8000`. |
`config.specification.rawResults` | **boolean** Flag for how to spell numbers: `true` for words, `false` (default) for figures. |
`folderId` | **string** ID of the folder you have access to. Required for authentication with a user account (see Authentication with the SpeechKit API). Do not use this field if you make a request on behalf of a service account. The maximum string length is 50 characters. |
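The settings above can be sketched as a plain Python dict that mirrors the JSON field names from the table. This is a minimal illustration, not the official SDK: a real client would build the same structure with the generated protobuf stubs, and the helper name is hypothetical.

```python
def make_recognition_spec(folder_id: str,
                          language_code: str = "ru-RU",
                          model: str = "general",
                          audio_encoding: str = "LINEAR16_PCM",
                          sample_rate_hertz: int = 48000,
                          partial_results: bool = True) -> dict:
    """Build the first streaming message (recognition settings) as a dict.

    Field names follow the parameter table; sampleRateHertz is only
    required (and only included here) for LINEAR16_PCM audio.
    """
    spec = {
        "languageCode": language_code,
        "model": model,
        "profanityFilter": False,
        "partialResults": partial_results,
        "singleUtterance": False,
        "audioEncoding": audio_encoding,
        "rawResults": False,
    }
    if audio_encoding == "LINEAR16_PCM":
        spec["sampleRateHertz"] = sample_rate_hertz
    return {"config": {"specification": spec, "folderId": folder_id}}
```

This message must be the first one sent in the streaming session; all subsequent messages carry only audio.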
Experimental additional recognition settings
For streaming recognition models, new recognition settings are supported. They are passed to a gRPC procedure via metadata.
Parameter | Description |
---|---|
`x-normalize-partials` | **boolean** Flag allowing you to get intermediate recognition results (parts of the recognized utterance) in normalized format: numbers as digits, profanity filter applied, etc. Acceptable values: `true`, `false` (default). |
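Since these experimental settings travel in gRPC metadata rather than in the request message, they are passed as key-value string pairs on the call itself. A small sketch, assuming token-based authorization via the standard `authorization` header (the helper name is hypothetical):

```python
def build_call_metadata(iam_token: str, normalize_partials: bool = True) -> list:
    """gRPC call metadata: auth header plus the experimental flag.

    gRPC metadata values are strings, so the boolean is sent as "true".
    """
    metadata = [("authorization", f"Bearer {iam_token}")]
    if normalize_partials:
        metadata.append(("x-normalize-partials", "true"))
    return metadata
```

With the `grpcio` library, this list would be supplied via the `metadata=` argument of the streaming call.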
Audio message
Parameter | Description |
---|---|
`audio_content` | Audio fragment represented as an array of bytes. The audio must match the format specified in the message with recognition settings. |
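Because the audio is streamed, it is typically split into small fragments and each fragment is sent as its own `audio_content` message. A minimal sketch (the chunk size here is an arbitrary choice, not a requirement of the API):

```python
def audio_chunks(data: bytes, chunk_size: int = 4096):
    """Yield audio_content messages covering the audio bytes in order.

    Each yielded dict corresponds to one streaming message that follows
    the initial settings message.
    """
    for offset in range(0, len(data), chunk_size):
        yield {"audio_content": data[offset:offset + chunk_size]}
```

In a real session the settings message is sent first, then these chunks, and the server responds with recognition results as it processes the audio.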
Message with recognition results
If speech fragment recognition is successful, you will receive a message containing a list of recognition results (`chunks[]`). Each result contains the following fields:

- `alternatives[]`: List of recognized text alternatives. Each alternative contains the following fields:
  - `text`: Recognized text.
  - `confidence`: This field is currently not supported. Do not use it.
- `final`: Flag indicating that this recognition result is final and will not change anymore. If the value is `false`, the recognition result is intermediate and may change as subsequent speech fragments are recognized.
- `endOfUtterance`: Flag indicating that this result contains the end of the utterance. If the value is `true`, the next result you receive will start a new utterance.

Note

If you set `singleUtterance=true`, only one utterance per session will be recognized. After the message where `endOfUtterance` is `true`, the server will not recognize subsequent utterances and will wait for you to terminate the session.
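The fields above can be consumed as follows: skip intermediate results (they may still change) and keep only the top alternative of each final one. A minimal sketch over dict-shaped responses; a real client would read the same fields from the protobuf response objects.

```python
def collect_final_text(responses) -> list:
    """Return the best alternative of every final recognition result.

    Intermediate results (final == False) are skipped because they may
    change as subsequent speech fragments get recognized.
    """
    texts = []
    for response in responses:
        for chunk in response.get("chunks", []):
            if chunk.get("final") and chunk.get("alternatives"):
                # alternatives[0] is the top-ranked hypothesis
                texts.append(chunk["alternatives"][0]["text"])
    return texts
```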
Error codes returned by the server
To see how gRPC statuses correspond to HTTP codes, see `google.rpc.Code`.
List of possible gRPC errors returned by the service:
Code | Status | Description |
---|---|---|
3 | `INVALID_ARGUMENT` | Incorrect request parameters specified. Detailed information is provided in the `details` field. |
9 | `RESOURCE_EXHAUSTED` | The client exceeded a quota. |
16 | `UNAUTHENTICATED` | The operation requires authentication. Check the IAM token and the folder ID that you provided. |
13 | `INTERNAL` | Internal server error. The operation cannot be performed due to a server-side technical problem, e.g., insufficient computing resources. |
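These codes split naturally into errors that require fixing the request and errors that may clear up on their own. The policy below is one reasonable interpretation (an assumption, not part of the API contract): retry transient server-side or quota errors, and never retry errors caused by the request itself.

```python
# gRPC numeric codes from the table above.
INVALID_ARGUMENT = 3
RESOURCE_EXHAUSTED = 9
INTERNAL = 13
UNAUTHENTICATED = 16

# Assumption: quota and internal errors are worth retrying with backoff;
# bad parameters and failed authentication must be fixed by the caller.
RETRYABLE_CODES = {RESOURCE_EXHAUSTED, INTERNAL}


def should_retry(code: int) -> bool:
    """Decide whether a failed streaming call is worth retrying."""
    return code in RETRYABLE_CODES
```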