API v2 for streaming recognition
The service is located at: stt.api.cloud.yandex.net:443
Message with recognition settings
Parameter | Description |
---|---|
config | object Field with the recognition settings and folder ID |
config.specification | object Recognition settings |
config.specification.languageCode | string Language that recognition will be performed for. See the list of available languages in the model description. The default value is ru-RU (Russian). |
config.specification.model | string Language model to use for recognition. The more accurately the model is selected, the better the recognition result. You can only specify one model per request. Acceptable values depend on the selected language. The default value is general. |
config.specification.profanityFilter | boolean Profanity filter. Acceptable values include: |
config.specification.partialResults | boolean Intermediate results filter. Acceptable values include: |
config.specification.singleUtterance | boolean Flag that disables recognition after the first utterance. Acceptable values include: |
config.specification.audioEncoding | string Format of the audio being provided. Acceptable values include: |
config.specification.sampleRateHertz | integer (int64) Sampling frequency of the audio being provided. This parameter is required if audioEncoding is set to LINEAR16_PCM. Acceptable values include: |
config.specification.rawResults | boolean Flag that toggles spelling out numbers. true: Spell out. false (default): Write as numbers. |
folderId | string ID of the folder that you have access to. It is required for authentication with a user account (see Authentication with the SpeechKit API). Do not use this field if you make a request on behalf of a service account. The maximum string length is 50 characters. |
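As a rough illustration of how the fields in this table fit together, the sketch below assembles the settings message as a plain Python dict and checks the two constraints the table states (the 50-character limit on folderId and the sample rate required for LINEAR16_PCM). The function name `build_recognition_config`, the dict representation, and the placement of folderId inside config are assumptions made for the example; a real client would normally use message classes generated from the service's proto files.

```python
from typing import Any, Dict, Optional

MAX_FOLDER_ID_LENGTH = 50  # limit stated in the folderId description


def build_recognition_config(
    folder_id: Optional[str] = None,
    language_code: str = "ru-RU",  # documented default
    model: str = "general",        # documented default
    audio_encoding: Optional[str] = None,
    sample_rate_hertz: Optional[int] = None,
    partial_results: Optional[bool] = None,
    single_utterance: Optional[bool] = None,
    raw_results: bool = False,     # documented default
) -> Dict[str, Any]:
    """Return a plain dict shaped like the settings message (illustrative only)."""
    if folder_id is not None and len(folder_id) > MAX_FOLDER_ID_LENGTH:
        raise ValueError("folderId must be at most 50 characters")
    if audio_encoding == "LINEAR16_PCM" and sample_rate_hertz is None:
        raise ValueError("sampleRateHertz is required for LINEAR16_PCM")

    specification: Dict[str, Any] = {
        "languageCode": language_code,
        "model": model,
        "rawResults": raw_results,
    }
    # Optional fields are included only when the caller sets them.
    optional = {
        "audioEncoding": audio_encoding,
        "sampleRateHertz": sample_rate_hertz,
        "partialResults": partial_results,
        "singleUtterance": single_utterance,
    }
    specification.update({k: v for k, v in optional.items() if v is not None})

    config: Dict[str, Any] = {"specification": specification}
    # Assumption: folderId sits inside config, as the config description suggests.
    if folder_id is not None:
        config["folderId"] = folder_id
    return {"config": config}
```

For example, `build_recognition_config(audio_encoding="LINEAR16_PCM")` raises `ValueError` because no sample rate was given, while a call that also passes `sample_rate_hertz` returns the nested dict. The sample rate you pass still has to be one of the values the service accepts.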
Experimental additional recognition settings
For streaming recognition models, new recognition settings are supported. They are passed to a gRPC procedure via metadata.
Parameter | Description |
---|---|
x-normalize-partials | boolean Flag that allows you to get intermediate recognition results (parts of a recognized utterance) in a normalized format: numbers are specified as digits, the profanity filter is enabled, etc. Acceptable values include: |
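Since this setting travels in the call metadata rather than in the settings message, a short sketch of building that metadata may help. gRPC metadata in Python is a sequence of (key, value) string pairs; the helper name `build_call_metadata` is hypothetical, and encoding the boolean as the lowercase string "true" or "false" is an assumption, not something this page confirms.

```python
from typing import List, Tuple


def build_call_metadata(normalize_partials: bool) -> List[Tuple[str, str]]:
    """Return gRPC call metadata as (key, value) pairs."""
    # Assumption: the flag is sent as the lowercase string "true" or "false".
    value = "true" if normalize_partials else "false"
    return [("x-normalize-partials", value)]
```

With the Python gRPC library, a list like this is typically passed as the `metadata` argument of the stub call, alongside any authorization entries the client already sends.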
Audio message
Parameter | Description |
---|---|
audio_content | Audio fragment represented as an array of bytes. The audio must match the format specified in the message with recognition settings. |
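The relationship between the two request messages can be sketched as a generator that emits the settings message once and then the audio in fragments. This is a minimal sketch under stated assumptions: messages are modeled as plain dicts, the settings message is assumed to come first in the stream, and `request_stream` and the 4000-byte `CHUNK_SIZE` are hypothetical choices rather than values taken from the service.

```python
from typing import Any, Dict, Iterator

CHUNK_SIZE = 4000  # hypothetical fragment size in bytes, not a documented value


def request_stream(
    settings: Dict[str, Any],
    audio: bytes,
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[Dict[str, Any]]:
    """Yield the settings message, then the audio split into fragments."""
    # Assumption: the message with recognition settings is sent first.
    yield settings
    # Each following message carries one audio fragment as raw bytes.
    for offset in range(0, len(audio), chunk_size):
        yield {"audio_content": audio[offset:offset + chunk_size]}
```

A generator is used because gRPC client streaming in Python consumes an iterator of requests, so the audio never has to be turned into a full list of messages in memory.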
Message with recognition results
If speech fragment recognition is successful, you will receive a message containing a list of recognition results, chunks[]. Each result contains the following fields:
- alternatives[]: List of recognized text alternatives. Each alternative contains the following fields:
  - text: Recognized text.
  - confidence: This field is currently not supported. Do not use it.
- final: Flag indicating that this recognition result is final and will not change anymore. If the value is false, it means that the recognition result is intermediate and may change as the next speech fragments are recognized.
- endOfUtterance: Flag indicating that this result contains the ending of the utterance. If the value is true, the new utterance will start with the next result obtained.

Note
If you specified singleUtterance=true in the settings, only one utterance will be recognized per session. After sending a message where endOfUtterance is true, the server does not recognize the following utterances and waits until you end the session.
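One way to read the final and endOfUtterance flags together is sketched below: intermediate results are skipped, final results are accumulated, and an utterance is closed whenever endOfUtterance is true. Responses are modeled as plain dicts with the field names listed above; the function name `collect_utterances` and the choice to take the first entry of alternatives[] are assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List


def collect_utterances(responses: Iterable[Dict[str, Any]]) -> List[str]:
    """Group final recognition results into utterances."""
    utterances: List[str] = []
    parts: List[str] = []
    for response in responses:
        for chunk in response.get("chunks", []):
            alternatives = chunk.get("alternatives", [])
            # Intermediate results may still change, so only final ones are kept.
            if chunk.get("final", False) and alternatives:
                # Assumption: the first alternative is the one to use.
                parts.append(alternatives[0]["text"])
            # endOfUtterance closes the current utterance.
            if chunk.get("endOfUtterance", False) and parts:
                utterances.append(" ".join(parts))
                parts = []
    # Keep any final text received after the last utterance boundary.
    if parts:
        utterances.append(" ".join(parts))
    return utterances
```

The confidence field is deliberately ignored, since it is not supported.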
Error codes returned by the server
For the mapping between gRPC statuses and HTTP codes, see google.rpc.Code.
List of possible gRPC errors returned by the service:
Code | Status | Description |
---|---|---|
3 | INVALID_ARGUMENT | Incorrect request parameters specified. Details are provided in the details field. |
8 | RESOURCE_EXHAUSTED | Client exceeded a quota. |
16 | UNAUTHENTICATED | The operation requires authentication. Check the IAM token and the folder ID that you provided. |
13 | INTERNAL | Internal server error. This error means that the operation cannot be performed due to a server-side technical problem, e.g., due to insufficient computing resources. |
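A client might turn the statuses in this table into short diagnostic messages. The sketch below is one possible approach keyed by status name; `ERROR_HINTS` and `describe_error` are hypothetical names, and the hint texts simply restate the descriptions from the table.

```python
# Hints restating the table above, keyed by gRPC status name.
ERROR_HINTS = {
    "INVALID_ARGUMENT": "Incorrect request parameters; see the details field.",
    "RESOURCE_EXHAUSTED": "Client exceeded a quota.",
    "UNAUTHENTICATED": "Check the IAM token and the folder ID that you provided.",
    "INTERNAL": "Internal server error; the problem is on the server side.",
}


def describe_error(status_name: str) -> str:
    """Return a hint for a gRPC status name, or a fallback for unlisted ones."""
    return ERROR_HINTS.get(status_name, "Status not listed in the table: " + status_name)
```

With the Python gRPC library, the status name can usually be obtained from a caught `grpc.RpcError` as `error.code().name`, so the call would look like `describe_error(error.code().name)`.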