Asynchronous recognition
Asynchronous recognition converts audio files, including multi-channel ones, into text. The files must have the following properties:
- Maximum recording duration: 4 hours
- Maximum file size: 1 GB
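Checking a file against these limits before uploading it can save a failed request. The sketch below is a minimal pre-flight check. The helper names are hypothetical. Treating "1 GB" as 10**9 bytes is an assumption: it is the more conservative reading, since the documentation does not say whether GB or GiB is meant. The audio duration has to come from your own audio tooling.

```python
import os
from typing import List

# Limits from the list above. 1 GB = 10**9 bytes is an assumed, conservative reading.
MAX_DURATION_SECONDS = 4 * 60 * 60
MAX_FILE_SIZE_BYTES = 10 ** 9


def check_audio_limits(size_bytes: int, duration_seconds: float) -> List[str]:
    """Return human-readable problems; an empty list means the file fits the limits."""
    problems = []
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds:.0f} s exceeds the {MAX_DURATION_SECONDS} s limit"
        )
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"size {size_bytes} B exceeds the {MAX_FILE_SIZE_BYTES} B limit")
    return problems


def check_audio_file(path: str, duration_seconds: float) -> List[str]:
    # Only the file size is read from disk; the duration is supplied by the caller.
    return check_audio_limits(os.path.getsize(path), duration_seconds)
```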
Recognition results are stored on the server for 3 days, after which you can no longer retrieve them.
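If you store operation IDs for later retrieval, it helps to track when the 3-day window closes. This is a minimal sketch with hypothetical helper names. It assumes the window is counted from the moment you submitted the recognition request; the documentation does not state the exact starting point, so leave some margin.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention period stated in the documentation.
RESULT_RETENTION = timedelta(days=3)


def results_expire_at(submitted_at: datetime) -> datetime:
    # Assumption: the 3-day window starts when the request was submitted.
    return submitted_at + RESULT_RETENTION


def results_still_available(submitted_at: datetime, now: Optional[datetime] = None) -> bool:
    if now is None:
        now = datetime.now(timezone.utc)
    return now < results_expire_at(submitted_at)
```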
Asynchronous recognition cannot be used for real-time dialog recognition. If you need intermediate results and minimum response time, use streaming recognition.
View the list of supported languages in Supported recognition languages.
Asynchronous recognition modes
For asynchronous recognition, the language model can operate in one of two modes:
- In standard mode, recognition is processed in a standard priority queue. This mode is used when the "general" model is selected.
- In deferred mode, the audio file is placed in a low priority queue and processed when the system is least busy. Special pricing applies to deferred recognition. Processing an audio file in deferred mode takes up to 24 hours. Deferred mode is available when the "deferred-general" tag is specified.
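In code, the choice of mode comes down to the model name in the request. The sketch below assumes the SpeechKit API v2 REST request layout, where the model goes into config.specification.model next to languageCode. build_specification is a hypothetical helper; check the field names against the API reference for the API version you call.

```python
def build_specification(language_code: str, deferred: bool = False) -> dict:
    """Build the "specification" part of an assumed API v2 recognition request body."""
    return {
        "languageCode": language_code,
        # "general": standard priority queue.
        # "deferred-general": low priority queue, deferred pricing, done within 24 hours.
        "model": "deferred-general" if deferred else "general",
    }


# Assumed v2 body shape: {"config": {"specification": ...}, "audio": {"uri": ...}}
body = {
    "config": {"specification": build_specification("ru-RU", deferred=True)},
    "audio": {"uri": "<link_to_audio_file>"},
}
```

Deferred mode fits batch jobs where waiting up to a day is acceptable in exchange for the special pricing.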
How to asynchronously recognize pre-recorded audio
SpeechKit API v2 and v3 support asynchronous speech recognition. To recognize pre-recorded audio:
- Assign the following roles to your service account:
  - ai.speechkit-stt.user for speech recognition.
  - storage.uploader for uploading audio files to a Yandex Object Storage bucket.
  - (Optional) storage.configurer, kms.keys.encrypter, and kms.keys.decrypter for encrypting and decrypting bucket objects. These roles are only required if you use encryption in Object Storage.
- Obtain an IAM token or an API key for your service account. You will use it to authenticate to the API.
- Get a link to the uploaded file.
  For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to pass these parameters to SpeechKit: they are ignored.
- Send a request to recognize the file via the gRPC API or REST API. In the request body, provide the link to the audio file. In the HTTP header, specify your authentication credentials:
  - Authorization: Bearer <IAM_token> for authentication with an IAM token.
  - Authorization: Api-Key <API_key> for authentication with an API key.
  The response contains the ID of the recognition operation. Save it: you will need it to request the results.
  Warning
  Recognition results are stored on the server for 3 days. During this period, you can request them using the obtained ID.
- Wait for the recognition to complete. It takes about 10 seconds to recognize one minute of single-channel audio.
- Send a request to get the recognition results. Specify the same authentication credentials in the HTTP header.
  The results contain the entire recognized text and a list of recognized words.
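The request-submission step can be sketched with the Python standard library. The endpoint URL and body layout below are assumptions based on the SpeechKit API v2 REST interface (longRunningRecognize); verify both in the API reference. All function names are hypothetical. Stripping the query string is optional, since SpeechKit ignores it anyway.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: API v2 REST endpoint for asynchronous recognition.
RECOGNIZE_URL = "https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"


def auth_header(iam_token: Optional[str] = None, api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for exactly one type of credential."""
    if (iam_token is None) == (api_key is None):
        raise ValueError("pass exactly one of iam_token or api_key")
    if iam_token is not None:
        return {"Authorization": f"Bearer {iam_token}"}
    return {"Authorization": f"Api-Key {api_key}"}


def strip_query(link: str) -> str:
    # SpeechKit ignores query parameters, so dropping them is optional.
    parts = urlsplit(link)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_recognize_request(audio_link: str, language_code: str,
                            model: str = "general", **credentials: str) -> urllib.request.Request:
    body = {
        "config": {"specification": {"languageCode": language_code, "model": model}},
        "audio": {"uri": strip_query(audio_link)},
    }
    headers = {"Content-Type": "application/json", **auth_header(**credentials)}
    return urllib.request.Request(RECOGNIZE_URL, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")


def submit(request: urllib.request.Request) -> str:
    """Send the request and return the operation ID from the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["id"]
```

For example, submit(build_recognize_request(link, "ru-RU", api_key="<API_key>")) returns the operation ID to save for the results request.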
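Waiting for completion usually means polling the operation until it reports done. The sketch below assumes the operation status endpoint used with API v2 (https://operation.api.cloud.yandex.net/operations/<ID>) and an operation object with done, error, and response fields; check both in the API reference. The fetch function is injected so the polling logic can be tested without network access. All helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Assumption: operation status endpoint used with API v2.
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/{}"


def estimated_wait_seconds(audio_minutes: float) -> float:
    # Rule of thumb from the documentation: about 10 s per minute of single-channel audio.
    return 10.0 * audio_minutes


def fetch_operation(operation_id: str, headers: Dict[str, str]) -> dict:
    req = urllib.request.Request(OPERATION_URL.format(operation_id), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(operation_id: str, fetch: Callable[[str], dict], interval: float = 5.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the operation is done; return its final state or raise on error/timeout."""
    waited = 0.0
    while True:
        op = fetch(operation_id)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"recognition failed: {op['error']}")
            return op
        if waited >= timeout:
            raise TimeoutError(f"operation {operation_id} not done after {timeout} s")
        sleep(interval)
        waited += interval
```

A real call would pass fetch=lambda op_id: fetch_operation(op_id, {"Authorization": "Api-Key <API_key>"}). In deferred mode, processing can take up to 24 hours, so a much longer timeout and interval are appropriate there.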
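Once the operation is done, the recognized text and the word list can be pulled out of the response. This sketch assumes the API v2 result layout response -> chunks[] -> alternatives[] -> text and words[].word, with a channelTag per chunk. It uses only the first (best) alternative of each chunk and groups text by channel, which is convenient for multi-channel files. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def extract_text_and_words(operation: dict) -> Tuple[Dict[str, str], List[str]]:
    """Return ({channel: full text}, [all recognized words in order]) from a completed operation."""
    texts: Dict[str, List[str]] = {}
    words: List[str] = []
    for chunk in operation.get("response", {}).get("chunks", []):
        alternatives = chunk.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # assumed to be the most likely hypothesis
        channel = str(chunk.get("channelTag", "1"))
        texts.setdefault(channel, []).append(best.get("text", ""))
        words.extend(w["word"] for w in best.get("words", []))
    return {ch: " ".join(parts) for ch, parts in texts.items()}, words
```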