© 2025 Direct Cursus Technology L.L.C.
Asynchronous recognition

Written by Yandex Cloud
Updated on April 30, 2025

Asynchronous recognition converts audio files, including multi-channel ones, into text, within the following limits:

  • Maximum recording duration: 4 hours
  • Maximum file size: 1 GB

Recognition results are stored on the server for 3 days, after which you can no longer retrieve them.
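The limits above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of SpeechKit: the function names are hypothetical, 1 GB is read as 1024**3 bytes (an assumption), and the 3-day retention period is counted from whatever timestamp you pass in, because the text does not say exactly when the period starts.

```python
from datetime import datetime, timedelta, timezone

MAX_DURATION = timedelta(hours=4)     # maximum recording duration
MAX_SIZE_BYTES = 1024 ** 3            # assumption: 1 GB read as 2**30 bytes
RESULT_RETENTION = timedelta(days=3)  # how long results stay on the server


def check_limits(size_bytes: int, duration: timedelta) -> list:
    """Return the list of violated limits; an empty list means the file fits."""
    problems = []
    if duration > MAX_DURATION:
        problems.append("recording is longer than 4 hours")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 1 GB")
    return problems


def results_expire_at(stored_at: datetime) -> datetime:
    """Estimate when results stop being available.

    Assumption: the 3 days are counted from `stored_at`, a timestamp
    chosen by the caller (for example, when the request was sent).
    """
    return stored_at + RESULT_RETENTION


if __name__ == "__main__":
    print(check_limits(size_bytes=300 * 1024 ** 2, duration=timedelta(minutes=90)))
    print(results_expire_at(datetime.now(timezone.utc)))
```

Running the check on the client side avoids uploading a file that would be rejected anyway, and a stored expiry estimate helps you fetch results before they are removed.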

Asynchronous recognition cannot be used for real-time dialog recognition. If you need intermediate results and minimum response time, use streaming recognition.

For the list of supported languages, see Supported recognition languages.

Asynchronous recognition modes

Asynchronous recognition is available in two modes:

  1. In standard mode, recognition is processed in a standard priority queue. This mode works when the general model is selected.
  2. In deferred mode, the audio file for recognition is placed in a low priority queue and processed at the least busy time. Special pricing applies to deferred recognition. For recognition in deferred mode, specify the deferred-general model.

Asynchronous recognition of audio files takes no more than 24 hours, depending on the current system workload.
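The choice between the two modes comes down to which model name you put in the request. A minimal sketch of that decision follows; the model names `general` and `deferred-general` come from the text above, while the function name and its `can_wait` flag are hypothetical.

```python
# Model names as given in the description of the two modes.
STANDARD_MODEL = "general"
DEFERRED_MODEL = "deferred-general"


def pick_model(can_wait: bool) -> str:
    """Pick the model name for an asynchronous recognition request.

    can_wait=True means the caller accepts low-priority processing
    (deferred mode, special pricing); otherwise standard mode is used.
    """
    return DEFERRED_MODEL if can_wait else STANDARD_MODEL


if __name__ == "__main__":
    print(pick_model(can_wait=False))
    print(pick_model(can_wait=True))
```

Deferred mode suits batch jobs with no deadline, such as overnight transcription of an archive. Standard mode suits files you need back sooner.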

How to asynchronously recognize pre-recorded audio

Both SpeechKit API v2 and API v3 support asynchronous speech recognition. To recognize pre-recorded audio:

  1. Create a service account.

  2. Assign the following roles to it:

    • ai.speechkit-stt.user for speech recognition.
    • storage.uploader for uploading audio files to a Yandex Object Storage bucket.
    • (Optional) storage.configurer, kms.keys.encrypter, and kms.keys.decrypter for bucket object encryption and decryption. These roles are only required if you use encryption in Object Storage.
  3. Obtain an IAM token or an API key for your service account. You will use them to authenticate to the API.

  4. Create a Yandex Object Storage bucket.

  5. Upload an audio file to the bucket.

  6. Get a link to the uploaded file.

    The link contains additional query parameters (after ?) for buckets with restricted access. You do not need to provide these parameters in SpeechKit as they are ignored.

  7. Send an API request to recognize a file via the gRPC API or REST API. In the body of the request, provide the link to the audio file. In the HTTP header, specify your authentication credentials:

    • Authorization: Bearer <IAM_token>: For authentication with an IAM token.
    • Authorization: Api-Key <API_key>: For authentication with an API key.

    The response to the request returns the ID of the recognition operation. Save it: you will need it for the next request.

    Warning

    The recognition results are stored on the server for 3 days. During this time, you can request them using the obtained ID.

  8. Wait for the recognition to complete. It takes about 10 seconds to recognize one minute of single-channel audio.

  9. Send an API request to get the recognition results:

    • Using the API v2
    • Using the API v3:
      • REST
      • gRPC

    Specify the same authentication credentials in the HTTP header.

    The results contain the entire recognized text and a list of recognized words.
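The client-side parts of steps 6 to 9 can be sketched without calling any specific endpoint. The example below is illustrative only: all function names are hypothetical, the HTTP calls are left to a `fetch` callable you supply, and the `done` field checked while polling is an assumption about the shape of the operation status, not something stated in this article.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlsplit, urlunsplit

SECONDS_PER_AUDIO_MINUTE = 10  # step 8: about 10 s per minute of single-channel audio


def auth_header(iam_token: Optional[str] = None, api_key: Optional[str] = None) -> dict:
    """Build the Authorization header from exactly one kind of credential."""
    if (iam_token is None) == (api_key is None):
        raise ValueError("provide exactly one of iam_token or api_key")
    if iam_token is not None:
        return {"Authorization": f"Bearer {iam_token}"}
    return {"Authorization": f"Api-Key {api_key}"}


def strip_query(link: str) -> str:
    """Drop the query parameters after '?'; SpeechKit ignores them anyway."""
    parts = urlsplit(link)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def estimated_wait_seconds(audio_minutes: float) -> float:
    """Rough waiting time for single-channel audio of the given length."""
    return audio_minutes * SECONDS_PER_AUDIO_MINUTE


def wait_until_done(fetch: Callable[[], dict], interval: float,
                    max_attempts: int, sleep=time.sleep) -> dict:
    """Call fetch() until it reports completion.

    Assumption: fetch() returns a dict with a truthy "done" key once
    recognition has finished.
    """
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            return operation
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("recognition did not finish in time")


if __name__ == "__main__":
    print(auth_header(api_key="<API_key>"))
    print(strip_query("https://storage.example/bucket/audio.ogg?X-Amz-Signature=abc"))
    print(estimated_wait_seconds(audio_minutes=30))
```

In practice `fetch` would wrap the results request from step 9 and send the same header as the recognition request. The estimate from `estimated_wait_seconds` is a reasonable delay before the first poll.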

Use cases

  • Asynchronous recognition of LPCM audio files using the API v2
  • Asynchronous recognition of OggOpus audio files using the API v2
  • Asynchronous WAV audio file recognition using the API v3
  • Regular asynchronous recognition of audio files from Yandex Object Storage
