© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

Asynchronous recognition API v2

Written by Yandex Cloud
Updated on April 24, 2025
  • Sending a file for recognition
    • Request body parameters
    • Response
  • Getting recognition results
    • Path parameters
    • Response
  • Use cases

To use the API v2, you will need:

  • Yandex Object Storage bucket to which you will upload your audio file for recognition.
  • Service account with the ai.speechkit-stt.user and storage.uploader roles for accessing SpeechKit and Object Storage.
  • IAM token or API key for authentication.

For more information on getting started, see How to asynchronously recognize pre-recorded audio.

Warning

You can recognize audio files asynchronously only as a service account. Do not use any other Yandex Cloud account for this purpose.

The asynchronous recognition service for the API v2 is available at transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize.

Sending a file for recognition

Request body parameters

The request body structure is as follows:

{
 "config": {
  "specification": {
   "languageCode": "string",
   "model": "string",
   "profanityFilter": boolean,
   "literature_text": boolean,
   "audioEncoding": "string",
   "sampleRateHertz": integer,
   "audioChannelCount": integer,
   "rawResults": boolean
  }
 },
 "audio": {
  "uri": "string"
 }
}
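
As a rough illustration, the body above can be assembled and wrapped in an HTTP request using only the Python standard library. This is a sketch under assumptions rather than an official client: the `https://` scheme, the POST method, the `Api-Key` authorization header format, and the helper name `build_recognition_request` are not stated in this section, and any bucket URL you pass in is your own.

```python
import json
import urllib.request

# Endpoint taken from this article; the https:// scheme is an assumption.
STT_URL = "https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"


def build_recognition_request(audio_uri, api_key, language_code="ru-RU",
                              model="general", audio_encoding="OGG_OPUS"):
    """Build (but do not send) an asynchronous recognition request.

    Hypothetical helper: the HTTP method and header format are assumptions.
    The keyword defaults mirror the defaults listed in the parameter table.
    """
    body = {
        "config": {
            "specification": {
                "languageCode": language_code,
                "model": model,
                "audioEncoding": audio_encoding,
            }
        },
        "audio": {"uri": audio_uri},
    }
    return urllib.request.Request(
        STT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the payload easy to inspect before anything goes over the network.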

Parameter

Description

config

object
Field with recognition settings.

config.specification

object
Recognition settings.

config.specification.languageCode

string
Language of the audio file for speech recognition.
The default value is ru-RU, Russian.

config.specification.model

string
Language model for speech recognition.
The default value is general.
Different models have different pricing.

config.specification.profanityFilter

boolean
Profanity filter.
Acceptable values:

  • true: Mask profanities with asterisks in recognition results.
  • false (default): Do not mask profanities.

config.specification.literature_text

boolean
Enables normalization mode.

config.specification.audioEncoding

string
Submitted audio format.
Acceptable values:

  • LINEAR16_PCM: LPCM without a WAV header.
  • OGG_OPUS (default): Ogg with the OPUS codec.
  • MP3: MP3.

config.specification.sampleRateHertz

integer (int64)
Sampling rate of the submitted audio.
This parameter is required if audioEncoding is set to LINEAR16_PCM. Acceptable values:

  • 48000 (default): 48 kHz.
  • 16000: 16 kHz.
  • 8000: 8 kHz.

config.specification.audioChannelCount

integer (int64)
Number of channels for LPCM audio files. The default value is 1.
Do not use this field for OggOpus or MP3 audio files. They already contain information about the channel count.

config.specification.rawResults

boolean
Flag that toggles spelling out numbers.
Acceptable values:

  • true: Spell out.
  • false (default): Use figures.

audio.uri

string
URI of the audio file for recognition. Supports only links to files stored in Yandex Object Storage.
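
The constraints in the table above can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the API: the function name `check_specification` and the wording of its messages are assumptions, and the service performs its own validation regardless.

```python
# Values listed in the parameter table above.
ALLOWED_ENCODINGS = {"LINEAR16_PCM", "OGG_OPUS", "MP3"}
ALLOWED_SAMPLE_RATES = {48000, 16000, 8000}


def check_specification(spec):
    """Return a list of problems found in a config.specification dict.

    Hypothetical helper: it only mirrors the rules described in the table.
    """
    problems = []
    encoding = spec.get("audioEncoding", "OGG_OPUS")  # documented default
    if encoding not in ALLOWED_ENCODINGS:
        problems.append(f"unsupported audioEncoding: {encoding}")
    if encoding == "LINEAR16_PCM":
        # The sampling rate is required for LPCM audio.
        rate = spec.get("sampleRateHertz")
        if rate is None:
            problems.append("sampleRateHertz is required for LINEAR16_PCM")
        elif rate not in ALLOWED_SAMPLE_RATES:
            problems.append(f"unsupported sampleRateHertz: {rate}")
    elif encoding in {"OGG_OPUS", "MP3"} and "audioChannelCount" in spec:
        # OggOpus and MP3 files already carry their channel count.
        problems.append("audioChannelCount applies to LPCM audio only")
    return problems
```

An empty list means that none of the documented rules were violated. It does not guarantee that the service will accept the request.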

Response

If the request is valid, the service returns an Operation object containing the recognition operation ID (id):

{
 "done": false,
 "id": "e03sup6d5h1q********",
 "createdAt": "2019-04-21T22:49:29Z",
 "createdBy": "ajes08feato8********",
 "modifiedAt": "2019-04-21T22:49:29Z"
}

Use this ID at the next step.

Getting recognition results

To check the operation status and get the recognition results, send a request to operation.api.cloud.yandex.net.

Monitor the recognition results using the obtained ID. The number of result monitoring requests is limited, so take the recognition speed into account when choosing how often to check: it takes about 10 seconds to recognize 1 minute of single-channel audio.
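
The recognition speed quoted above gives a rough idea of how long to wait before the first status check. A minimal sketch of that arithmetic, assuming the ratio holds linearly for single-channel audio (the function name is hypothetical, and the figure is only an approximation):

```python
# Approximate figure from this article: about 10 seconds of processing
# per 1 minute of single-channel audio.
SECONDS_PER_AUDIO_MINUTE = 10


def estimated_recognition_seconds(audio_seconds):
    """Estimate the processing time for single-channel audio of the given length."""
    return audio_seconds / 60 * SECONDS_PER_AUDIO_MINUTE
```

For example, a 10-minute single-channel recording works out to roughly 100 seconds, so checking the operation much earlier than that mostly spends the request allowance.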

Warning

Recognition results are stored on the server for 3 days. During this period, you can request them using the obtained ID.

Path parameters

Parameter Description
operationId Operation ID received when sending the recognition request
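
A status check can be wrapped in a small polling loop. The sketch below is illustrative only. It assumes the operation is read from `https://operation.api.cloud.yandex.net/operations/<operationId>`, a URL pattern this section does not spell out, and it takes a caller-supplied `fetch` function (hypothetical) that performs the authenticated GET request and returns the parsed Operation object as a dict.

```python
import time

# Assumption: the operation ID is the last path segment of this URL.
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/{operation_id}"


def operation_url(operation_id):
    """Build the status URL for an operation ID."""
    return OPERATION_URL.format(operation_id=operation_id)


def wait_for_operation(fetch, operation_id, interval=10.0, max_attempts=30,
                       sleep=time.sleep):
    """Poll until the Operation object reports done=true, then return it.

    `fetch` takes a URL and returns the parsed JSON response as a dict.
    Raises TimeoutError if the operation is not done after max_attempts checks.
    """
    url = operation_url(operation_id)
    for attempt in range(max_attempts):
        operation = fetch(url)
        if operation.get("done"):
            return operation
        if attempt < max_attempts - 1:
            sleep(interval)  # pause between checks; requests are limited
    raise TimeoutError(
        f"operation {operation_id} not done after {max_attempts} checks"
    )
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP library and lets it be exercised without network access. The 10-second default interval is an arbitrary choice, not a documented value.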

Response

The Operation object is returned in response to your request. Here is a response example:

{
 "done": true,
 "response": {
  "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
  "chunks": [
   {
    "alternatives": [
     {
      "words": [
       {
        "startTime": "0.879999999s",
        "endTime": "1.159999992s",
        "word": "when",
        "confidence": 1
       },
       {
        "startTime": "1.219999995s",
        "endTime": "1.539999988s",
        "word": "writing",
        "confidence": 1
       },
       ...
      ],
      "text": "when writing The Hobbit, Tolkien referred to the Norse mythology of the Old English poem Beowulf",
      "confidence": 1
     }
    ],
    "channelTag": "1"
   },
   ...
  ]
 },
 "id": "e03sup6d5h1q********",
 "createdAt": "2019-04-21T22:49:29Z",
 "createdBy": "ajes08feato8********",
 "modifiedAt": "2019-04-21T22:49:36Z"
}

Parameter

Description

done

boolean
Contains true when the recognition is complete.

response

object
Asynchronous speech recognition results

response.@type

string
Response type

response.chunks

array
Array with recognition results
If speech recognition in the transmitted file fails, the response may not contain an array with the results.

response.chunks.alternatives

array
Array with recognized text alternatives

response.chunks.alternatives.words

array
Array with recognized words and their details

response.chunks.alternatives.words.startTime

string
Word start time in the recording. An error of 1-2 seconds is possible.

response.chunks.alternatives.words.endTime

string
Word end time in the recording. An error of 1-2 seconds is possible.

response.chunks.alternatives.words.word

string
Recognized word. Recognized numbers are spelled out (e.g., twelve instead of 12).

response.chunks.alternatives.words.confidence

integer (int64)
This field is not supported. Do not use it.

response.chunks.alternatives.text

string
Entire recognized text. By default, numbers are written in figures. To output the entire text in word form, set the config.specification.rawResults parameter to true.

response.chunks.alternatives.confidence

integer (int64)
This field is not supported. Do not use it.

response.chunks.channelTag

string
Audio channel that recognition was performed for.

id

string
Operation ID. Generated on the service side.

createdAt

google.protobuf.Timestamp
Operation start time. Uses RFC3339 (Timestamps) format.

createdBy

string
ID of the user who started the operation.

modifiedAt

google.protobuf.Timestamp
Resource last update time. Uses RFC3339 (Timestamps) format.

For more information about the response format and codes, see Response status codes.
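
To show how the fields described above fit together, here is a minimal sketch that pulls the transcript and word timings out of a completed Operation object. The helper names are hypothetical, and taking only the first entry of `alternatives` is a simplifying assumption. The code tolerates a missing `chunks` array, which the table notes can happen when recognition fails.

```python
def parse_offset(value):
    """Convert a duration string such as "0.879999999s" to seconds."""
    return float(value.rstrip("s"))


def first_alternatives(operation):
    """Yield (channelTag, first alternative) for every chunk that has one."""
    chunks = operation.get("response", {}).get("chunks", [])  # may be absent
    for chunk in chunks:
        alternatives = chunk.get("alternatives", [])
        if alternatives:
            yield chunk.get("channelTag"), alternatives[0]


def transcripts_by_channel(operation):
    """Map each channelTag to the joined text of its chunks."""
    texts = {}
    for tag, alternative in first_alternatives(operation):
        texts.setdefault(tag, []).append(alternative.get("text", ""))
    return {tag: " ".join(parts) for tag, parts in texts.items()}


def word_timings(operation):
    """List (word, start_seconds, end_seconds) across all chunks."""
    return [
        (w["word"], parse_offset(w["startTime"]), parse_offset(w["endTime"]))
        for _, alternative in first_alternatives(operation)
        for w in alternative.get("words", [])
    ]
```

The `confidence` fields are deliberately ignored, since the table marks them as unsupported. Word timings should be treated as approximate given the stated error of 1-2 seconds.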

Use cases

  • Asynchronous recognition of LPCM audio files using the API v2.
  • Asynchronous recognition of OggOpus audio files using the API v2.
  • Regular asynchronous recognition of audio files from Yandex Object Storage.
