© 2025 Direct Cursus Technology L.L.C.

Asynchronous recognition of LPCM audio files using the API v2

Written by
Yandex Cloud
Updated on April 11, 2025

Below is an example of asynchronous speech recognition from an audio file using the SpeechKit API v2. This example uses the following parameters:

  • Language: Russian.
  • Language model: general.
  • Format of the submitted audio: LPCM with a sampling rate of 8000 Hz.
  • Number of audio channels: 1 (default).
  • Other parameters are left at their defaults.

You can generate and send a speech recognition request using cURL.

The service account can authenticate with either an IAM token or an API key; the cURL commands in this example use an API key. Learn more about authentication in the SpeechKit API.
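
As a minimal sketch of the two authentication options, the helper below builds the Authorization header for either credential. The function name auth_header is hypothetical. The Api-Key scheme matches the cURL commands in this guide; the Bearer scheme for IAM tokens is an assumption not shown in this article, so verify it against the authentication documentation.

```python
def auth_header(api_key=None, iam_token=None):
    """Build the Authorization header from exactly one credential."""
    if (api_key is None) == (iam_token is None):
        raise ValueError("pass exactly one of api_key or iam_token")
    if api_key is not None:
        # Same scheme as the cURL commands in this guide.
        return {"Authorization": f"Api-Key {api_key}"}
    # Assumption: IAM tokens are sent with the Bearer scheme.
    return {"Authorization": f"Bearer {iam_token}"}
```

Requiring exactly one credential keeps a stale token from being silently ignored when both are set.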

Getting started

  1. Create a bucket and upload the audio file you want to recognize to it.

  2. Create a service account.

    Warning

    You can recognize audio files asynchronously only as a service account. Do not use any other Yandex Cloud accounts for this purpose.

  3. Assign the storage.uploader and ai.speechkit-stt.user roles to the service account for the folder where you created the bucket.

  4. Get an IAM token or API key for the created service account.

If you do not have an LPCM audio file, you can download a sample file.
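
Raw LPCM carries no header, so the duration of a file has to be derived from its size and the format parameters. The sketch below does this for the parameters used in this example (8000 Hz, one channel), assuming 16-bit samples, which is what the LINEAR16_PCM encoding name suggests. It then estimates the recognition time from the figure given later in this guide (about 10 seconds per minute of single-channel audio). The function names and the local path speech.pcm are hypothetical.

```python
import os

SAMPLE_RATE_HZ = 8000   # sampleRateHertz used in this example
CHANNELS = 1            # audioChannelCount used in this example
BYTES_PER_SAMPLE = 2    # assumption: 16-bit samples (LINEAR16_PCM)


def lpcm_duration_seconds(num_bytes, sample_rate=SAMPLE_RATE_HZ,
                          channels=CHANNELS,
                          bytes_per_sample=BYTES_PER_SAMPLE):
    """Duration of headerless LPCM audio, derived from its size in bytes."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)


def estimated_wait_seconds(duration_seconds, seconds_per_audio_minute=10.0):
    """Rough recognition time, using this guide's 10 s per audio minute."""
    return duration_seconds / 60.0 * seconds_per_audio_minute


if __name__ == "__main__":
    path = "speech.pcm"  # hypothetical local copy of the audio file
    if os.path.exists(path):
        duration = lpcm_duration_seconds(os.path.getsize(path))
        print(f"audio: {duration:.1f} s, "
              f"expected wait: {estimated_wait_seconds(duration):.1f} s")
```

The estimate is only a starting point for choosing a polling interval; actual recognition time may differ.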

Perform speech recognition via the API

cURL
  1. Get a link to an audio file in Object Storage.

  2. Create a file named body.json and add the following code to it:

    {
       "config": {
          "specification": {
             "languageCode": "ru-RU",
             "model": "general",
             "audioEncoding": "LINEAR16_PCM",
             "sampleRateHertz": 8000,
             "audioChannelCount": 1
          }
       },
       "audio": {
          "uri": "<link_to_audio_file>"
       }
    }
    

    Where:

    • languageCode: Recognition language.

    • model: Speech recognition model.

    • audioEncoding: Format of the submitted audio file.

    • sampleRateHertz: Audio file sampling rate in Hz.

    • audioChannelCount: Number of audio channels.

    • uri: Link to the audio file in Object Storage. Here is an example of such a link: https://storage.yandexcloud.net/speechkit/speech.pcm.

      For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to pass these parameters to SpeechKit because they are ignored.

  3. Send the recognition request, passing the file you created as the request body:

    export API_KEY=<service_account_API_key> && \
    curl \
      --insecure \
      --header "Authorization: Api-Key ${API_KEY}" \
      --data "@body.json" \
      https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
    

    Result example:

    {
       "done": false,
       "id": "e03sup6d5h1q********",
       "createdAt": "2019-04-21T22:49:29Z",
       "createdBy": "ajes08feato8********",
       "modifiedAt": "2019-04-21T22:49:29Z"
    }
    

    Save the recognition operation id from the response. You will need it to request the result.

  4. Wait until the recognition is completed. It takes about 10 seconds to recognize one minute of single-channel audio.

  5. Send a request to get information about the operation:

    curl \
      --insecure \
      --header "Authorization: Api-Key ${API_KEY}" \
      https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
    

    Result example:

    {
       "done": true,
       "response": {
          "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
          "chunks": [
             {
                "alternatives": [
                   {
                      "words": [
                         {
                            "startTime": "0.160s",
                            "endTime": "0.500s",
                            "word": "hello",
                            "confidence": 1
                         },
                         {
                            "startTime": "0.580s",
                            "endTime": "0.800s",
                            "word": "world",
                            "confidence": 1
                         }
                      ],
                      "text": "Hello world",
                      "confidence": 1
                   }
                ],
                "channelTag": "1"
             }
          ]
       },
       "id": "e03jjenu23uc********",
       "createdAt": "2024-08-22T11:39:22Z",
       "createdBy": "aje3bg430agh********",
       "modifiedAt": "2024-08-22T11:39:23Z"
    }
    

    If speech recognition in the provided file fails, the response.chunks section may be missing from the response.
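
The steps above can also be scripted. The following is a minimal sketch, not an official client: it builds the same request body as body.json, submits it, polls the operation, and collects the recognized text. It uses only the Python standard library and the two endpoints shown in the cURL commands. The function names, the polling interval, and the retry limit are assumptions made for illustration, and stripping the query string from the link is a design choice based on the note that SpeechKit ignores those parameters.

```python
import json
import time
import urllib.request
from urllib.parse import urlsplit, urlunsplit

STT_URL = ("https://transcribe.api.cloud.yandex.net"
           "/speech/stt/v2/longRunningRecognize")
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def build_body(audio_uri):
    """Build the same request body as body.json for the given link."""
    # Drop the query string: this guide says SpeechKit ignores it.
    parts = urlsplit(audio_uri)
    clean_uri = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return {
        "config": {
            "specification": {
                "languageCode": "ru-RU",
                "model": "general",
                "audioEncoding": "LINEAR16_PCM",
                "sampleRateHertz": 8000,
                "audioChannelCount": 1,
            }
        },
        "audio": {"uri": clean_uri},
    }


def call_api(url, api_key, body=None):
    """POST the body if one is given, otherwise GET; return parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, headers={"Authorization": f"Api-Key {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_text(operation):
    """Return the first alternative's text per chunk; [] if chunks are missing."""
    chunks = operation.get("response", {}).get("chunks", [])
    return [chunk["alternatives"][0]["text"]
            for chunk in chunks if chunk.get("alternatives")]


def recognize(audio_uri, api_key, poll_seconds=10, max_polls=60):
    """Submit the audio for recognition and poll until the operation is done."""
    operation = call_api(STT_URL, api_key, build_body(audio_uri))
    operation_id = operation["id"]
    for _ in range(max_polls):
        time.sleep(poll_seconds)
        operation = call_api(OPERATION_URL + operation_id, api_key)
        if operation.get("done"):
            return extract_text(operation)
    raise TimeoutError(f"operation {operation_id} did not finish in time")
```

A call such as recognize("https://storage.yandexcloud.net/speechkit/speech.pcm", api_key) would return a list with one string per chunk. An empty list means the response contained no chunks, which is the failed-recognition case described above.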

See also

  • Asynchronous recognition API v2
  • Asynchronous recognition of OggOpus audio files using the API v2
  • Regular asynchronous recognition of audio files from Yandex Object Storage
  • Authentication with the SpeechKit API
