© 2025 Direct Cursus Technology L.L.C.
How to recognize long audio files in SpeechKit

Written by Yandex Cloud
Updated on October 20, 2025

SpeechKit supports several speech recognition modes. This example demonstrates asynchronous recognition of an audio file, which is available in API v3 and API v2. Asynchronous recognition has the following limits:

  • Maximum audio duration: 4 hours
  • Maximum file size: 1 GB
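Before uploading, you may want to check a file against these limits locally. The Bash sketch below is an illustration, not part of SpeechKit: the function name `check_async_limits` is hypothetical, you must supply the audio duration yourself (for example, from a media tool), and 1 GB is conservatively assumed to mean 10^9 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check against the asynchronous recognition limits.
MAX_ASYNC_BYTES=$((1000 * 1000 * 1000))   # 1 GB, conservatively taken as 10^9 bytes
MAX_ASYNC_SECONDS=$((4 * 60 * 60))        # 4 hours

# check_async_limits FILE DURATION_SECONDS
# Returns 0 if the file is within both limits; otherwise prints a reason and returns 1.
check_async_limits() {
  local file="$1" duration="$2" size
  size=$(wc -c < "$file") || return 1
  size=${size//[[:space:]]/}   # some wc implementations pad the count with spaces
  if (( size > MAX_ASYNC_BYTES )); then
    echo "${file}: ${size} bytes exceeds the ${MAX_ASYNC_BYTES}-byte limit" >&2
    return 1
  fi
  if (( duration > MAX_ASYNC_SECONDS )); then
    echo "${file}: ${duration} s exceeds the ${MAX_ASYNC_SECONDS} s limit" >&2
    return 1
  fi
  return 0
}
```

For example, `check_async_limits speech.wav 7 && echo "OK to upload"` passes for a short file. The check only mirrors the documented limits; the service remains the final authority on what it accepts.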

This example calls the API with cURL. To call the API from a Python script, see the relevant tutorials.

Getting started

  1. Create a bucket and upload the audio file you want to recognize to it.

  2. Create a service account.

    Warning

    Only a service account can perform asynchronous recognition of audio files. Do not use any other type of Yandex Cloud account for this purpose.

  3. Assign the storage.uploader and ai.speechkit-stt.user roles to the service account for the folder where you created the bucket.

  4. Get an API key or IAM token for your service account.

  5. Download a sample audio file:

    • For API v3: a WAV file.
    • For API v2: an LPCM file.
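Every request in this article authenticates with either the IAM token or the API key from step 4. If you script the requests, a small helper can pick the matching header. This is a hypothetical Bash function, not part of any Yandex Cloud tool; it assumes the credentials are exported as `IAM_TOKEN` or `API_KEY`, as in the commands below, and it prefers the IAM token when both are set.

```shell
#!/usr/bin/env bash
# auth_header: prints an Authorization header line for curl.
# Uses IAM_TOKEN if it is non-empty, otherwise API_KEY (hypothetical helper).
auth_header() {
  if [[ -n "${IAM_TOKEN:-}" ]]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  elif [[ -n "${API_KEY:-}" ]]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  else
    echo "auth_header: set IAM_TOKEN or API_KEY" >&2
    return 1
  fi
}
```

You could then write `curl --header "$(auth_header)" --header "x-folder-id: ${FOLDER_ID}" ...` instead of keeping two copies of each command.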

Speech recognition

The instructions are given for API v3 first, followed by API v2.

API v3
  1. Get a link to an audio file in Object Storage.

  2. Create a file, e.g., request.json, and add the following code to it:

    {
      "uri": "https://storage.yandexcloud.net/<bucket_name>/<path_to_WAV_file_in_bucket>",
      "recognition_model": {
        "model": "general",
        "audio_format": {
          "container_audio": {
            "container_audio_type": "WAV"
          }
        }
      }
    }
    

    Where:

    • uri: Link to the audio file in Object Storage, e.g., https://storage.yandexcloud.net/speechkit/speech.wav.

      For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to pass these parameters to SpeechKit, as it ignores them.

    • model: Speech recognition model.

    • container_audio_type: Audio container format.

  3. Run the request using one of the service account authentication methods:

    • With an IAM token:

      export FOLDER_ID=<folder_ID>
      export IAM_TOKEN=<service_account_IAM_token> && \
      curl \
        --insecure \
        --header "Authorization: Bearer ${IAM_TOKEN}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        --data @request.json https://stt.api.cloud.yandex.net:443/stt/v3/recognizeFileAsync
      

      Where:

      • FOLDER_ID: ID of the folder your service account was created in.
      • IAM_TOKEN: Service account IAM token.
    • With an API key:

      Use an API key if you cannot obtain an IAM token automatically.

      export FOLDER_ID=<folder_ID>
      export API_KEY=<service_account_API_key> && \
      curl \
        --insecure \
        --header "Authorization: Api-Key ${API_KEY}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        --data @request.json https://stt.api.cloud.yandex.net:443/stt/v3/recognizeFileAsync
      

    Result example:

    {
       "id":"f8ddr61b30fk********",
       "description":"STT v3 async recognition",
       "createdAt":"2024-07-15T07:39:36Z",
       "createdBy":"ajehumcuv38h********",
       "modifiedAt":"2024-07-15T07:39:36Z",
       "done":false,
       "metadata":null
    }
    

    Save the recognition operation ID (the id field) from the response.

  4. Wait until the recognition is complete. It takes about 10 seconds to recognize one minute of audio.

  5. Request information about the operation:

    • Authentication with an IAM token:

      curl \
        --insecure \
        --request GET \
        --header "Authorization: Bearer ${IAM_TOKEN}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
      
    • Authentication with an API key:

      curl \
        --insecure \
        --request GET \
        --header "Authorization: Api-Key ${API_KEY}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
      

    Result example:

    {
       "done": true,
       "id": "f8ddr61b30fk********",
       "description": "STT v3 async recognition",
       "createdAt": "2024-07-15T07:39:36Z",
       "createdBy": "ajehumcuv38h********",
       "modifiedAt": "2024-07-15T07:39:37Z"
    }
    
  6. Request the operation result:

    • Authentication with an IAM token:

      curl \
        --insecure \
        --request GET \
        --header "Authorization: Bearer ${IAM_TOKEN}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        "https://stt.api.cloud.yandex.net:443/stt/v3/getRecognition?operation_id=<recognition_operation_ID>"
      
    • Authentication with an API key:

      curl \
        --insecure \
        --request GET \
        --header "Authorization: Api-Key ${API_KEY}" \
        --header "x-folder-id: ${FOLDER_ID}" \
        "https://stt.api.cloud.yandex.net:443/stt/v3/getRecognition?operation_id=<recognition_operation_ID>"
      
    Result example:

    {
     "result": {
        "sessionUuid": {
           "uuid": "24935f24-2c1f62dc-8dd49006-********",
           "userRequestId": "f8d2h7m07t4i********"
        },
        "audioCursors": {
           "receivedDataMs": "7400",
           "resetTimeMs": "0",
           "partialTimeMs": "7400",
           "finalTimeMs": "7400",
           "finalIndex": "0",
           "eouTimeMs": "0"
        },
        "responseWallTimeMs": "189",
        "final": {
           "alternatives": [
              {
                 "words": [
                    {
                       "text": "я",
                       "startTimeMs": "459",
                       "endTimeMs": "520"
                    },
                    {
                       "text": "яндекс",
                       "startTimeMs": "640",
                       "endTimeMs": "1060"
                    },
                    {
                       "text": "спичкит",
                       "startTimeMs": "1120",
                       "endTimeMs": "1959"
                    },
                    {
                       "text": "я",
                       "startTimeMs": "2480",
                       "endTimeMs": "2520"
                    },
                    {
                       "text": "могу",
                       "startTimeMs": "2580",
                       "endTimeMs": "2800"
                    },
                    {
                       "text": "превратить",
                       "startTimeMs": "2860",
                       "endTimeMs": "3360"
                    },
                    {
                       "text": "любой",
                       "startTimeMs": "3439",
                       "endTimeMs": "3709"
                    },
                    {
                       "text": "текст",
                       "startTimeMs": "3800",
                       "endTimeMs": "4140"
                    },
                    {
                       "text": "в",
                       "startTimeMs": "4200",
                       "endTimeMs": "4220"
                    },
                    {
                       "text": "речь",
                       "startTimeMs": "4279",
                       "endTimeMs": "4740"
                    },
                    {
                       "text": "теперь",
                       "startTimeMs": "5140",
                       "endTimeMs": "5759"
                    },
                    {
                       "text": "и",
                       "startTimeMs": "5859",
                       "endTimeMs": "5900"
                    },
                    {
                       "text": "вы",
                       "startTimeMs": "5980",
                       "endTimeMs": "6399"
                    },
                    {
                       "text": "можете",
                       "startTimeMs": "6660",
                       "endTimeMs": "7180"
                    }
                 ],
                 "text": "я яндекс спичкит я могу превратить любой текст в речь теперь и вы можете",
                 "startTimeMs": "0",
                 "endTimeMs": "7400",
                 "confidence": 0,
                 "languages": []
              }
           ],
           "channelTag": "0"
        },
        "channelTag": "0"
     }
    }
    {
     "result": {
        "sessionUuid": {
           "uuid": "24935f24-2c1f62dc-8dd49006-********",
           "userRequestId": "f8d2h7m07t4i********"
        },
        "audioCursors": {
           "receivedDataMs": "7400",
           "resetTimeMs": "0",
           "partialTimeMs": "7400",
           "finalTimeMs": "7400",
           "finalIndex": "0",
           "eouTimeMs": "0"
        },
        "responseWallTimeMs": "189",
        "finalRefinement": {
           "finalIndex": "0",
           "normalizedText": {
              "alternatives": [
                 {
                    "words": [
                       {
                          "text": "я",
                          "startTimeMs": "459",
                          "endTimeMs": "520"
                       },
                       {
                          "text": "яндекс",
                          "startTimeMs": "640",
                          "endTimeMs": "1060"
                       },
                       {
                          "text": "спичкит",
                          "startTimeMs": "1120",
                          "endTimeMs": "1959"
                       },
                       {
                          "text": "я",
                          "startTimeMs": "2480",
                          "endTimeMs": "2520"
                       },
                       {
                          "text": "могу",
                          "startTimeMs": "2580",
                          "endTimeMs": "2800"
                       },
                       {
                          "text": "превратить",
                          "startTimeMs": "2860",
                          "endTimeMs": "3360"
                       },
                       {
                          "text": "любой",
                          "startTimeMs": "3439",
                          "endTimeMs": "3709"
                       },
                       {
                          "text": "текст",
                          "startTimeMs": "3800",
                          "endTimeMs": "4140"
                       },
                       {
                          "text": "в",
                          "startTimeMs": "4200",
                          "endTimeMs": "4220"
                       },
                       {
                          "text": "речь",
                          "startTimeMs": "4279",
                          "endTimeMs": "4740"
                       },
                       {
                          "text": "теперь",
                          "startTimeMs": "5140",
                          "endTimeMs": "5759"
                       },
                       {
                          "text": "и",
                          "startTimeMs": "5859",
                          "endTimeMs": "5900"
                       },
                       {
                          "text": "вы",
                          "startTimeMs": "5980",
                          "endTimeMs": "6399"
                       },
                       {
                          "text": "можете",
                          "startTimeMs": "6660",
                          "endTimeMs": "7180"
                       }
                    ],
                    "text": "Я яндекс спичкит я могу превратить любой текст в речь теперь и вы можете",
                    "startTimeMs": "0",
                    "endTimeMs": "7400",
                    "confidence": 0,
                    "languages": []
                 }
              ],
              "channelTag": "0"
           }
        },
        "channelTag": "0"
     }
    }
    {
     "result": {
        "sessionUuid": {
           "uuid": "24935f24-2c1f62dc-8dd49006-********",
           "userRequestId": "f8d2h7m07t4i********"
        },
        "audioCursors": {
           "receivedDataMs": "7400",
           "resetTimeMs": "0",
           "partialTimeMs": "7400",
           "finalTimeMs": "7400",
           "finalIndex": "0",
           "eouTimeMs": "7400"
        },
        "responseWallTimeMs": "190",
        "eouUpdate": {
           "timeMs": "7400"
        },
        "channelTag": "0"
     }
    }
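The API v3 steps above can be scripted. The Bash sketch below rests on several assumptions: the function names are hypothetical; JSON is handled with plain text tools rather than a parser, so URIs must not contain double quotes and string values must not contain escaped quotes; and `final_texts` assumes that `getRecognition` output arrives as one compact JSON object per line, with a single alternative per message. If `jq` is available, it is a more robust way to read these fields.

```shell
#!/usr/bin/env bash
# make_v3_request URI [MODEL]
# Prints a request.json body for a WAV file in Object Storage.
# The query string (after "?") is dropped because SpeechKit ignores it anyway.
make_v3_request() {
  local uri model
  uri=${1%%\?*}
  model=${2:-general}
  cat <<EOF
{
  "uri": "${uri}",
  "recognition_model": {
    "model": "${model}",
    "audio_format": {
      "container_audio": {
        "container_audio_type": "WAV"
      }
    }
  }
}
EOF
}

# extract_operation_id: reads an operation JSON object on stdin and prints its "id".
extract_operation_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# final_texts: reads getRecognition output on stdin (assumed one JSON object per line)
# and prints the last "text" value of each finalRefinement message,
# which is the alternative's normalized text.
final_texts() {
  grep '"finalRefinement"' | sed 's/.*"text"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/'
}
```

For example, `make_v3_request "https://storage.yandexcloud.net/<bucket_name>/<path_to_WAV_file_in_bucket>" > request.json` produces the step 2 file, and piping the step 3 response through `extract_operation_id` yields the ID used in steps 5 and 6.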
    
API v2

  1. Get a link to an audio file in Object Storage.

  2. Create a file named body.json and add the following code to it:

    {
       "config": {
          "specification": {
             "languageCode": "ru-RU",
             "model": "general",
             "audioEncoding": "LINEAR16_PCM",
             "sampleRateHertz": 8000,
             "audioChannelCount": 1
          }
       },
       "audio": {
          "uri": "<link_to_audio_file>"
       }
    }
    

    Where:

    • languageCode: Recognition language.

    • model: Speech recognition model.

    • audioEncoding: Format of the submitted audio file.

    • sampleRateHertz: Audio file sampling rate in Hz.

    • audioChannelCount: Number of audio channels.

    • uri: Link to the audio file in Object Storage, e.g., https://storage.yandexcloud.net/speechkit/speech.pcm.

      For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to pass these parameters to SpeechKit, as it ignores them.

  3. Send the request with the created file:

    export API_KEY=<service_account_API_key> && \
    curl \
      --insecure \
      --header "Authorization: Api-Key ${API_KEY}" \
      --data "@body.json" \
      https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
    

    Result example:

    {
       "done": false,
       "id": "e03sup6d5h1q********",
       "createdAt": "2019-04-21T22:49:29Z",
       "createdBy": "ajes08feato8********",
       "modifiedAt": "2019-04-21T22:49:29Z"
    }
    

    Save the recognition operation ID (the id field) from the response.

  4. Wait until the recognition is completed. It takes about 10 seconds to recognize one minute of single-channel audio.

  5. Send a request to get information about the operation:

    curl \
      --insecure \
      --header "Authorization: Api-Key ${API_KEY}" \
      https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
    

    Result example:

    {
       "done": true,
       "response": {
          "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
          "chunks": [
             {
                "alternatives": [
                   {
                      "words": [
                         {
                            "startTime": "0.160s",
                            "endTime": "0.500s",
                            "word": "hello",
                            "confidence": 1
                         },
                         {
                            "startTime": "0.580s",
                            "endTime": "0.800s",
                            "word": "world",
                            "confidence": 1
                         }
                      ],
                      "text": "Hello world",
                      "confidence": 1
                   }
                ],
                "channelTag": "1"
             }
          ]
       },
       "id": "e03jjenu23uc********",
       "createdAt": "2024-08-22T11:39:22Z",
       "createdBy": "aje3bg430agh********",
       "modifiedAt": "2024-08-22T11:39:23Z"
    }
    

    If SpeechKit fails to recognize speech in the file, the response may not contain the response.chunks section.
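Two parts of this procedure lend themselves to a script: estimating how long to wait, and polling the operation until it finishes. Both APIs report progress through the same `https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>` endpoint. The Bash sketch below assumes that raw LPCM is 16-bit (2 bytes per sample, matching `LINEAR16_PCM`), treats the 10-seconds-per-minute figure above as an approximation only, uses hypothetical function names, and has `fetch_operation` send the API key header from the v2 example (swap in `Bearer ${IAM_TOKEN}` for IAM token authentication).

```shell
#!/usr/bin/env bash
# lpcm_duration_seconds FILE SAMPLE_RATE CHANNELS
# Duration of raw 16-bit LPCM audio: bytes / (rate * channels * 2), rounded down.
lpcm_duration_seconds() {
  local bytes
  bytes=$(wc -c < "$1") || return 1
  bytes=${bytes//[[:space:]]/}
  echo $(( bytes / ($2 * $3 * 2) ))
}

# estimate_wait_seconds DURATION_SECONDS
# Rough wait at about 10 s of processing per 60 s of audio, rounded up.
estimate_wait_seconds() {
  echo $(( ($1 * 10 + 59) / 60 ))
}

# fetch_operation OPERATION_ID: prints the operation JSON (network call).
fetch_operation() {
  curl --silent --show-error --fail \
    --header "Authorization: Api-Key ${API_KEY}" \
    "https://operation.api.cloud.yandex.net/operations/$1"
}

# wait_for_operation OPERATION_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Polls until the operation reports "done": true, then prints the final JSON.
wait_for_operation() {
  local id="$1" interval="${2:-10}" attempts="${3:-60}" i response
  for (( i = 0; i < attempts; i++ )); do
    response=$(fetch_operation "$id") || return 1
    if grep -Eq '"done"[[:space:]]*:[[:space:]]*true' <<< "$response"; then
      printf '%s\n' "$response"
      return 0
    fi
    sleep "$interval"
  done
  echo "wait_for_operation: ${id} not done after ${attempts} attempts" >&2
  return 1
}
```

For the 8 kHz mono settings in body.json, `estimate_wait_seconds "$(lpcm_duration_seconds speech.pcm 8000 1)"` gives a starting poll interval, and `wait_for_operation "<recognition_operation_ID>" 10` prints the completed operation. Polling with a bounded number of attempts avoids waiting forever if an operation stalls.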

See also

  • Speech recognition
  • API v3 for asynchronous recognition
  • Asynchronous recognition API v2
  • Authentication with the SpeechKit API
  • Asynchronous WAV audio file recognition using the API v3
  • Asynchronous recognition of OggOpus audio files using the API v2
