Asynchronous recognition of OggOpus audio files using the API v2

Written by Yandex Cloud
Updated on April 11, 2025
  • Getting started
  • Perform speech recognition via the API
  • Troubleshooting

Here are examples of asynchronous speech recognition from an audio file using the SpeechKit API v2. The examples use the following parameters:

  • Language: Russian.
  • Audio format: OggOpus (an .opus file).
  • Other parameters: default values.

You can generate and send a speech recognition request using the cURL utility or a Python script.

An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.
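
As a quick illustration, the sketch below shows how the authorization header is formed in Python. The token value is a placeholder; the Api-Key variant applies only if you authenticate with a service account API key instead of an IAM token.

    # Authorization header used by the requests in this guide (placeholder token).
    iam_token = '<service_account_IAM_token>'
    headers = {'Authorization': f'Bearer {iam_token}'}

    # With a service account API key instead of an IAM token:
    # headers = {'Authorization': 'Api-Key <service_account_API_key>'}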

Getting started

  1. Create a bucket and upload the audio file you want to recognize to it. You can also do this with a script, as shown after this list.

  2. Create a service account.

    Warning

    You can recognize audio files asynchronously only as a service account. Do not use other types of Yandex Cloud accounts for this purpose.

  3. Assign the storage.uploader and ai.speechkit-stt.user roles to the service account for the folder where you created the bucket.

  4. Get an IAM token or API key for the created service account.

If you do not have an OggOpus audio file, you can download a sample file.
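
If you prefer to upload the file to the bucket with a script rather than through the management console, the sketch below uses the S3-compatible Object Storage endpoint via boto3. The bucket name, file name, and static access key are placeholders, and it assumes you have issued a static access key for the service account.

    import boto3

    # Object Storage exposes an S3-compatible API at storage.yandexcloud.net.
    # The static access key and bucket name below are placeholders.
    session = boto3.session.Session(
        aws_access_key_id='<static_key_ID>',
        aws_secret_access_key='<static_key_secret>',
    )
    s3 = session.client(service_name='s3', endpoint_url='https://storage.yandexcloud.net')

    # Upload the audio file you want to recognize.
    s3.upload_file('speech.opus', '<bucket_name>', 'speech.opus')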

Perform speech recognition via the API

Warning

For two-channel OggOpus audio files, do not specify the number of channels using the audioChannelCount parameter.

cURL
  1. Get a link to an audio file in Object Storage.

  2. Create a file, e.g., body.json, and add the following code to it:

    {
        "config": {
            "specification": {
                "languageCode": "ru-RU"
            }
        },
        "audio": {
            "uri": "<link_to_audio_file>"
        }
    }
    

    Where:

    • languageCode: Recognition language.

    • uri: Link to the audio file in Object Storage. Here is an example of such a link: https://storage.yandexcloud.net/speechkit/speech.opus.

      The link contains additional query parameters (after ?) for buckets with restricted access. You do not need to provide these parameters in SpeechKit as they are ignored.

    Since OggOpus is the default format, you do not need to specify the audio stream format.

    Note

    Do not provide the audioChannelCount parameter to specify the number of audio channels. OggOpus files already contain information about the channel count.

  3. Send the recognition request using the created file:

    export IAM_TOKEN=<service_account_IAM_token> && \
    curl \
      --request POST \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --data "@body.json" \
      https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
    

    Where IAM_TOKEN is the IAM token of the service account.

    Result example:

    {
        "done": false,
        "id": "e03sup6d5h1q********",
        "createdAt": "2019-04-21T22:49:29Z",
        "createdBy": "ajes08feato8********",
        "modifiedAt": "2019-04-21T22:49:29Z"
    }
    

    Save the recognition operation id you get in the response.

  4. Wait for the recognition to complete. It takes about 10 seconds to recognize one minute of audio.

  5. Send a request to get information about the operation:

    curl \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
    

    Result example:

    {
     "done": true,
     "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
      "chunks": [
       {
        "alternatives": [
         {
          "text": "your number is 212-85-06",
          "confidence": 1
         }
        ],
        "channelTag": "1"
       }
      ]
     },
     "id": "e03sup6d5h1q********",
     "createdAt": "2019-04-21T22:49:29Z",
     "createdBy": "ajes08feato8********",
     "modifiedAt": "2019-04-21T22:49:36Z"
    }
    

    If speech recognition in the provided file fails, the response.chunks section may be missing from the response.

Python 3

  1. Use the pip package manager to install the requests package:

    pip install requests
    
  2. Create a file, e.g., test.py, and add the following code to it:

    # -*- coding: utf-8 -*-
    
    import requests
    import time
    import json
    
    # Specify your IAM token and the link to the audio file in Object Storage.
    key = '<service_account_IAM_token>'
    filelink = '<link_to_audio_file>'
    
    POST = 'https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize'

    body = {
        "config": {
            "specification": {
                "languageCode": "ru-RU"
            }
        },
        "audio": {
            "uri": filelink
        }
    }
    
    header = {'Authorization': 'Bearer {}'.format(key)}
    
    # Send a recognition request.
    req = requests.post(POST, headers=header, json=body)
    data = req.json()
    print(data)
    
    id = data['id']
    
    # Request the operation status on the server until recognition is complete.
    while True:
    
        time.sleep(1)
    
        GET = "https://operation.api.cloud.yandex.net/operations/{id}"
        req = requests.get(GET.format(id=id), headers=header)
        req = req.json()
    
        if req['done']: break
        print("Not ready")
    
    # Show the full server response in JSON format.
    print("Response:")
    print(json.dumps(req, ensure_ascii=False, indent=2))
    
    # Show only text from recognition results.
    print("Text chunks:")
    for chunk in req['response']['chunks']:
        print(chunk['alternatives'][0]['text'])
    

    Where:

    • key: Service account IAM token.
    • filelink: Link to the audio file in Object Storage.
  3. Run the created file:

    python3 test.py
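
The script prints all recognized text only when recognition succeeds. As noted in the cURL instructions above, the response.chunks section may be missing if recognition fails, so a slightly more defensive extraction can be useful. This is a minimal sketch that reuses the req dictionary from test.py; the error field is only present when the operation fails.

    # Defensive extraction of the recognition result (append to test.py if needed).
    response = req.get('response', {})
    chunks = response.get('chunks', [])
    if not chunks:
        # A failed operation may carry an "error" field instead of "response".
        print('No recognized text returned:', req.get('error', response))
    else:
        for chunk in chunks:
            print(chunk['alternatives'][0]['text'])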
    

Troubleshooting

If you get null, an empty result, or the Invalid audio received error, check which codec your file is encoded with. Changing the file extension is not enough for the file to be recognized correctly.

Use FFmpeg to convert an OGG file into the supported OggOpus format:

ffmpeg -i audio.ogg -c:a libopus -b:a 65536 audio_new.ogg
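
To check the codec before converting or uploading, one option is to inspect the file with ffprobe, which ships with FFmpeg. The sketch below assumes ffprobe is on your PATH and that the file is named audio.ogg; for OggOpus, the reported codec should be opus.

    import subprocess

    # Ask ffprobe for the codec of the first audio stream (requires the FFmpeg tools).
    codec = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'a:0',
         '-show_entries', 'stream=codec_name',
         '-of', 'default=noprint_wrappers=1:nokey=1', 'audio.ogg'],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

    # "opus" means the file is already OggOpus; "vorbis" means it needs converting.
    print(codec)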

See also

  • Asynchronous recognition API v2
  • Asynchronous recognition of LPCM audio files using the API v2
  • Regular asynchronous recognition of audio files from Yandex Object Storage
  • Authentication with the SpeechKit API

© 2025 Yandex.Cloud LLC