Asynchronous recognition of OggOpus audio files using the API v2
Here are examples of asynchronous recognition of speech from an audio file using the SpeechKit API v2. These examples use the following parameters:
- Language: Russian.
- Audio stream format: OggOpus with an OPUS file.
- Other parameters left by default.
You can generate and send a speech recognition request using cURL or Python.
An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.
Getting started
-
Create a service account.
Warning
Asynchronous recognition of audio files is only available under a service account. Do not use any other Yandex Cloud account for it.
-
Assign the service account the storage.uploader and ai.speechkit-stt.user roles for the folder where you created the bucket.
-
Get an IAM token or API key for the created service account.
If you do not have an OggOpus audio file, you can download a sample file.
Perform speech recognition via the API
Warning
For two-channel OggOpus audio files, do not specify the number of channels in the audioChannelCount parameter.
-
Get a link to an audio file in Object Storage.
-
Create a file, e.g., body.json, and paste the following code into it:
{
  "config": {
    "specification": {
      "languageCode": "ru-RU"
    }
  },
  "audio": {
    "uri": "<link_to_audio_file>"
  }
}
Where:
- languageCode: Recognition language.
- uri: Link to the audio file in Object Storage. Sample link: https://storage.yandexcloud.net/speechkit/speech.opus.
For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to provide these parameters in SpeechKit as they are ignored.
Since OggOpus is the default format, you do not need to specify the audio stream format.
Note
Do not provide the audioChannelCount parameter to specify the number of audio channels. OggOpus files already contain information about the channel count.
-
Send the recognition request using the created file:
export IAM_TOKEN=<service_account_IAM_token> && \
curl -X POST \
  -H "Authorization: Bearer ${IAM_TOKEN}" \
  -d "@body.json" \
  https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
Where IAM_TOKEN is the IAM token of the service account.
Result example:
{
  "done": false,
  "id": "e03sup6d5h1q********",
  "createdAt": "2019-04-21T22:49:29Z",
  "createdBy": "ajes08feato8********",
  "modifiedAt": "2019-04-21T22:49:29Z"
}
Save the recognition operation id that you received in the response.
-
Wait for the recognition to complete. It takes about 10 seconds to recognize one minute of an audio file.
-
Send a request to get information about the operation:
curl -H "Authorization: Bearer ${IAM_TOKEN}" \
  https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
Result example:
{
  "done": true,
  "response": {
    "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
    "chunks": [
      {
        "alternatives": [
          {
            "text": "your number is 212-85-06",
            "confidence": 1
          }
        ],
        "channelTag": "1"
      }
    ]
  },
  "id": "e03sup6d5h1q********",
  "createdAt": "2019-04-21T22:49:29Z",
  "createdBy": "ajes08feato8********",
  "modifiedAt": "2019-04-21T22:49:36Z"
}
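The result example above nests the recognized text inside chunks, each tagged with a channelTag. As an illustration only, the following Python sketch shows one way to collect the text per channel from such a response. The helper name text_by_channel is hypothetical, the check for an "error" field is an assumption about how a failed operation is reported, and taking only the first alternative is a simplification.

```python
from collections import defaultdict


def text_by_channel(operation: dict) -> dict:
    """Group recognized text by channelTag for a finished operation.

    `operation` is the parsed JSON returned by the operations endpoint.
    This helper is a hypothetical illustration, not part of any SDK.
    """
    if not operation.get("done"):
        raise ValueError("operation is not finished yet")
    # Assumption: a failed operation carries an "error" object
    # instead of a "response" object.
    if "error" in operation:
        raise RuntimeError("recognition failed: {}".format(operation["error"]))

    grouped = defaultdict(list)
    for chunk in operation.get("response", {}).get("chunks", []):
        alternatives = chunk.get("alternatives") or []
        if not alternatives:
            continue
        # Simplification: keep only the first alternative of each chunk.
        # Assumption: chunks without a channelTag belong to channel "1".
        grouped[chunk.get("channelTag", "1")].append(alternatives[0]["text"])
    return {tag: " ".join(parts) for tag, parts in grouped.items()}


# Trimmed copy of the result example shown above.
sample = {
    "done": True,
    "response": {
        "chunks": [
            {
                "alternatives": [
                    {"text": "your number is 212-85-06", "confidence": 1}
                ],
                "channelTag": "1",
            }
        ]
    },
}
print(text_by_channel(sample))
```

Grouping by channelTag keeps the speakers of a two-channel recording apart, whereas printing chunks in arrival order interleaves them.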
-
Install the requests package using the pip package manager:
pip install requests
-
Create a file, e.g., test.py, and paste the following code into it:
# -*- coding: utf-8 -*-
import json
import time

import requests

# Specify your IAM token and the link to the audio file in Object Storage.
key = '<service_account_IAM_token>'
filelink = '<link_to_audio_file>'

POST = 'https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize'
GET = 'https://operation.api.cloud.yandex.net/operations/{id}'

body = {
    "config": {
        "specification": {
            "languageCode": "ru-RU"
        }
    },
    "audio": {
        "uri": filelink
    }
}

header = {'Authorization': 'Bearer {}'.format(key)}

# Send a recognition request.
response = requests.post(POST, headers=header, json=body)
data = response.json()
print(data)

operation_id = data['id']

# Request the operation status on the server until recognition is complete.
while True:
    time.sleep(1)

    response = requests.get(GET.format(id=operation_id), headers=header)
    operation = response.json()

    if operation['done']:
        break
    print("Not ready")

# Show the full server response in JSON format.
print("Response:")
print(json.dumps(operation, ensure_ascii=False, indent=2))

# Only show text from recognition results.
print("Text chunks:")
for chunk in operation['response']['chunks']:
    print(chunk['alternatives'][0]['text'])
Where:
- key: IAM token of the service account.
- filelink: Link to the audio file in Object Storage.
-
Run the created file:
python3 test.py
Troubleshooting
If you get null, an empty file, or the Invalid audio received error, check which codec your file is encoded with. Changing the file extension is not enough for the file to be recognized correctly.
Use FFmpeg to re-encode the file with the Opus codec:
ffmpeg -i audio.ogg -c:a libopus -b:a 65536 audio_new.ogg
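Before re-encoding, it can help to check whether a file really contains Opus audio in an Ogg container. The Python sketch below is a heuristic, not a full parser: it assumes the usual Ogg layout, where the file starts with the OggS capture pattern and the first page carries the OpusHead identification header. The function names are hypothetical, and the FFmpeg arguments simply mirror the command above.

```python
import subprocess


def looks_like_ogg_opus(header: bytes) -> bool:
    """Heuristic check on the first bytes of a file.

    An Ogg stream starts with b"OggS"; for Opus, the first page holds an
    identification header that begins with b"OpusHead".
    """
    return header[:4] == b"OggS" and b"OpusHead" in header[:64]


def is_ogg_opus_file(path: str) -> bool:
    """Read the first 64 bytes of `path` and apply the heuristic."""
    with open(path, "rb") as audio:
        return looks_like_ogg_opus(audio.read(64))


def ffmpeg_reencode_command(src: str, dst: str, bitrate: int = 65536) -> list:
    """Build the same FFmpeg command line as shown above."""
    return ["ffmpeg", "-i", src, "-c:a", "libopus", "-b:a", str(bitrate), dst]


def reencode_if_needed(src: str, dst: str) -> str:
    """Return a path to an OggOpus file, re-encoding with FFmpeg if needed.

    Requires the ffmpeg executable to be available on PATH.
    """
    if is_ogg_opus_file(src):
        return src
    subprocess.run(ffmpeg_reencode_command(src, dst), check=True)
    return dst


print(ffmpeg_reencode_command("audio.ogg", "audio_new.ogg"))
```

A file with the .ogg extension that holds Vorbis audio also starts with OggS but has no OpusHead header, which is why checking the extension alone is not enough.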