Asynchronous recognition of OggOpus audio files using the API v2
Here are examples of asynchronous recognition of speech from an audio file using the SpeechKit API v2. These examples use the following parameters:
- Language: Russian.
- Audio stream format: OggOpus (an .opus file).
- Other parameters are left at their defaults.
You can generate and send a speech recognition request using the cURL utility or a Python script; both examples are shown below.
An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.
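If the Yandex Cloud CLI is installed and configured for the service account, you can obtain an IAM token directly from a script. The snippet below is only a sketch of that approach; the `yc iam create-token` command and the subprocess wrapper are not part of the recognition API itself:

```
import subprocess

# Sketch: obtain an IAM token with the Yandex Cloud CLI.
# Assumes the yc CLI is installed and its active profile is configured
# for the service account.
iam_token = subprocess.run(
    ["yc", "iam", "create-token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(iam_token)
```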
Getting started
- Create a service account.

  Warning

  Please note that you can only recognize audio files asynchronously under a service account. Do not use any other accounts in Yandex Cloud for that.

- Assign the service account the `storage.uploader` and `ai.speechkit-stt.user` roles for the folder where you created the bucket.

- Get an IAM token or API key for the created service account.
If you do not have an OggOpus audio file, you can download a sample file.
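The request in the next section expects the audio file to already be available in Object Storage. If you still need to upload it, the sketch below does so over the S3-compatible API with boto3; the bucket name, object key, and static access key values are placeholders you would replace with your own.

```
import boto3

# Sketch: upload an OggOpus file to Object Storage over the S3-compatible API.
# The static access key pair and bucket name are placeholders.
session = boto3.session.Session(
    aws_access_key_id="<static_key_ID>",
    aws_secret_access_key="<secret_key>",
)
s3 = session.client(service_name="s3", endpoint_url="https://storage.yandexcloud.net")

# Upload speech.opus; adjust the file name, bucket, and key for your setup.
s3.upload_file("speech.opus", "<bucket_name>", "speech.opus")

# The uploaded object is then reachable at a link like:
print("https://storage.yandexcloud.net/<bucket_name>/speech.opus")
```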
Perform speech recognition via the API
Warning

For two-channel OggOpus audio files, do not specify the number of channels using the `audioChannelCount` parameter.
- Get a link to an audio file in Object Storage.

- Create a file, e.g., `body.json`, and add the following code to it:

  ```
  {
      "config": {
          "specification": {
              "languageCode": "ru-RU"
          }
      },
      "audio": {
          "uri": "<link_to_audio_file>"
      }
  }
  ```

  Where:

  - `languageCode`: Recognition language.
  - `uri`: Link to the audio file in Object Storage. Here is an example of such a link: `https://storage.yandexcloud.net/speechkit/speech.opus`.

    The link contains additional query parameters (after `?`) for buckets with restricted access. You do not need to provide these parameters in SpeechKit as they are ignored.

  Since OggOpus is the default format, you do not need to specify the audio stream format.

  Note

  Do not provide the `audioChannelCount` parameter to specify the number of audio channels. OggOpus files already contain information about the channel count.
- Send the recognition request using the created file:

  ```
  export IAM_TOKEN=<service_account_IAM_token> && \
  curl \
    --request POST \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --data "@body.json" \
    https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
  ```

  Where `IAM_TOKEN` is the IAM token of the service account.

  Result example:

  ```
  {
      "done": false,
      "id": "e03sup6d5h1q********",
      "createdAt": "2019-04-21T22:49:29Z",
      "createdBy": "ajes08feato8********",
      "modifiedAt": "2019-04-21T22:49:29Z"
  }
  ```

  Save the recognition operation `id` you get in the response.

- Wait for the recognition to complete. Recognizing one minute of audio takes about 10 seconds.
- Send a request to get information about the operation:

  ```
  curl \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
  ```

  Result example:

  ```
  {
      "done": true,
      "response": {
          "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
          "chunks": [
              {
                  "alternatives": [
                      {
                          "text": "your number is 212-85-06",
                          "confidence": 1
                      }
                  ],
                  "channelTag": "1"
              }
          ]
      },
      "id": "e03sup6d5h1q********",
      "createdAt": "2019-04-21T22:49:29Z",
      "createdBy": "ajes08feato8********",
      "modifiedAt": "2019-04-21T22:49:36Z"
  }
  ```
- Use the pip package manager to install the `requests` package:

  ```
  pip install requests
  ```
- Create a file, e.g., `test.py`, and add the following code to it:

  ```
  # -*- coding: utf-8 -*-

  import requests
  import time
  import json

  # Specify your IAM token and the link to the audio file in Object Storage.
  key = '<service_account_IAM_token>'
  filelink = '<link_to_audio_file>'

  POST = 'https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize'

  body = {
      "config": {
          "specification": {
              "languageCode": "ru-RU"
          }
      },
      "audio": {
          "uri": filelink
      }
  }

  header = {'Authorization': 'Bearer {}'.format(key)}

  # Send a recognition request.
  req = requests.post(POST, headers=header, json=body)
  data = req.json()
  print(data)

  id = data['id']

  # Request the operation status on the server until recognition is complete.
  while True:
      time.sleep(1)

      GET = "https://operation.api.cloud.yandex.net/operations/{id}"
      req = requests.get(GET.format(id=id), headers=header)
      req = req.json()

      if req['done']:
          break
      print("Not ready")

  # Show the full server response in JSON format.
  print("Response:")
  print(json.dumps(req, ensure_ascii=False, indent=2))

  # Show only text from recognition results.
  print("Text chunks:")
  for chunk in req['response']['chunks']:
      print(chunk['alternatives'][0]['text'])
  ```
  Where:

  - `key`: IAM token of the service account.
  - `filelink`: Link to the audio file in Object Storage.
- Run the created file:

  ```
  python3 test.py
  ```
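If you prefer to authenticate with an API key instead of an IAM token (see Getting started), the only change in the script is the `Authorization` header. A sketch of the replacement, with a placeholder key value:

```
# Sketch: authenticate SpeechKit requests with an API key instead of an IAM token.
# Replace the header construction in test.py; the key value is a placeholder.
api_key = "<service_account_API_key>"
header = {"Authorization": "Api-Key {}".format(api_key)}
```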
Troubleshooting
If you get null, an empty file, or the `Invalid audio received` error, check the codec your file is encoded with. To recognize a file correctly, changing its extension is not enough.

Use FFmpeg to convert the file to the required format:

```
ffmpeg -i audio.ogg -c:a libopus -b:a 65536 audio_new.ogg
```
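To see which codec a file is actually encoded with before re-encoding it, you can query the file with ffprobe (shipped with FFmpeg). The snippet below is a sketch that assumes ffprobe is on your PATH:

```
import subprocess

# Sketch: print the audio codec of a file using ffprobe (part of FFmpeg).
# For OggOpus input, the reported codec should be "opus".
codec = subprocess.run(
    [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        "audio.ogg",
    ],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(codec)  # e.g., "opus" or "vorbis"
```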