Asynchronous recognition of LPCM audio files using the API v2
Below is an example of asynchronous recognition of speech from an audio file using the SpeechKit API v2. This example uses the following parameters:
- Language: Russian.
- Language model: general.
- Format of the submitted audio: LPCM with a sampling rate of 8000 Hz.
- Number of audio channels: 1 (default).
- Other parameters are left at their defaults.
You can generate and send a speech recognition request using cURL.
An IAM token is used to authenticate the service account. Learn more about authentication in the SpeechKit API.
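As a minimal sketch of how the two credential types mentioned on this page might map to an Authorization header: the Api-Key scheme is the one used by the cURL commands in this example, while the Bearer scheme for IAM tokens is an assumption not shown on this page. The function name `authorization_header` is hypothetical.

```python
def authorization_header(*, iam_token=None, api_key=None):
    """Return an Authorization header value for exactly one credential."""
    if (iam_token is None) == (api_key is None):
        raise ValueError("pass exactly one of iam_token or api_key")
    if iam_token is not None:
        # Assumption: IAM tokens are sent with the Bearer scheme.
        return f"Bearer {iam_token}"
    # The Api-Key scheme matches the cURL commands in this example.
    return f"Api-Key {api_key}"
```

Requiring exactly one credential keeps a request from silently using the wrong one when both happen to be set.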
Getting started
- Create a service account.
  Warning: You can only recognize audio files asynchronously under a service account. Do not use any other Yandex Cloud accounts for this.
- Assign the service account the storage.uploader and ai.speechkit-stt.user roles for the folder where you created the bucket.
- Get an IAM token or API key for the created service account.
If you do not have an LPCM audio file, you can download a sample file.
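If you already have a WAV recording, one possible way to obtain an LPCM file is to copy its samples into a headerless file. The sketch below assumes that LPCM here means raw 16-bit samples without a WAV header and that the source file is already 8000 Hz mono; it does not resample. The function name `wav_to_lpcm` and its parameters are hypothetical.

```python
import wave


def wav_to_lpcm(wav_path, lpcm_path, expected_rate=8000, expected_channels=1):
    """Copy the samples of a 16-bit PCM WAV file into a headerless file.

    Returns the number of bytes written.
    """
    with wave.open(wav_path, "rb") as wav:
        # Assumption: the service expects 16-bit (2-byte) samples.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != expected_channels:
            raise ValueError(f"expected {expected_channels} channel(s)")
        frames = wav.readframes(wav.getnframes())
    with open(lpcm_path, "wb") as out:
        out.write(frames)
    return len(frames)
```

For example, `wav_to_lpcm("speech.wav", "speech.pcm")` would write the raw samples to speech.pcm, which you can then upload to your bucket.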
Perform speech recognition via the API
- Get a link to an audio file in Object Storage.
- Create a file named body.json and add the following code to it:
  {
    "config": {
      "specification": {
        "languageCode": "ru-RU",
        "model": "general",
        "audioEncoding": "LINEAR16_PCM",
        "sampleRateHertz": 8000,
        "audioChannelCount": 1
      }
    },
    "audio": {
      "uri": "<link_to_audio_file>"
    }
  }
  Where:
  - languageCode: Recognition language.
  - model: Speech recognition model.
  - audioEncoding: Format of the submitted audio file.
  - sampleRateHertz: Audio file sampling rate in Hz.
  - audioChannelCount: Number of audio channels.
  - uri: Link to the audio file in Object Storage. Here is an example of such a link: https://storage.yandexcloud.net/speechkit/speech.pcm.
    For buckets with restricted access, the link contains additional query parameters (after ?). You do not need to provide these parameters in SpeechKit as they are ignored.
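The request body described above could also be generated programmatically. The following is a minimal sketch that builds the same structure and drops any query parameters from the link, since they are ignored anyway. The function names (`strip_query`, `build_body`, `write_body`) are hypothetical, and the default values simply mirror this example.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string (everything after ?) and any fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_body(audio_uri, language_code="ru-RU", model="general",
               sample_rate_hertz=8000, channel_count=1):
    """Build the request body with the parameters used in this example."""
    return {
        "config": {
            "specification": {
                "languageCode": language_code,
                "model": model,
                "audioEncoding": "LINEAR16_PCM",
                "sampleRateHertz": sample_rate_hertz,
                "audioChannelCount": channel_count,
            }
        },
        "audio": {"uri": strip_query(audio_uri)},
    }


def write_body(path, audio_uri):
    """Write the request body to a JSON file such as body.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_body(audio_uri), f, indent=2)
```

For example, `write_body("body.json", "https://storage.yandexcloud.net/speechkit/speech.pcm")` would produce a file equivalent to the one shown above.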
- Send a recognition request using the created file:
  export API_KEY=<service_account_API_key> && \
  curl \
    --insecure \
    --header "Authorization: Api-Key ${API_KEY}" \
    --data "@body.json" \
    https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
  Result example:
  {
    "done": false,
    "id": "e03sup6d5h1q********",
    "createdAt": "2019-04-21T22:49:29Z",
    "createdBy": "ajes08feato8********",
    "modifiedAt": "2019-04-21T22:49:29Z"
  }
  Save the recognition operation id you get in the response.
- Wait until the recognition is completed. It takes about 10 seconds to recognize one minute of single-channel audio.
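The same request can be sketched in Python with the standard library. This is an illustration rather than an official client: the function names are hypothetical, no Content-Type header is set explicitly (mirroring the cURL command), and TLS certificate verification stays enabled, unlike with the --insecure flag.

```python
import json
import urllib.request

RECOGNIZE_URL = (
    "https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"
)


def build_recognize_request(body, api_key):
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


def parse_operation_id(response_text):
    """Extract the operation id from the JSON response."""
    return json.loads(response_text)["id"]


def submit(body, api_key):
    """Send the request and return the recognition operation id."""
    request = build_recognize_request(body, api_key)
    with urllib.request.urlopen(request) as response:
        return parse_operation_id(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and payload before any network call is made.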
- Send a request to get information about the operation:
  curl \
    --insecure \
    --header "Authorization: Api-Key ${API_KEY}" \
    https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
  Result example:
  {
    "done": true,
    "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
      "chunks": [
        {
          "alternatives": [
            {
              "words": [
                {
                  "startTime": "0.160s",
                  "endTime": "0.500s",
                  "word": "hello",
                  "confidence": 1
                },
                {
                  "startTime": "0.580s",
                  "endTime": "0.800s",
                  "word": "world",
                  "confidence": 1
                }
              ],
              "text": "Hello world",
              "confidence": 1
            }
          ],
          "channelTag": "1"
        }
      ]
    },
    "id": "e03jjenu23uc********",
    "createdAt": "2024-08-22T11:39:22Z",
    "createdBy": "aje3bg430agh********",
    "modifiedAt": "2024-08-22T11:39:23Z"
  }
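Waiting for the operation and reading the result can be sketched as a polling loop. The code below is an illustration under stated assumptions: the function names, the 5-second polling interval, and the limit of 60 polls are all hypothetical choices, and the text is taken from the first alternative of each chunk. TLS certificate verification stays enabled, unlike with the --insecure flag.

```python
import json
import time
import urllib.request

OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def estimated_wait_seconds(audio_seconds):
    """Rough estimate: about 10 s per minute of single-channel audio."""
    return audio_seconds / 60 * 10


def fetch_operation(operation_id, api_key):
    """Request the current state of the operation."""
    request = urllib.request.Request(
        OPERATION_URL + operation_id,
        headers={"Authorization": f"Api-Key {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(operation_id, api_key, poll_interval=5.0, max_polls=60,
                    fetch=fetch_operation, sleep=time.sleep):
    """Poll until the operation reports done, or give up after max_polls."""
    for _ in range(max_polls):
        operation = fetch(operation_id, api_key)
        if operation.get("done"):
            return operation
        sleep(poll_interval)
    raise TimeoutError("recognition did not finish within the polling limit")


def extract_text(operation):
    """Join the text of the first alternative of every chunk."""
    chunks = operation.get("response", {}).get("chunks", [])
    texts = []
    for chunk in chunks:
        alternatives = chunk.get("alternatives", [])
        if alternatives:
            texts.append(alternatives[0]["text"])
    return " ".join(texts)
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access. In practice you might sleep for `estimated_wait_seconds(duration)` before the first poll to avoid unnecessary requests.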