Asynchronous recognition of LPCM audio files using the API v2
Below is an example of asynchronous recognition of speech from an audio file using the SpeechKit API v2. This example uses the following parameters:
- Language: Russian.
- Language model: general.
- Format of the submitted audio: LPCM with a sampling rate of 8000 Hz.
- Number of audio channels: 1 (default).
- Other parameters: left at their default values.
You can generate and send a speech recognition request using cURL.
The service account is authenticated with an IAM token or an API key; the commands below use an API key. Learn more about authentication in the SpeechKit API.
Getting started
- Create a service account.
  Warning
  Please note that you can only recognize audio files asynchronously under a service account. Do not use any other accounts in Yandex Cloud for that.
- Assign the service account the storage.uploader and ai.speechkit-stt.user roles for the folder where you created the bucket.
- Get an IAM token or API key for the created service account.
If you do not have an LPCM audio file, you can download a sample file.
Perform speech recognition via the API
- Get a link to an audio file in Object Storage.
- Create a file named body.json and add the following code to it:
  {
    "config": {
      "specification": {
        "languageCode": "ru-RU",
        "model": "general",
        "audioEncoding": "LINEAR16_PCM",
        "sampleRateHertz": 8000,
        "audioChannelCount": 1
      }
    },
    "audio": {
      "uri": "<link_to_audio_file>"
    }
  }
  Where:
  - languageCode: Recognition language.
  - model: Speech recognition model.
  - audioEncoding: Format of the submitted audio file.
  - sampleRateHertz: Audio file sampling rate in Hz.
  - audioChannelCount: Number of audio channels.
  - uri: Link to the audio file in Object Storage. Here is an example of such a link: https://storage.yandexcloud.net/speechkit/speech.pcm.
    The link contains additional query parameters (after ?) for buckets with restricted access. You do not need to provide these parameters in SpeechKit as they are ignored.
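The request body can also be generated programmatically. The following is a minimal Python sketch rather than part of any SpeechKit tooling: the helper names strip_query and build_body are hypothetical, and the query string in the usage example is made up for illustration. It removes the query part of the link (which SpeechKit ignores, as noted above) and fills in the same fields as body.json.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def strip_query(link: str) -> str:
    """Return the link without its query string and fragment."""
    parts = urlsplit(link)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def build_body(link: str, sample_rate_hertz: int = 8000, channels: int = 1) -> dict:
    """Build a request body with the same fields as body.json (hypothetical helper)."""
    return {
        "config": {
            "specification": {
                "languageCode": "ru-RU",
                "model": "general",
                "audioEncoding": "LINEAR16_PCM",
                "sampleRateHertz": sample_rate_hertz,
                "audioChannelCount": channels,
            }
        },
        "audio": {"uri": strip_query(link)},
    }


if __name__ == "__main__":
    # The query parameter below is invented for illustration only.
    body = build_body("https://storage.yandexcloud.net/speechkit/speech.pcm?X-Example=1")
    print(json.dumps(body, indent=2))
```

Redirecting the printed output to body.json gives a file that can be used in the next step.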
- Send the recognition request with body.json as the request body:
  export API_KEY=<service_account_API_key> && \
  curl \
    --insecure \
    --header "Authorization: Api-Key ${API_KEY}" \
    --data "@body.json" \
    https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize
  Result example:
  {
    "done": false,
    "id": "e03sup6d5h1q********",
    "createdAt": "2019-04-21T22:49:29Z",
    "createdBy": "ajes08feato8********",
    "modifiedAt": "2019-04-21T22:49:29Z"
  }
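The same request can be sketched in Python with the standard library. This is an illustration under assumptions, not an official client: make_recognize_request and submit are hypothetical helper names, and unlike the cURL command with --insecure, urllib verifies the TLS certificate by default.

```python
import json
import urllib.request

STT_URL = "https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize"


def make_recognize_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for asynchronous recognition."""
    data = json.dumps(body).encode("utf-8")
    # No Content-Type is set here: like `curl --data`, urllib falls back to
    # application/x-www-form-urlencoded when the request is sent.
    return urllib.request.Request(
        STT_URL,
        data=data,
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


def submit(api_key: str, body: dict) -> str:
    """Send the request and return the operation id from the JSON response."""
    with urllib.request.urlopen(make_recognize_request(api_key, body)) as resp:
        return json.load(resp)["id"]
```

Calling submit(api_key, body) with the contents of body.json as a dictionary returns the id field shown in the result example.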
  Save the recognition operation id you get in the response.
- Wait until the recognition is completed. It takes about 10 seconds to recognize one minute of single-channel audio.
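To get a rough idea of how long to wait, the rate above can be combined with the file size. The sketch below assumes a headerless LPCM file with 16-bit (2-byte) samples, which is what LINEAR16_PCM is taken to mean here; both function names are hypothetical, and the result is only an estimate.

```python
def lpcm_duration_seconds(
    size_bytes: int,
    sample_rate_hertz: int = 8000,
    channels: int = 1,
    bytes_per_sample: int = 2,  # assumption: 16-bit samples
) -> float:
    """Duration of raw LPCM audio: bytes / (rate * channels * bytes per sample)."""
    return size_bytes / (sample_rate_hertz * channels * bytes_per_sample)


def estimated_wait_seconds(duration_seconds: float) -> float:
    """About 10 seconds of processing per minute of single-channel audio."""
    return duration_seconds / 60 * 10


if __name__ == "__main__":
    duration = lpcm_duration_seconds(960_000)  # 960,000 bytes at 8000 Hz mono
    print(duration, estimated_wait_seconds(duration))
```

For a 960,000-byte mono file at 8000 Hz, this gives 60 seconds of audio and an estimated wait of about 10 seconds.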
- Send a request to get information about the operation:
  curl \
    --insecure \
    --header "Authorization: Api-Key ${API_KEY}" \
    https://operation.api.cloud.yandex.net/operations/<recognition_operation_ID>
  Result example:
  {
    "done": true,
    "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.stt.v2.LongRunningRecognitionResponse",
      "chunks": [
        {
          "alternatives": [
            {
              "words": [
                {
                  "startTime": "0.160s",
                  "endTime": "0.500s",
                  "word": "hello",
                  "confidence": 1
                },
                {
                  "startTime": "0.580s",
                  "endTime": "0.800s",
                  "word": "world",
                  "confidence": 1
                }
              ],
              "text": "Hello world",
              "confidence": 1
            }
          ],
          "channelTag": "1"
        }
      ]
    },
    "id": "e03jjenu23uc********",
    "createdAt": "2024-08-22T11:39:22Z",
    "createdBy": "aje3bg430agh********",
    "modifiedAt": "2024-08-22T11:39:23Z"
  }
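The wait-and-check steps can be combined into a polling loop. The following Python sketch is one possible approach under assumptions, not a prescribed one: fetch_operation, wait_for_result and transcript are hypothetical names, the polling interval and attempt limit are arbitrary, and the code relies only on the done, response, chunks, alternatives and text fields visible in the result example.

```python
import json
import time
import urllib.request

OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def fetch_operation(api_key: str, operation_id: str) -> dict:
    """GET the current state of the recognition operation."""
    req = urllib.request.Request(
        OPERATION_URL + operation_id,
        headers={"Authorization": f"Api-Key {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(fetch, interval_seconds: float = 10.0, max_attempts: int = 60) -> dict:
    """Call fetch() until the operation reports done, then return it."""
    for _ in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            return operation
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not finish in time")


def transcript(operation: dict) -> str:
    """Join the text of the first alternative of every chunk."""
    chunks = operation.get("response", {}).get("chunks", [])
    return " ".join(
        chunk["alternatives"][0]["text"]
        for chunk in chunks
        if chunk.get("alternatives")
    )
```

A possible usage is transcript(wait_for_result(lambda: fetch_operation(api_key, operation_id))). Passing the fetch step in as a callable keeps the loop easy to try out without network access.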