How to synthesize speech in the SpeechKit API v3

Written by Yandex Cloud

Updated at March 31, 2025

Authentication for API access
Getting started
Convert text to an audio file

In this section, you will learn how to synthesize speech from text using the SpeechKit API v3 (gRPC).

Authentication for API access

To work with the SpeechKit API, you must authenticate. The authentication method depends on the account type.

Yandex or federated account

Get an IAM token for your Yandex account or federated account.
Get the ID of the folder for which your account has the ai.speechkit-stt.user role, the ai.speechkit-tts.user role, or a higher role.
When accessing SpeechKit via the API, provide these parameters in each request:
- For API v1 and API v2:
  - Specify the IAM token in the Authorization header in the following format:
```
Authorization: Bearer <IAM_token>
```
  - Specify the folder ID in the folderId parameter of the request body.
- For API v3:
  - Specify the IAM token in the Authorization header.
  - Specify the folder ID in the x-folder-id header.
```
Authorization: Bearer <IAM_token>
x-folder-id: <folder_ID>
```

Service account

SpeechKit supports two authentication methods for service accounts:

With an IAM token:
1. Get an IAM token.
2. Provide the IAM token in the Authorization header in the following format:
```
Authorization: Bearer <IAM_token>
```
With an API key:

Use API keys if you cannot request IAM tokens automatically.
1. Get an API key.
2. Provide the API key in the Authorization header in the following format:
```
Authorization: Api-Key <API_key>
```

With either method, do not specify the folder ID in your requests: the service uses the folder in which the service account was created.
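The authentication rules above can be summarized in a small Bash helper. This is a sketch, not part of SpeechKit: the function name `speechkit_auth_args` and the variable `SPEECHKIT_API_KEY` are hypothetical (`IAM_TOKEN` and `FOLDER_ID` match the names used in the example later in this section). The function only fills a Bash array with grpcurl `-H` options; it does not contact the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global array AUTH_ARGS with grpcurl -H options
# following the authentication rules described above.
#   SPEECHKIT_API_KEY set -> service account API key, no folder ID
#   otherwise IAM_TOKEN   -> Bearer token; x-folder-id is added only if
#                            FOLDER_ID is set (leave it unset for service accounts)
speechkit_auth_args() {
  AUTH_ARGS=()
  if [[ -n "${SPEECHKIT_API_KEY:-}" ]]; then
    AUTH_ARGS+=(-H "authorization: Api-Key ${SPEECHKIT_API_KEY}")
  elif [[ -n "${IAM_TOKEN:-}" ]]; then
    AUTH_ARGS+=(-H "authorization: Bearer ${IAM_TOKEN}")
    if [[ -n "${FOLDER_ID:-}" ]]; then
      AUTH_ARGS+=(-H "x-folder-id: ${FOLDER_ID}")
    fi
  else
    echo "Set SPEECHKIT_API_KEY or IAM_TOKEN" >&2
    return 1
  fi
}
```

With the helper sourced, `speechkit_auth_args && grpcurl "${AUTH_ARGS[@]}" -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis` could stand in for the explicit `-H` options in the synthesis command.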

The example below uses a Yandex account for authentication.

Getting started

Install the grpcurl utility.
Install the jq utility for processing JSON in pipelines. On Debian or Ubuntu:
```
sudo apt update && sudo apt install jq
```
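Before running the synthesis pipeline, it can help to confirm that the required commands are installed. A minimal sketch: the function name `require_tools` is hypothetical, and the check relies only on the Bash builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports which of the given commands are not on PATH.
# Returns 0 if all are available, 1 otherwise.
require_tools() {
  local missing=()
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing tools: ${missing[*]}" >&2
    return 1
  fi
}

# Usage: stop early if grpcurl, jq, or base64 is unavailable.
# require_tools grpcurl jq base64 || exit 1
```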

Note

You can implement speech synthesis with the SpeechKit API v3 using these utilities or other tools.

Convert text to an audio file

To synthesize speech from text in TTS markup to a WAV file:

Create a file containing the API request body and the text to synthesize:

tts_req.json

```
{
  "text": "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
  "outputAudioSpec": {
    "containerAudio": {
      "containerAudioType": "WAV"
    }
  },
  "hints": [
    {
      "voice": "jane"
    },
    {
      "role": "good"
    }
  ],
  "loudnessNormalizationType": "LUFS"
}
```
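If the text to synthesize comes from a variable or another file, writing the JSON by hand risks quoting mistakes. A sketch, assuming Bash and jq: the hypothetical function `make_tts_request` prints a request body with the same structure as tts_req.json, and `jq --arg` takes care of JSON-escaping the text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a compact request body with the same structure
# as tts_req.json. Arguments: text [voice] [role]; voice and role default to
# the values used in the example above.
make_tts_request() {
  local text="$1" voice="${2:-jane}" role="${3:-good}"
  jq -n -c --arg text "$text" --arg voice "$voice" --arg role "$role" '{
    text: $text,
    outputAudioSpec: { containerAudio: { containerAudioType: "WAV" } },
    hints: [ { voice: $voice }, { role: $role } ],
    loudnessNormalizationType: "LUFS"
  }'
}

# Usage: write the request file, then run the grpcurl pipeline as before.
# make_tts_request "I'm Yandex Speech+Kit." > tts_req.json
```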

Run the following commands:

```
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
# Compact the request to one line, send it to UtteranceSynthesis,
# extract the base64 audio data from the responses, and decode it to WAV.
jq . -c tts_req.json | \
grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
        -H "x-folder-id: ${FOLDER_ID}" \
        -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
jq -r '.audioChunk.data' | base64 -d > speech.wav
```

Where:

- FOLDER_ID: Folder ID you got earlier.
  If you are using a service account's IAM token, do not specify the folder ID: remove the -H "x-folder-id: ${FOLDER_ID}" option, as the service uses the folder in which the service account was created.
- IAM_TOKEN: IAM token you got earlier.
- speech.wav: Output file.
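Assuming UtteranceSynthesis returns the audio as several streamed response messages, grpcurl prints each message as a separate JSON object and `jq -r '.audioChunk.data'` emits one base64 string per message. If a base64 implementation rejects '=' padding in the middle of its input, a more defensive variant decodes each chunk separately. The function name `decode_chunks` is hypothetical, and `// empty` is a jq alternative-operator guard for any message that has no audio chunk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decodes base64 text from stdin one line at a time and
# writes the raw bytes to stdout. Because every line is passed to base64 -d
# separately, '=' padding at the end of one chunk never appears in the middle
# of the decoder's input.
decode_chunks() {
  local chunk
  # "|| [[ -n $chunk ]]" also handles a last line without a trailing newline.
  while IFS= read -r chunk || [[ -n "$chunk" ]]; do
    if [[ -n "$chunk" ]]; then          # skip blank lines
      printf '%s' "$chunk" | base64 -d || return 1
    fi
  done
}

# Usage (assumes tts_req.json, IAM_TOKEN, and FOLDER_ID from the steps above):
# jq . -c tts_req.json | \
# grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
#         -H "x-folder-id: ${FOLDER_ID}" \
#         -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
# jq -r '.audioChunk.data // empty' | decode_chunks > speech.wav
```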

As a result, the synthesized speech is saved to speech.wav in the current directory.
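Because the shell creates speech.wav as soon as it sets up the redirection, a failed request (for example, one with an expired IAM token) can still leave an empty file behind; without `set -o pipefail`, the pipeline's exit status also comes from `base64` rather than from grpcurl. A quick sanity check, sketched under the assumption that the output uses the standard RIFF/WAVE header: the hypothetical function `is_wav` confirms the file is non-empty and has "RIFF" at bytes 0-3 and "WAVE" at bytes 8-11.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file is non-empty and starts with a
# RIFF/WAVE header ("RIFF" at offset 0, "WAVE" at offset 8).
is_wav() {
  local file="$1"
  [[ -s "$file" ]] || return 1
  [[ "$(head -c 4 "$file")" == "RIFF" ]] || return 1
  # tail -c +9 starts output at the 9th byte, i.e. offset 8.
  [[ "$(tail -c +9 "$file" | head -c 4)" == "WAVE" ]]
}

# Usage: is_wav speech.wav && echo "speech.wav looks like a WAV file"
```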
