How to synthesize speech in the SpeechKit API v1

Updated at March 28, 2025

Authentication for API access
Execute a request

Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).

The example below calls the API with the cURL utility.

Authentication for API access

To work with the SpeechKit API, you need to authenticate. The authentication method depends on the account type:

- Yandex or federated account
- Service account

For a Yandex or federated account:
1. Get an IAM token for your Yandex account or federated account.
2. Get the ID of the folder for which your account has the ai.speechkit-stt.user or ai.speechkit-tts.user role, or higher.
3. When accessing SpeechKit via the API, provide the received parameters in each request:
- For API v1 and API v2:
  
  Specify the IAM token in the Authorization header in the following format:
```
Authorization: Bearer <IAM_token>
```
  Specify the folder ID in the request body in the folderId parameter.
- For API v3:
  - Specify the IAM token in the Authorization header.
  - Specify the folder ID in the x-folder-id header.
```
Authorization: Bearer <IAM_token>
x-folder-id: <folder_ID>
```

For a service account, SpeechKit supports two authentication methods:

With an IAM token:
1. Get an IAM token.
2. Provide the IAM token in the Authorization header in the following format:
```
Authorization: Bearer <IAM_token>
```
With API keys:

Use API keys if requesting an IAM token automatically is not an option.
1. Get an API key.
2. Provide the API key in the Authorization header in the following format:
```
Authorization: Api-Key <API_key>
```

Do not specify the folder ID in your requests, as the service uses the folder the service account was created in.
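
The choice between the two header formats above can be sketched as a small shell helper. This is only an illustration: the function name auth_header and the API_KEY variable are hypothetical, and IAM_TOKEN is assumed to hold a token you obtained as described above.

```shell
# Hypothetical helper: prints the Authorization header for a SpeechKit request.
# API_KEY and IAM_TOKEN are assumed environment variable names.
auth_header() {
  if [ -n "${API_KEY:-}" ]; then
    # Service account authenticated with an API key.
    printf 'Authorization: Api-Key %s\n' "${API_KEY}"
  elif [ -n "${IAM_TOKEN:-}" ]; then
    # Any account type authenticated with an IAM token.
    printf 'Authorization: Bearer %s\n' "${IAM_TOKEN}"
  else
    echo "Set API_KEY or IAM_TOKEN first" >&2
    return 1
  fi
}
```

The result could then be passed to cURL as `--header "$(auth_header)"`. The helper prefers the API key when both variables are set, which is an arbitrary choice for this sketch.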

In the example below, authentication is performed under a Yandex account.

Execute a request

Submit a text-to-speech conversion request:

```
read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl \
  --request POST \
  --header "Authorization: Bearer ${IAM_TOKEN}" \
  --data-urlencode "text=${TEXT}" \
  --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
  "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
```

Where:

FOLDER_ID: Folder ID you got earlier.
IAM_TOKEN: IAM token you got earlier.
TEXT: Text to be synthesized. cURL applies URL encoding to it through the --data-urlencode option.
lang: Text language.
voice: Voice for speech synthesis.
speech.ogg: Output file.

Note

For homographs, use + before the stressed vowel: +import, im+port. For a pause between words, put -. Maximum string length: 5,000 characters.
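
The length limit can be checked locally before sending a request. The sketch below is one possible approach, not part of the SpeechKit tooling: check_tts_text and MAX_TTS_CHARS are hypothetical names, and the check counts the raw string, including any + and - markup. In bash with a UTF-8 locale, `${#1}` counts characters; some other shells count bytes, so treat the result as approximate for non-ASCII text.

```shell
# Hypothetical pre-flight check; MAX_TTS_CHARS mirrors the 5,000-character limit.
MAX_TTS_CHARS=5000

check_tts_text() {
  # ${#1} is the length of the first argument (characters in bash with a UTF-8 locale).
  if [ "${#1}" -gt "${MAX_TTS_CHARS}" ]; then
    echo "Text is ${#1} characters long; the limit is ${MAX_TTS_CHARS}" >&2
    return 1
  fi
}
```

For example, `check_tts_text "${TEXT}" || exit 1` placed before the cURL call would stop the script early for oversized input.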

The synthesized speech will be written to the speech.ogg file in the directory where you ran the command.

By default, the audio will be in OggOpus format. You can listen to the output file in your browser, e.g., Yandex Browser or Mozilla Firefox.
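
Because the command redirects whatever the server returns into speech.ogg, a failed request leaves the error response in that file instead of audio. A quick sanity check, assuming the default Ogg output, is to look for the "OggS" signature that every Ogg stream starts with. The helper name is_ogg is hypothetical.

```shell
# Hypothetical helper: succeeds only if the file starts with the Ogg signature "OggS".
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}

# Possible usage after the request (commented out so the sketch has no side effects):
# is_ogg speech.ogg || cat speech.ogg   # show the server's response if it is not audio
```

This only confirms the container format; it does not validate the audio itself.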

For more information, see the description of the request format for speech synthesis.
