How to synthesize speech in the SpeechKit API v1
Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).
The example uses the API via cURL.
Authentication for API access
To work with the SpeechKit API, you need to authenticate. The authentication method depends on the account type:
- Get an IAM token for your Yandex account or federated account.
- Get the ID of the folder for which your account has the ai.speechkit-stt.user, ai.speechkit-tts.user, or higher roles.
- When accessing SpeechKit via the API, provide the received parameters in each request:
  - For API v1 and API v2:
    Specify the IAM token in the Authorization header in the following format: Authorization: Bearer <IAM_token>
    Specify the folder ID in the request body in the folderId parameter.
  - For API v3:
    - Specify the IAM token in the Authorization header.
    - Specify the folder ID in the x-folder-id header.
    Authorization: Bearer <IAM_token>
    x-folder-id <folder_ID>
SpeechKit supports two authentication methods based on service accounts:
- With an IAM token:
  - Get an IAM token.
  - Provide the IAM token in the Authorization header in the following format: Authorization: Bearer <IAM_token>
- With API keys:
  Use API keys if requesting an IAM token automatically is not an option.
  - Provide the API key in the Authorization header in the following format: Authorization: Api-Key <API_key>
Do not specify the folder ID in your requests, as SpeechKit uses the folder in which the service account was created.
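The two header formats differ only in the scheme that precedes the credential. Here is a minimal sketch of a helper that picks the right one, assuming the credentials are stored in the IAM_TOKEN and API_KEY environment variables. The API_KEY variable and the auth_header function name are illustrative and not part of SpeechKit.

```shell
# Hypothetical helper: prints the Authorization header for a SpeechKit request.
# It prefers an IAM token and falls back to an API key (service accounts only).
auth_header() {
  if [ -n "${IAM_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  elif [ -n "${API_KEY:-}" ]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  else
    # Neither credential is set: report the problem and fail.
    echo "Set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
}
```

You can then pass the result to cURL as --header "$(auth_header)". When the header carries an API key, leave folderId out of the request body.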
In the example below, authentication is performed under a Yandex account.
Execute a request
Submit a text-to-speech conversion request:
read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl \
--request POST \
--header "Authorization: Bearer ${IAM_TOKEN}" \
--data-urlencode "text=${TEXT}" \
--data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
"https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
Where:
- FOLDER_ID: Folder ID you got earlier.
- IAM_TOKEN: IAM token you got earlier.
- TEXT: Text to synthesize, with URL encoding applied.
- lang: Text language.
- voice: Voice for speech synthesis.
- speech.ogg: Output file.
Note
For homographs, use + before the stressed vowel: +import, im+port. For a pause between words, put -. Maximum string length: 5,000 characters.
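To catch overlong input before sending a request, you can check the text length locally. This is a minimal sketch: the check_text_length function and the MAX_TTS_CHARS variable are illustrative names, and the sketch assumes the limit applies to the text as passed in. Whether ${#1} counts characters or bytes depends on your shell and locale.

```shell
# Illustrative limit taken from the note above: 5,000 characters per string.
MAX_TTS_CHARS=5000

# Hypothetical helper: fails if the text passed as $1 exceeds the limit.
# ${#1} is the length of the first argument; in a UTF-8 locale bash counts
# characters, while other shells or the C locale may count bytes.
check_text_length() {
  if [ "${#1}" -gt "$MAX_TTS_CHARS" ]; then
    echo "Text is ${#1} characters long; the limit is ${MAX_TTS_CHARS}" >&2
    return 1
  fi
}
```

For example, run check_text_length "${TEXT}" before calling cURL and skip the request if the check fails.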
The synthesized speech will be written to the speech.ogg file in the folder you ran this command from.
By default, the audio will be in OggOpus format.
For more information, see the description of the request format for speech synthesis.
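Because the command redirects the response body straight to a file, a failed request leaves the error response in speech.ogg instead of audio. Here is a minimal sketch of a check based on the fact that Ogg files start with the four bytes OggS. The is_ogg_file function name is illustrative.

```shell
# Hypothetical helper: succeeds only if the file given as $1 exists and
# starts with the Ogg capture pattern "OggS".
is_ogg_file() {
  # tr strips NUL bytes so the shell does not warn about binary data.
  [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "OggS" ]
}
```

For example, is_ogg_file speech.ogg || cat speech.ogg prints the response when it is not audio. You can also add the --fail option to the cURL command so that it returns a non-zero exit code on HTTP errors.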