How to synthesize speech in the SpeechKit API v3
In this section, you will learn how to synthesize speech from text using the SpeechKit API v3 (gRPC).
Authentication for API access
To work with the SpeechKit API, you must authenticate. The authentication method depends on the account type:
- Get an IAM token for your Yandex account or federated account.
- Get the ID of the folder for which your account has the ai.speechkit-stt.user, ai.speechkit-tts.user, or higher roles.
- When accessing SpeechKit via the API, provide the received parameters in each request:
  - For API v1 and API v2:
    Specify the IAM token in the Authorization header in the following format:
      Authorization: Bearer <IAM_token>
    Specify the folder ID in the request body in the folderId parameter.
  - For API v3:
    - Specify the IAM token in the Authorization header.
    - Specify the folder ID in the x-folder-id header.
      Authorization: Bearer <IAM_token>
      x-folder-id: <folder_ID>
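As a minimal sketch of the API v3 headers described above, the shell function below prints both header lines for a given IAM token and folder ID. The function name speechkit_v3_headers is hypothetical and not part of SpeechKit, and the values in the example call are placeholders.

```shell
# Hypothetical helper (not part of SpeechKit): prints the two API v3
# headers, one per line, for a user-account IAM token.
speechkit_v3_headers() {
  # $1 = IAM token, $2 = folder ID
  printf 'authorization: Bearer %s\n' "$1"
  printf 'x-folder-id: %s\n' "$2"
}

# Example call with placeholder values; substitute your own token and folder ID.
speechkit_v3_headers "<IAM_token>" "<folder_ID>"
```

Each printed line can then be passed to a gRPC client as a separate header, for example with the -H flag of grpcurl.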
SpeechKit supports two authentication methods based on service accounts:
- With an IAM token:
  - Get an IAM token.
  - Provide the IAM token in the Authorization header in the following format:
      Authorization: Bearer <IAM_token>
- With API keys:
  Use API keys if requesting an IAM token automatically is not an option.
  - Provide the API key in the Authorization header in the following format:
      Authorization: Api-Key <API_key>
Do not specify the folder ID in your requests, as SpeechKit uses the folder in which the service account was created.
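The two service account options differ only in the scheme that precedes the secret in the Authorization header. A minimal sketch of that choice is shown below; the function name sa_authorization_header and the credential kind labels "iam" and "api-key" are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: builds the authorization header for a service account.
# $1 = credential kind ("iam" or "api-key"), $2 = the token or key itself.
sa_authorization_header() {
  case "$1" in
    iam)     printf 'authorization: Bearer %s\n' "$2" ;;
    api-key) printf 'authorization: Api-Key %s\n' "$2" ;;
    *)       echo "unknown credential kind: $1" >&2; return 1 ;;
  esac
}

# Example call with a placeholder key. No folder ID header is built,
# because the folder is taken from the service account.
sa_authorization_header api-key "<API_key>"
```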
In the example below, authentication is performed under a Yandex account.
Getting started
- Install the grpcurl utility.
- Install the jq utility for piped processing of JSON files:
    sudo apt update && sudo apt install jq
Note
You can implement speech synthesis in the SpeechKit API v3 using either these utilities or other methods.
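Before continuing, it may help to confirm that the tools used in this section are available on your PATH. The following is a minimal sketch; the function name missing_tools is hypothetical, and base64 is included in the check because the synthesis command in this section decodes the audio with it.

```shell
# Hypothetical helper: prints the name of every listed tool that is
# not found on PATH, one per line. Prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools used in this section.
missing="$(missing_tools grpcurl jq base64)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```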
Convert text to an audio file
To synthesize speech from text in TTS markup to a WAV file:
- Create a file with the body of an API request and text to synthesize to speech:
  tts_req.json
    {
      "text": "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
      "outputAudioSpec": {
        "containerAudio": {
          "containerAudioType": "WAV"
        }
      },
      "hints": [
        { "voice": "jane" },
        { "role": "good" }
      ],
      "loudnessNormalizationType": "LUFS"
    }
- Run the following commands:
    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    jq . -c tts_req.json | \
      grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
        -H "x-folder-id: ${FOLDER_ID}" \
        -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
      jq -r '.audioChunk.data' | base64 -d > speech.wav
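Where:
- FOLDER_ID: ID of the folder obtained during authentication.
- IAM_TOKEN: IAM token obtained during authentication.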
As a result, a synthesized speech file named speech.wav will be created in the current folder.
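If the request fails, the output redirection in the pipeline above may still create a speech.wav file that is empty or contains no audio. As a rough sanity check, the sketch below tests whether a file looks like a WAV container, relying on the fact that such files start with the ASCII bytes RIFF and carry WAVE at byte offset 8. The function name is_wav is hypothetical, and the check does not validate the audio data itself.

```shell
# Hypothetical helper: succeeds if the file looks like a RIFF/WAVE container.
is_wav() {
  # Bytes 0-3 must be "RIFF"; bytes 8-11 must be "WAVE".
  [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
    [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# Report on the file produced by the steps above, if it exists.
if [ -f speech.wav ]; then
  if is_wav speech.wav; then
    echo "speech.wav looks like a WAV file"
  else
    echo "speech.wav is not a WAV file; inspect the raw API response" >&2
  fi
fi
```

If the check fails, rerunning the grpcurl command without the final jq and base64 stages shows the raw response or error message.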