How to synthesize speech in the SpeechKit API v3
Updated at April 17, 2024
In this section, you will learn how to synthesize speech from text using the SpeechKit API v3 (gRPC).
You will need the grpcurl utility to use the API.
Getting started
- Install the grpcurl utility.
- Install the jq utility for piped processing of JSON files:
  sudo apt update && sudo apt install jq
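Before continuing, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of the official instructions; missing_tools is a hypothetical helper name, and base64 is included because the synthesis pipeline later in this section uses it.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH,
# one per line. Prints nothing if all tools are present.
# (missing_tools is a hypothetical helper, not part of SpeechKit.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools grpcurl jq base64)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools are installed."
fi
```

If any tool is reported as missing, install it before running the synthesis commands.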
Note
You can implement speech synthesis in the SpeechKit API v3 using either the utilities mentioned above or other methods.
Convert text to an audio file
To synthesize speech from text with TTS markup to a WAV file:
- Create a file with the body of an API request and text to synthesize to speech:
  tts_req.json
  {
    "text": "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
    "outputAudioSpec": {
      "containerAudio": {
        "containerAudioType": "WAV"
      }
    },
    "hints": [
      {
        "voice": "jane"
      },
      {
        "role": "good"
      }
    ],
    "loudnessNormalizationType": "LUFS"
  }
- Run the following commands:
  export FOLDER_ID=<folder_ID>
  export IAM_TOKEN=<IAM_token>
  jq . -c tts_req.json | \
    grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
      -H "x-folder-id: ${FOLDER_ID}" \
      -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
    jq -r '.audioChunk.data' | base64 -d > speech.wav
Where:
  - FOLDER_ID: Folder ID received before starting. If you are using the service account's IAM token, do not specify the folder ID in your request: the service uses the folder where the service account was created.
  - IAM_TOKEN: IAM token received before starting.
  - speech.wav: File where the response will be written.
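As an alternative to writing tts_req.json by hand in the first step, the request body could be generated with jq itself, which escapes quotes and backslashes in the text for you. This is only a sketch: make_tts_request is a hypothetical helper name, and the field names and values are copied from the request body shown above.

```shell
#!/bin/sh
# Build a request body for the given text and voice.
# (make_tts_request is a hypothetical helper, not part of SpeechKit or jq.)
# jq's --arg passes each value as a JSON string, so special characters
# in the text are escaped correctly.
make_tts_request() {
    jq -n --arg text "$1" --arg voice "$2" '{
        text: $text,
        outputAudioSpec: { containerAudio: { containerAudioType: "WAV" } },
        hints: [ { voice: $voice }, { role: "good" } ],
        loudnessNormalizationType: "LUFS"
    }'
}

# Print a sample request body if jq is available.
if command -v jq >/dev/null 2>&1; then
    make_tts_request 'She said "hello". Now y+ou can, too!' jane
fi
```

Redirecting the output to tts_req.json gives a file that can be passed to the commands above unchanged.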
As a result, the speech.wav file with synthesized speech will be created in the current directory.
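Because the shell creates the redirect target before the pipeline runs, speech.wav will exist even if the request fails, so the file's presence alone does not confirm success. One quick sanity check, sketched below, is to look for the RIFF and WAVE markers that a WAV file starts with. The is_wav function is a hypothetical helper, and the check does not validate the audio data itself.

```shell
#!/bin/sh
# Return success if the file begins with a RIFF/WAVE header:
# bytes 0-3 must be "RIFF" and bytes 8-11 must be "WAVE".
# (is_wav is a hypothetical helper, not part of SpeechKit.)
is_wav() {
    [ -f "$1" ] || return 1
    riff=$(dd if="$1" bs=1 count=4 2>/dev/null)
    wave=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)
    [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

if is_wav speech.wav; then
    echo "speech.wav looks like a WAV file."
else
    echo "speech.wav is missing or is not a WAV file; check the grpcurl output for errors."
fi
```

If the check fails, rerunning the grpcurl command without the trailing jq and base64 stages shows the raw response or error message.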