How to synthesize speech in the SpeechKit API v1
Updated at October 24, 2024
Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).
Send a request to convert text to speech:
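# Text to synthesize; a + sign marks stress (see the note below)
read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
# Replace the placeholders with your folder ID and IAM token
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
# Send the synthesis request and save the audio response to speech.ogg
curl \
--request POST \
--header "Authorization: Bearer ${IAM_TOKEN}" \
--data-urlencode "text=${TEXT}" \
--data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
"https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg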
Where:
FOLDER_ID
: Folder ID you got before you started.
IAM_TOKEN
: IAM token you got before you started.
TEXT
: Text to synthesize. The --data-urlencode option applies URL encoding to it.
lang
: Text language.
voice
: Voice for speech synthesis.
speech.ogg
: Output file.
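The + stress marks are one reason the text is sent with --data-urlencode: in a form-encoded request body, a literal + is decoded as a space, so it has to travel as %2B. To see roughly what the encoded text looks like, here is a minimal pure-Bash sketch of percent-encoding. The urlencode function is a hypothetical helper, not part of curl or SpeechKit, and it is only an approximation of what curl does, tested here with ASCII input.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of curl or SpeechKit): percent-encode a string,
# keeping only unreserved characters (letters, digits, . ~ _ -) as-is.
urlencode() {
  local LC_ALL=C          # walk the string byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # A leading quote makes printf use the character's numeric code
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# The + stress mark becomes %2B, so the server does not read it as a space
urlencode "Now y+ou can, too!"
```

If the text were sent with plain --data instead, each + would arrive as a space and the stress marks would be lost.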
Note
For homographs, put + before the stressed vowel: +import, im+port. For a pause between words, use -. Maximum string length: 5,000 characters.
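To catch over-long input before sending a request, you can check the length locally. This is a minimal sketch that assumes the 5,000-character limit counts characters (not bytes) of the text exactly as sent, including + and - markup; check_tts_length and MAX_CHARS are hypothetical names.

```bash
#!/usr/bin/env bash
# Assumption: the limit counts characters of the text as sent,
# including + stress marks and - pauses.
MAX_CHARS=5000

# Hypothetical helper: succeed if the text fits the limit, fail otherwise.
check_tts_length() {
  local text="$1"
  # ${#text} counts characters in a UTF-8 locale (bytes in the C locale)
  if (( ${#text} > MAX_CHARS )); then
    echo "Text too long: ${#text} characters (maximum ${MAX_CHARS})" >&2
    return 1
  fi
  echo "OK: ${#text} characters"
}

check_tts_length "Now y+ou can, too!"
```

In a script, you might run check_tts_length "$TEXT" || exit 1 before the curl command.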
The synthesized speech will be written to the speech.ogg file in the directory you ran this command from.
By default, the audio is in OggOpus format.
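Because curl writes the response body to speech.ogg regardless of the HTTP status, a failed request (for example, one with an expired IAM token) can leave an error message in the file instead of audio. Ogg streams begin with the four bytes OggS, which allows a quick sanity check. A minimal sketch; is_ogg is a hypothetical helper, and it checks only the container signature, not whether the Opus stream inside is valid.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the Ogg
# page signature "OggS". It does not validate the Opus stream itself.
is_ogg() {
  local magic
  magic="$(head -c 4 -- "$1" 2>/dev/null)" || return 1
  [ "$magic" = "OggS" ]
}

# Warn if speech.ogg exists but does not look like Ogg audio
if [ -f speech.ogg ] && ! is_ogg speech.ogg; then
  echo "speech.ogg is not Ogg audio; it may contain an API error response:" >&2
  head -c 500 -- speech.ogg >&2
  echo >&2
fi
```

Alternatively, adding curl's --fail option makes curl exit with an error on HTTP error responses instead of saving the response body.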
For more information, see the description of the request format for speech synthesis.