Speech synthesis in WAV format using the API v1
The example shows how you can use the API v1 to synthesize speech from text in TTS markup to a WAV file
The example uses the following synthesis parameters:
- Synthesized audio file format: LPCM with a sample rate of 48,000 Hz, WAV
container. - Language: Russian
- Voice:
filipp
Conversion and recording the result in WAV are performed using the SoX
The Yandex account or federated account are authenticated using an IAM token. If you use your service account, you do not need to include the folder ID in the request. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
-
Synthesize a file in LPCM format:
cURLC#Python 3PHPSubmit a text-to-speech conversion request:
read -r -d '' TEXT << EOM > I'm Yandex Speech+Kit. > I can turn any text into speech. > Now y+ou can, too! EOM export FOLDER_ID=<folder_ID> export IAM_TOKEN=<IAM_token> curl \ --request POST \ --header "Authorization: Bearer ${IAM_TOKEN}" \ --output speech.raw \ --data-urlencode "text=${TEXT}" \ --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}&format=lpcm&sampleRateHertz=48000" \ https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize
Where:
Submit a text-to-speech conversion request:
using System; using System.Collections.Generic; using System.Net.Http; using System.Threading.Tasks; using System.IO; namespace TTS { class Program { static void Main() { Tts().GetAwaiter().GetResult(); } static async Task Tts() { const string iamToken = "<IAM_token>"; const string folderId = "<folder_ID>"; HttpClient client = new HttpClient(); client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken); var values = new Dictionary<string, string> { { "text", "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!" }, { "lang", "ru-RU" }, { "voice", "filipp" }, { "folderId", folderId }, { "format", "lpcm" }, { "sampleRateHertz", "48000" } }; var content = new FormUrlEncodedContent(values); var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content); var responseBytes = await response.Content.ReadAsByteArrayAsync(); File.WriteAllBytes("speech.raw", responseBytes); } } }
Where:
-
Create a file (e.g.,
test.py
), and add the following code to it:import argparse import requests def synthesize(folder_id, iam_token, text): url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize' headers = { 'Authorization': 'Bearer ' + iam_token, } data = { 'text': text, 'lang': 'ru-RU', 'voice': 'filipp', 'folderId': folder_id, 'format': 'lpcm', 'sampleRateHertz': 48000, } with requests.post(url, headers=headers, data=data, stream=True) as resp: if resp.status_code != 200: raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text)) for chunk in resp.iter_content(chunk_size=None): yield chunk if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--token", required=True, help="IAM token") parser.add_argument("--folder_id", required=True, help="Folder id") parser.add_argument("--text", required=True, help="Text for synthesize") parser.add_argument("--output", required=True, help="Output file name") args = parser.parse_args() with open(args.output, "wb") as f: for audio_content in synthesize(args.folder_id, args.token, args.text): f.write(audio_content)
Where:
-
Run the created file:
export FOLDER_ID=<folder_ID> export IAM_TOKEN=<IAM_token> python3 test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --output speech.raw --text "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"
Where:
FOLDER_ID
: Folder ID.IAM_TOKEN
: IAM token.--output
: Name of the file for the audio.--text
: Text for synthesis in TTS markup.
Submit a text-to-speech conversion request:
<?php $token = '<IAM_token>'; # Specify the IAM token. $folderId = "<folder_ID>"; # Specify the folder ID. $url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"; $headers = ['Authorization: Bearer ' . $token]; $post = array( 'text' => "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!", 'folderId' => $folderId, 'lang' => 'ru-RU', 'voice' => 'filipp', 'format' => 'lpcm', 'sampleRateHertz' => '48000'); $ch = curl_init(); curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); curl_setopt($ch, CURLOPT_HEADER, false); if ($post !== false) { curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post); } curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); $response = curl_exec($ch); if (curl_errno($ch)) { print "Error: " . curl_error($ch); } if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) { $decodedResponse = json_decode($response, true); echo "Error code: " . $decodedResponse["error_code"] . "\r\n"; echo "Error message: " . $decodedResponse["error_message"] . "\r\n"; } else { file_put_contents("speech.raw", $response); } curl_close($ch);
Where:
-
-
Convert the resulting file to WAV format using the SoX
utility.sox -r 48000 -b 16 -e signed-integer -c 1 speech.raw speech.wav