Speech synthesis in OggOpus format using the API v1

Written by

Yandex Cloud

Improved by

Alexey

Updated on October 24, 2024

With the API v1, you can synthesize speech from text in TTS markup to an OggOpus file.

The example uses the following synthesis parameters:

Language: Russian.
Voice: filipp.
Other parameters are left at their defaults.

A Yandex account or federated account is authenticated with an IAM token. If you use a service account, you do not need to include the folder ID in the request. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
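
The folder ID rule above can be sketched as a small helper that builds the form fields for a request. This is a minimal sketch rather than part of the SpeechKit API: `build_form` is a hypothetical name, and the field names and values are the ones used in the examples on this page.

```python
from typing import Dict, Optional


def build_form(text: str, folder_id: Optional[str] = None) -> Dict[str, str]:
    """Build the form fields for a synthesis request (hypothetical helper)."""
    form = {"text": text, "lang": "ru-RU", "voice": "filipp"}
    # With a Yandex or federated account, pass the folder ID.
    # With a service account, leave folder_id as None so the field is omitted.
    if folder_id is not None:
        form["folderId"] = folder_id
    return form
```

The returned dictionary can be passed as the `data` argument of `requests.post`.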

cURL

C#

Python 3

PHP

Node.js

Submit a text-to-speech conversion request:

read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl \
  --request POST \
  --header "Authorization: Bearer ${IAM_TOKEN}" \
  --data-urlencode "text=${TEXT}" \
  --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
  "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

Where:

TEXT: Text for synthesis in TTS markup.
FOLDER_ID: Folder ID.
IAM_TOKEN: IAM token.
lang: Text language.
voice: Voice for speech synthesis.

The synthesized speech will be written to the speech.ogg file in the folder you sent your request from.
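
Because the command redirects whatever the server returns into speech.ogg, a failed request would leave an error body in that file instead of audio. A quick sanity check, sketched here in Python, is to look at the first bytes of the file. The function name `looks_like_ogg_opus` is hypothetical; the check relies only on the Ogg capture pattern `OggS` and the `OpusHead` marker that an Opus stream carries in its first page.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Return True if data starts like an Ogg stream carrying Opus audio."""
    # Every Ogg page starts with the capture pattern "OggS";
    # an Opus stream places the "OpusHead" marker in its first page.
    return data[:4] == b"OggS" and b"OpusHead" in data[:64]
```

For instance, open speech.ogg in binary mode, read its first 64 bytes, and pass them to the function.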

Submit a text-to-speech conversion request:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.IO;

namespace TTS
{
  class Program
  {
    static void Main()
    {
      Tts().GetAwaiter().GetResult();
    }

    static async Task Tts()
    {
      const string iamToken = "<IAM_token>";
      const string folderId = "<folder_ID>";

      HttpClient client = new HttpClient();
      client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
      var values = new Dictionary<string, string>
      {
        { "text", "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!" },
        { "lang", "ru-RU" },
        { "voice", "filipp" },
        { "folderId", folderId }
      };
      var content = new FormUrlEncodedContent(values);
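      var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
      // Throw on a non-success status instead of writing an error body to speech.ogg.
      response.EnsureSuccessStatusCode();
      var responseBytes = await response.Content.ReadAsByteArrayAsync();
      File.WriteAllBytes("speech.ogg", responseBytes);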
    }
  }
}

Where:

iamToken: IAM token.
folderId: Folder ID.
text: Text for synthesis in TTS markup.
lang: Text language.
voice: Voice for speech synthesis.

The synthesized speech will be written to the speech.ogg file in the folder you sent your request from.
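
The request body here is URL-encoded form data, which matters because the sample text contains `+` characters. In this encoding a bare `+` stands for a space, so a literal `+` has to be sent as `%2B`. The sketch below illustrates the same encoding with the Python standard library; `encode_form` is a hypothetical name used only for illustration.

```python
from urllib.parse import urlencode


def encode_form(fields: dict) -> str:
    """Encode fields as application/x-www-form-urlencoded."""
    # urlencode quotes values with quote_plus:
    # spaces become "+", and a literal "+" becomes "%2B".
    return urlencode(fields)
```

Calling `encode_form({"text": "y+ou can"})` returns `text=y%2Bou+can`.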

Create a file (e.g., test.py), and add the following code to it:

import argparse
import requests

def synthesize(folder_id, iam_token, text):
   url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
   headers = {
       'Authorization': 'Bearer ' + iam_token,
   }

   data = {
       'text': text,
       'lang': 'ru-RU',
       'voice': 'filipp',
       'folderId': folder_id
   }

   with requests.post(url, headers=headers, data=data, stream=True) as resp:
       if resp.status_code != 200:
           raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))

       for chunk in resp.iter_content(chunk_size=None):
           yield chunk

if __name__ == "__main__":
   parser = argparse.ArgumentParser()
   parser.add_argument("--token", required=True, help="IAM token")
   parser.add_argument("--folder_id", required=True, help="Folder ID")
   parser.add_argument("--text", required=True, help="Text to synthesize")
   parser.add_argument("--output", required=True, help="Output file name")
   args = parser.parse_args()

   with open(args.output, "wb") as f:
       for audio_content in synthesize(args.folder_id, args.token, args.text):
           f.write(audio_content)

Where:

lang: Text language.
voice: Voice for speech synthesis.

Run the created file:

export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
python3 test.py \
  --token "${IAM_TOKEN}" \
  --folder_id "${FOLDER_ID}" \
  --output speech.ogg \
  --text "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"

Where:

FOLDER_ID: Folder ID.
IAM_TOKEN: IAM token.
--output: Name of the file for the audio.
--text: Text for synthesis in TTS markup.

The synthesized speech will be written to the speech.ogg file in the folder you ran your file from.
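
Since `FOLDER_ID` and `IAM_TOKEN` are already exported, a script could also read them from the environment instead of taking them as arguments. The following is a minimal sketch under that assumption; `read_credentials` is a hypothetical helper and is not part of the example script above.

```python
import os
from typing import Mapping, Tuple


def read_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (folder_id, iam_token) read from environment variables."""
    # Collect the names of variables that are unset or empty.
    missing = [name for name in ("FOLDER_ID", "IAM_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["FOLDER_ID"], env["IAM_TOKEN"]
```

Failing early with a clear message is easier to diagnose than an authentication error returned by the service.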

Submit a text-to-speech conversion request:

<?php

$token = '<IAM_token>'; # Specify the IAM token.
$folderId = "<folder_ID>"; # Specify the folder ID.

$url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";
$headers = ['Authorization: Bearer ' . $token];
$post = array(
    'text' => "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
    'folderId' => $folderId,
    'lang' => 'ru-RU',
    'voice' => 'filipp');

$ch = curl_init();

curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HEADER, false);
if ($post !== false) {
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
}
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$response = curl_exec($ch);
if (curl_errno($ch)) {
    print "Error: " . curl_error($ch);
}
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
    $decodedResponse = json_decode($response, true);
    echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
    echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
} else {
    file_put_contents("speech.ogg", $response);
}
curl_close($ch);

Where:

token: IAM token.
folderId: Folder ID.
text: Text for synthesis in TTS markup.
lang: Text language.
voice: Voice for speech synthesis.

The synthesized speech will be written to the speech.ogg file in the folder you sent your request from.
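
The error branch of the PHP example reads `error_code` and `error_message` from a JSON response body. The same handling can be sketched in Python as follows. The helper name `describe_error` is hypothetical, and the two field names are taken from the PHP example rather than from an API reference, so treat them as an assumption.

```python
import json


def describe_error(body: bytes) -> str:
    """Format an error response body as "code: message" (hypothetical helper)."""
    try:
        payload = json.loads(body)
    except ValueError:
        # The body is not valid JSON: fall back to the raw text.
        return body.decode("utf-8", errors="replace")
    if not isinstance(payload, dict):
        return str(payload)
    # Field names assumed from the PHP example on this page.
    code = payload.get("error_code", "unknown")
    message = payload.get("error_message", "")
    return f"{code}: {message}"
```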

Install the required dependencies:

npm install --save axios form-data

Submit a text-to-speech conversion request:

import FormData from 'form-data';
import axios from 'axios';
import fs from 'node:fs';

const IAM_TOKEN = '<IAM_token>';
const FOLDER_ID = '<folder_ID>';

const formData = new FormData();

formData.append('voice', 'filipp');
formData.append('text', "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!");
formData.append('lang', 'ru-RU');
formData.append('folderId', FOLDER_ID);

const headers = {
  Authorization: `Bearer ${IAM_TOKEN}`,
  ...formData.getHeaders(),
};

axios
  .post('https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize', formData, {
    headers,
    responseType: 'arraybuffer'
  })
  .then(response => fs.writeFileSync('speech.ogg', response.data));

Where:

IAM_TOKEN: IAM token.
FOLDER_ID: Folder ID.
text: Text for synthesis in TTS markup.
lang: Text language.
voice: Voice for speech synthesis.

The synthesized speech will be written to the speech.ogg file in the folder you sent your request from.
