© 2025 Direct Cursus Technology L.L.C.

Speech synthesis in WAV format using the API v1

Written by Yandex Cloud
Updated on February 10, 2025

This example shows how to use the API v1 to synthesize speech from text in TTS markup and save the result as a WAV file.

The example uses the following synthesis parameters:

  • Synthesized audio file format: LPCM with a sample rate of 48,000 Hz, WAV container.
  • Language: Russian
  • Voice: filipp
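
As a quick sanity check on these parameters, the playback duration of a raw LPCM file can be estimated from its size. The sketch below assumes 16-bit mono samples, which is what the SoX command in the conversion step of this guide specifies; the function name `lpcm_duration_seconds` is hypothetical and not part of any SpeechKit API.

```python
SAMPLE_RATE_HZ = 48000   # sampleRateHertz used in this example
BYTES_PER_SAMPLE = 2     # assumption: 16-bit samples (matches `-b 16` in the SoX command)
CHANNELS = 1             # assumption: mono (matches `-c 1` in the SoX command)


def lpcm_duration_seconds(num_bytes: int) -> float:
    """Return the playback duration of a raw LPCM buffer of the given size."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    # 96,000 bytes at 48 kHz, 16-bit mono is exactly one second of audio.
    print(lpcm_duration_seconds(96_000))
```

A duration that is far from what the text length suggests is a hint that the file holds an error response rather than audio.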

The synthesized audio is converted and written to a WAV file using the SoX utility.

A Yandex account or federated account is authenticated using an IAM token. If you use a service account, you do not need to include the folder ID in the request. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
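
The folder ID rule above can be sketched as a small request builder that adds `folderId` only when one is supplied. This is a minimal illustration under stated assumptions: the helper name `build_tts_request` is hypothetical, and the field names and values are the ones used in the examples in this guide.

```python
from typing import Dict, Optional, Tuple


def build_tts_request(
    iam_token: str,
    text: str,
    folder_id: Optional[str] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, form_data) for a tts:synthesize request."""
    headers = {"Authorization": "Bearer " + iam_token}
    data = {
        "text": text,
        "lang": "ru-RU",
        "voice": "filipp",
        "format": "lpcm",
        "sampleRateHertz": "48000",
    }
    # Include folderId only when one is supplied (Yandex or federated account).
    # With a service account it can be left out.
    if folder_id is not None:
        data["folderId"] = folder_id
    return headers, data
```

The returned pair could then be passed to an HTTP client, for example as the `headers` and `data` arguments of `requests.post`.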

  1. Synthesize a file in LPCM format:

    Examples are given for cURL, C#, Python 3, and PHP.

    cURL

    Submit a text-to-speech conversion request:

    read -r -d '' TEXT << EOM
    I'm Yandex Speech+Kit.
    I can turn any text into speech.
    Now y+ou can, too!
    EOM
    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --output speech.raw \
      --data-urlencode "text=${TEXT}" \
      --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}&format=lpcm&sampleRateHertz=48000" \
      https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize
    

    Where:

    • TEXT: Text for synthesis in TTS markup.
    • FOLDER_ID: Folder ID.
    • IAM_TOKEN: IAM token.
    • lang: Text language.
    • voice: Voice for speech synthesis.
    • format: Synthesized audio file format.
    • sampleRateHertz: Sample rate of the LPCM audio file.
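
The request body is form-encoded, so the literal `+` characters in the markup text need to be percent-encoded as `%2B`; a bare `+` in a form-encoded body decodes to a space. The sketch below shows this with the Python standard library. The helper name `encode_form` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields: dict) -> str:
    """Encode fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields)


if __name__ == "__main__":
    body = encode_form({"text": "I'm Yandex Speech+Kit."})
    print(body)  # text=I%27m+Yandex+Speech%2BKit.

    # Round trip: the encoded body decodes back to the original text.
    print(parse_qs(body)["text"][0])

    # Without encoding, the "+" is read as a space and the markup is lost.
    print(parse_qs("text=Speech+Kit")["text"][0])  # Speech Kit
```

Passing the text through a form-encoding step, rather than concatenating it into the body by hand, keeps the markup intact.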

    C#

    Submit a text-to-speech conversion request:

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.IO;
    
    namespace TTS
    {
      class Program
      {
        static void Main()
        {
          Tts().GetAwaiter().GetResult();
        }
    
        static async Task Tts()
        {
          const string iamToken = "<IAM_token>";
          const string folderId = "<folder_ID>";
    
          HttpClient client = new HttpClient();
          client.DefaultRequestHeaders.Add("Authorization", "Bearer " + iamToken);
          var values = new Dictionary<string, string>
          {
            { "text", "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!" },
            { "lang", "ru-RU" },
            { "voice", "filipp" },
            { "folderId", folderId },
            { "format", "lpcm" },
            { "sampleRateHertz", "48000" }
          };
          var content = new FormUrlEncodedContent(values);
          var response = await client.PostAsync("https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize", content);
          // Throw on a non-success status so an error body is not saved as audio.
          response.EnsureSuccessStatusCode();
          var responseBytes = await response.Content.ReadAsByteArrayAsync();
          File.WriteAllBytes("speech.raw", responseBytes);
        }
      }
    }
    

    Where:

    • iamToken: IAM token.
    • folderId: Folder ID.
    • text: Text for synthesis in TTS markup.
    • lang: Text language.
    • voice: Voice for speech synthesis.
    • format: Synthesized audio file format.
    • sampleRateHertz: Sample rate of the LPCM audio file.
    Python 3

    • Create a file (e.g., test.py) and add the following code to it:

      import argparse
      import requests
      
      def synthesize(folder_id, iam_token, text):
          url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
          headers = {
              'Authorization': 'Bearer ' + iam_token,
          }
      
          data = {
              'text': text,
              'lang': 'ru-RU',
              'voice': 'filipp',
              'folderId': folder_id,
              'format': 'lpcm',
              'sampleRateHertz': 48000,
          }
      
          with requests.post(url, headers=headers, data=data, stream=True) as resp:
              if resp.status_code != 200:
                  raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
      
              for chunk in resp.iter_content(chunk_size=None):
                  yield chunk
      
      
      if __name__ == "__main__":
          parser = argparse.ArgumentParser()
          parser.add_argument("--token", required=True, help="IAM token")
          parser.add_argument("--folder_id", required=True, help="Folder ID")
          parser.add_argument("--text", required=True, help="Text to synthesize")
          parser.add_argument("--output", required=True, help="Output file name")
          args = parser.parse_args()
      
          with open(args.output, "wb") as f:
              for audio_content in synthesize(args.folder_id, args.token, args.text):
                  f.write(audio_content)
      

      Where:

      • lang: Text language.
      • voice: Voice for speech synthesis.
      • format: Synthesized audio file format.
      • sampleRateHertz: Sample rate of the LPCM audio file.
    • Run the created file:

      export FOLDER_ID=<folder_ID>
      export IAM_TOKEN=<IAM_token>
      python3 test.py \
        --token "${IAM_TOKEN}" \
        --folder_id "${FOLDER_ID}" \
        --output speech.raw \
        --text "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"
      

      Where:

      • FOLDER_ID: Folder ID.
      • IAM_TOKEN: IAM token.
      • --output: Name of the file for the audio.
      • --text: Text for synthesis in TTS markup.

    PHP

    Submit a text-to-speech conversion request:

    <?php
    
    $token = '<IAM_token>'; # Specify an IAM token.
    $folderId = "<folder_ID>"; # Specify a folder ID.
    
    $url = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize";
    $headers = ['Authorization: Bearer ' . $token];
    $post = array(
        'text' => "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
        'folderId' => $folderId,
        'lang' => 'ru-RU',
        'voice' => 'filipp',
        'format' => 'lpcm',
        'sampleRateHertz' => '48000');
    
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, false);
    if ($post !== false) {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    }
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    
    $response = curl_exec($ch);
    if (curl_errno($ch)) {
        // Transport-level failure: there is no HTTP response to inspect.
        print "Error: " . curl_error($ch);
    } elseif (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
        $decodedResponse = json_decode($response, true);
        echo "Error code: " . $decodedResponse["error_code"] . "\r\n";
        echo "Error message: " . $decodedResponse["error_message"] . "\r\n";
    } else {
        file_put_contents("speech.raw", $response);
    }
    curl_close($ch);
    

    Where:

    • token: IAM token.
    • folderId: Folder ID.
    • text: Text for synthesis in TTS markup.
    • lang: Text language.
    • voice: Voice for speech synthesis.
    • format: Synthesized audio file format.
    • sampleRateHertz: Sample rate of the LPCM audio file.
  2. Convert the resulting file to WAV format using the SoX utility:

    sox -r 48000 -b 16 -e signed-integer -c 1 speech.raw speech.wav
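
If SoX is not available, the same conversion can be sketched with the Python standard library `wave` module, which writes a WAV header around the raw samples. This is an alternative to the SoX step, not part of the original procedure. It assumes the parameters given in the SoX command (48,000 Hz, 16-bit, mono) and little-endian signed samples; the function name `raw_lpcm_to_wav` is hypothetical.

```python
import wave


def raw_lpcm_to_wav(
    raw_path: str,
    wav_path: str,
    sample_rate: int = 48000,  # matches `-r 48000`
    sample_width: int = 2,     # bytes per sample, matches `-b 16`
    channels: int = 1,         # matches `-c 1`
) -> None:
    """Wrap headerless LPCM samples in a WAV container."""
    with open(raw_path, "rb") as src:
        pcm = src.read()

    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(sample_rate)
        # The samples are copied unchanged; only the header is added.
        dst.writeframes(pcm)
```

For example, `raw_lpcm_to_wav("speech.raw", "speech.wav")` would produce a file comparable to the SoX output under the assumptions above.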
    

See also

  • API v1 method description
  • Speech synthesis in OggOpus format using the API v1
  • Speech synthesis from SSML text using API v1
  • Authentication with the SpeechKit API
