© 2025 Direct Cursus Technology L.L.C.

In this article:

  • Authentication for API access
  • Getting started
  • Convert text to an audio file

How to synthesize speech in the SpeechKit API v3

Written by
Yandex Cloud
Updated on March 31, 2025

In this section, you will learn how to synthesize speech from text using the SpeechKit API v3 (gRPC).

Authentication for API access

To work with the SpeechKit API, you must authenticate. The authentication method depends on the account type:

Yandex or federated account:
  1. Get an IAM token for your Yandex account or federated account.
  2. Get the ID of a folder in which your account has the ai.speechkit-stt.user role, the ai.speechkit-tts.user role, or a higher role.
  3. When accessing SpeechKit via the API, provide the received parameters in each request:

    • For API v1 and API v2:

      Specify the IAM token in the Authorization header in the following format:

      Authorization: Bearer <IAM_token>
      

      Specify the folder ID in the request body in the folderId parameter.

    • For API v3:

      • Specify the IAM token in the Authorization header.
      • Specify the folder ID in the x-folder-id header.
      Authorization: Bearer <IAM_token>
      x-folder-id: <folder_ID>
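
As a rough illustration of the API v3 case, the two headers can be assembled once and reused for every call. This is only a sketch: the function name v3_user_headers is hypothetical, and the "name: value" output format is chosen to match what grpcurl accepts in its -H option.

```shell
# v3_user_headers is a hypothetical helper for a Yandex or federated account.
# It prints the two API v3 headers, one "name: value" pair per line.
v3_user_headers() {
  iam_token="$1"
  folder_id="$2"
  if [ -z "$iam_token" ] || [ -z "$folder_id" ]; then
    echo "usage: v3_user_headers <IAM_token> <folder_ID>" >&2
    return 1
  fi
  printf 'authorization: Bearer %s\n' "$iam_token"
  printf 'x-folder-id: %s\n' "$folder_id"
}
```

Each printed line can then be passed to grpcurl in its own -H option.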
      

Service account:

SpeechKit supports two authentication methods for service accounts:

  • With an IAM token:

    1. Get an IAM token.

    2. Provide the IAM token in the Authorization header in the following format:

      Authorization: Bearer <IAM_token>
      
  • With an API key:

    Use an API key if you cannot request an IAM token automatically.

    1. Get an API key.

    2. Provide the API key in the Authorization header in the following format:

      Authorization: Api-Key <API_key>
      

Do not specify the folder ID in your requests, as the service uses the folder the service account was created in.
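
The two service account options differ only in the scheme word of the Authorization header, and neither sends a folder ID. A minimal sketch of that choice follows; sa_auth_header and its "iam" / "api-key" selector values are hypothetical names introduced for illustration.

```shell
# sa_auth_header is a hypothetical helper for service accounts.
# $1 selects the credential type ("iam" or "api-key"), $2 is the secret.
# No x-folder-id header is produced: the service uses the folder
# the service account was created in.
sa_auth_header() {
  case "$1" in
    iam)     printf 'authorization: Bearer %s\n' "$2" ;;
    api-key) printf 'authorization: Api-Key %s\n' "$2" ;;
    *)
      echo "unknown credential type: $1 (expected iam or api-key)" >&2
      return 1
      ;;
  esac
}
```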

In the example below, a Yandex account is used for authentication.

Getting started

  1. Install the grpcurl utility.

  2. Install the jq utility to process JSON in a pipeline. On a system that uses apt, for example:

    sudo apt update && sudo apt install jq
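
Before moving on, it may help to confirm that the utilities are actually on PATH. The sketch below is one way to do that; require_tools is a hypothetical helper name, not part of grpcurl or jq.

```shell
# require_tools is a hypothetical helper (not part of grpcurl or jq).
# It reports every listed command that cannot be found on PATH and
# returns a non-zero status if at least one is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call for the steps in this article:
#   require_tools grpcurl jq base64 || exit 1
```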
    

Note

The utilities above are one option. You can also use other tools to synthesize speech with the SpeechKit API v3.

Convert text to an audio file

To synthesize speech from text with TTS markup and save it to a WAV file:

  1. Create a file containing the API request body, including the text to synthesize:

    tts_req.json
    {
      "text": "I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!",
      "outputAudioSpec": {
        "containerAudio": {
          "containerAudioType": "WAV"
        }
      },
      "hints": [
        {
          "voice": "jane"
        },
        {
          "role": "good"
        }
      ],
      "loudnessNormalizationType": "LUFS"
    }
    
  2. Run the following commands:

    # Credentials obtained in the authentication step
    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    # Compact the request to one line, send it, then decode the base64 audio
    jq . -c tts_req.json | \
    grpcurl -H "authorization: Bearer ${IAM_TOKEN}" \
            -H "x-folder-id: ${FOLDER_ID}" \
            -d @ tts.api.cloud.yandex.net:443 speechkit.tts.v3.Synthesizer/UtteranceSynthesis | \
    jq -r '.audioChunk.data' | base64 -d > speech.wav
    

    Where:

    • FOLDER_ID: Folder ID you got earlier.

      If you are using an IAM token of a service account, do not specify the folder ID in your request, as the service uses the folder the service account was created in.

    • IAM_TOKEN: IAM token you got earlier.

    • speech.wav: Output file.
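
The last stage of the pipeline turns base64 text back into audio bytes. The sketch below makes that step explicit. It assumes the service may return the audio as several messages, so that jq -r prints one base64 string per line; decode_chunks is a hypothetical helper, not part of any of the utilities used here.

```shell
# decode_chunks is a hypothetical helper. It reads base64 strings from
# stdin, one per line (the shape `jq -r '.audioChunk.data'` produces),
# decodes each line on its own, and appends the bytes to the file in $1.
decode_chunks() {
  out_file="$1"
  : > "$out_file"                       # start from an empty file
  while IFS= read -r chunk; do
    # Skip blank lines and the "null" jq prints when a message has no audio.
    case "$chunk" in
      ''|null) continue ;;
    esac
    printf '%s' "$chunk" | base64 -d >> "$out_file" || return 1
  done
}
```

In the pipeline from step 2, this function would stand in for the final stage, that is, the output of jq -r '.audioChunk.data' would be piped into decode_chunks speech.wav.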

The command writes the synthesized speech to speech.wav in the current working directory.
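
A quick sanity check can catch the case where an error message was written to the output file instead of audio. The sketch below relies only on the general layout of a WAV file (a RIFF container with a WAVE form type); is_wav is a hypothetical helper name.

```shell
# is_wav is a hypothetical helper. A WAV file is a RIFF container:
# bytes 1-4 are "RIFF" and bytes 9-12 are "WAVE".
is_wav() {
  [ -f "$1" ] || return 1
  riff="$(head -c 4 "$1")"
  wave="$(tail -c +9 "$1" | head -c 4)"
  [ "$riff" = "RIFF" ] && [ "$wave" = "WAVE" ]
}

# Example call after the synthesis step:
#   is_wav speech.wav || echo "speech.wav does not look like a WAV file" >&2
```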

See also

  • Learn more about the API v3
  • Authentication with the API
  • Speech synthesis in the API v3
