© 2025 Direct Cursus Technology L.L.C.

How to synthesize speech in the SpeechKit API v1

Written by Yandex Cloud
Updated on March 28, 2025

Speech synthesis converts text to speech and saves it to an audio file. In this section, you will learn how to synthesize speech from text using the SpeechKit API v1 (REST).

The example uses the cURL utility to call the API.

Authentication for API access

To work with the SpeechKit API, you need to authenticate. The authentication method depends on the account type:

Yandex or federated account

  1. Get an IAM token for your Yandex account or federated account.
  2. Get the ID of the folder in which your account has the ai.speechkit-stt.user role, the ai.speechkit-tts.user role, or a higher role.
  3. When accessing SpeechKit via the API, provide the values you got in each request:

    • For API v1 and API v2:

      Specify the IAM token in the Authorization header in the following format:

      Authorization: Bearer <IAM_token>

      Specify the folder ID in the folderId parameter of the request body.

    • For API v3:

      • Specify the IAM token in the Authorization header.
      • Specify the folder ID in the x-folder-id header.

      Authorization: Bearer <IAM_token>
      x-folder-id: <folder_ID>

Service account

SpeechKit supports two authentication methods based on service accounts:

  • With an IAM token:

    1. Get an IAM token.

    2. Provide the IAM token in the Authorization header in the following format:

      Authorization: Bearer <IAM_token>
      
  • With an API key:

    Use API keys if requesting an IAM token automatically is not an option.

    1. Get an API key.

    2. Provide the API key in the Authorization header in the following format:

      Authorization: Api-Key <API_key>
      

Do not specify the folder ID in your requests, as the service uses the folder the service account was created in.
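The header formats listed above are easy to mix up, so you could wrap them in a small helper. The sketch below uses a hypothetical bash function, speechkit_headers. It is not part of SpeechKit or any Yandex tool and only prints header lines. It assumes you pass a folder ID only for API v3 requests made under a Yandex or federated account, because service account requests omit the folder.

```bash
# Hypothetical helper (not a SpeechKit tool): prints the auth headers, one per line.
# Usage: speechkit_headers <iam|api-key> <credential> [<folder_ID for API v3>]
speechkit_headers() {
  local kind="$1" cred="$2" folder_id="${3:-}"
  case "$kind" in
    iam)     printf 'Authorization: Bearer %s\n' "$cred" ;;
    api-key) printf 'Authorization: Api-Key %s\n' "$cred" ;;
    *)       echo "unknown credential type: $kind" >&2; return 1 ;;
  esac
  # Only API v3 under a Yandex or federated account takes the folder in a header.
  if [ -n "$folder_id" ]; then
    printf 'x-folder-id: %s\n' "$folder_id"
  fi
}
```

For a service account with an API key, the output is a single line, so it can be passed directly to cURL with curl --header "$(speechkit_headers api-key "$API_KEY")". Here, API_KEY is a placeholder variable.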

In the example below, authentication is performed under a Yandex account.

Execute a request

Submit a text-to-speech conversion request:

read -r -d '' TEXT << EOM
I'm Yandex Speech+Kit.
I can turn any text into speech.
Now y+ou can, too!
EOM
export FOLDER_ID=<folder_ID>
export IAM_TOKEN=<IAM_token>
curl \
  --request POST \
  --header "Authorization: Bearer ${IAM_TOKEN}" \
  --data-urlencode "text=${TEXT}" \
  --data "lang=ru-RU&voice=filipp&folderId=${FOLDER_ID}" \
  "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg

Where:

  • FOLDER_ID: ID of the folder you got earlier.
  • IAM_TOKEN: IAM token you got earlier.
  • TEXT: Text to synthesize. cURL URL-encodes it because it is passed with --data-urlencode.
  • lang: Text language.
  • voice: Voice for speech synthesis.
  • speech.ogg: Output file.
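URL encoding matters here because of the + stress marks. In a form-encoded request body, a bare + is decoded as a space, so each + must be sent as %2B. The hypothetical bash function below, urlencode, is not a curl feature. It percent-encodes every character outside the RFC 3986 unreserved set, which is enough to show the effect. It is an ASCII-only sketch; for real requests, keep using --data-urlencode.

```bash
# Hypothetical ASCII-only percent-encoder, for illustration only.
# Non-ASCII text is not encoded correctly here; let curl handle real input.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case "$c" in
      # RFC 3986 unreserved characters pass through unchanged.
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # Everything else, including "+", becomes %XX (hex character code).
      *) printf -v hex '%02X' "'$c"; out+="%$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, urlencode 'Speech+Kit' prints Speech%2BKit, so the service sees the stress mark rather than a space.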

Note

To mark stress in homographs, put + before the stressed vowel: +import, im+port. To insert a pause between words, use -. The maximum text length is 5,000 characters.

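To avoid sending a request that exceeds the 5,000-character limit from the note, you could check the text first. The sketch below uses a hypothetical function, check_tts_text. It assumes that the limit applies to the raw text, including + and - markup and line breaks. That counting rule is an assumption, not something this page states.

```bash
# Hypothetical pre-flight check against the 5,000-character limit.
# Assumes markup characters (+, -) and newlines count toward the limit.
check_tts_text() {
  local text="$1" max=5000
  # ${#text} counts characters in the current locale (bytes under LC_ALL=C).
  if (( ${#text} > max )); then
    printf 'text has %d characters, limit is %d\n' "${#text}" "$max" >&2
    return 1
  fi
  return 0
}
```

You could then guard the request with check_tts_text "$TEXT" && curl followed by the same arguments as before.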
The synthesized speech will be written to the speech.ogg file in the directory from which you ran the command.

By default, the audio will be in OggOpus format. You can listen to the output file in your browser, e.g., Yandex Browser or Mozilla Firefox.
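If the request fails, speech.ogg may contain an error response from the service instead of audio. Ogg files start with the 4-byte capture pattern OggS, so a quick check of the first bytes tells the two cases apart. The helper name is_ogg_file is hypothetical. This page does not describe the error response format.

```bash
# Hypothetical check: Ogg pages begin with the capture pattern "OggS",
# so a file that does not start with it is not Ogg audio.
is_ogg_file() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "OggS" ]
}
```

For example, is_ogg_file speech.ogg || cat speech.ogg prints the file contents whenever the result is not Ogg audio.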

For more information, see the description of the speech synthesis request format.

Tutorials

  • Speech synthesis in OggOpus format using the API v1
  • Speech synthesis from SSML text using the API v1
  • Speech synthesis in WAV format using the API v1
