Yandex Cloud
© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit


API v1 method description

Written by Yandex Cloud. Improved by amatol. Updated February 10, 2025.

Synthesizes speech from the text passed in the request.

Note

API v1 does not support all SpeechKit synthesis options. For a comparison of API versions, see Synthesis options.

Send synthesis requests as POST requests to the following endpoint: tts.api.cloud.yandex.net/speech/v1/tts:synthesize

Parameters in the request body

All parameters must be URL-encoded. The maximum size of the POST request body is 15 KB.
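A minimal sketch of preparing a compliant request body, assuming Python's standard `urllib.parse.urlencode` for the URL encoding and reading "15 KB" as 15 * 1024 bytes of encoded body (an assumption; the article does not say how the size is counted). The function name `encode_body` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: "15 KB" means 15 * 1024 bytes of the encoded body.
MAX_BODY_BYTES = 15 * 1024


def encode_body(params: dict) -> bytes:
    """URL-encode request parameters into an application/x-www-form-urlencoded body."""
    # urlencode percent-encodes non-ASCII characters as UTF-8, so the result is pure ASCII.
    body = urlencode(params).encode("ascii")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(body)} bytes, limit is {MAX_BODY_BYTES}")
    return body
```

Each Cyrillic character becomes six bytes once percent-encoded (two UTF-8 bytes written as `%XX`), so if the limit applies to the encoded body, a long Russian text may exceed 15 KB well before it reaches the 5,000-character limit on `text`.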

text (string)
UTF-8 encoded text to convert to speech.
You can use only one of the fields: text or ssml.
To control pronunciation (pauses, emphasis, and stress), use TTS markup.
Maximum string length: 5,000 characters.

ssml (string)
SSML text to convert to speech.
You can use only one of the fields: text or ssml.

lang (string)
Language.
Acceptable values: ru-RU (Russian, default).

voice (string)
Preferred speech synthesis voice from the list of voices.

emotion (string)
Role or emotional tone of the voice. Supported only for Russian (ru-RU). For acceptable voice and emotional tone combinations, see List of voices.

speed (string)
Synthesized speech rate, set as a decimal number in the range from 0.1 to 3.0, where:
  • 3.0: Fastest rate.
  • 1.0 (default): Average human speech rate.
  • 0.1: Slowest rate.

format (string)
Synthesized audio format. Acceptable values:
  • lpcm
  • oggopus (default)
  • mp3

sampleRateHertz (string)
Synthesized audio sampling frequency. Applies if format is lpcm. Acceptable values:
  • 48000 (default): 48 kHz.
  • 16000: 16 kHz.
  • 8000: 8 kHz.

folderId (string)
ID of a folder you have access to. Required for authorization with a user account (see Authentication with the SpeechKit API resource). Do not use this field if you make the request on behalf of a service account.
Maximum string length: 50 characters.
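The constraints in the parameter list can be checked on the client before a request is sent. The sketch below transcribes them into a hypothetical `validate_params` function; it assumes exactly one of `text` or `ssml` is required (the list only says both cannot be used together) and checks `sampleRateHertz` only when `format` is `lpcm`, since that is the only case where the parameter applies. The server remains the authority on what it accepts.

```python
# Limits transcribed from the parameter list; names here are hypothetical.
ALLOWED_FORMATS = {"lpcm", "oggopus", "mp3"}
ALLOWED_LPCM_RATES = {"48000", "16000", "8000"}
MAX_TEXT_CHARS = 5000
MAX_FOLDER_ID_CHARS = 50


def validate_params(params: dict) -> None:
    """Raise ValueError if params violate a documented constraint."""
    has_text, has_ssml = "text" in params, "ssml" in params
    # Assumption: one of text/ssml must be present, and never both.
    if has_text == has_ssml:
        raise ValueError("specify exactly one of 'text' or 'ssml'")
    if has_text and len(params["text"]) > MAX_TEXT_CHARS:
        raise ValueError("'text' exceeds 5,000 characters")
    # float() itself raises ValueError for non-numeric speed strings.
    speed = float(params.get("speed", "1.0"))
    if not 0.1 <= speed <= 3.0:
        raise ValueError("'speed' must be between 0.1 and 3.0")
    fmt = params.get("format", "oggopus")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # sampleRateHertz only applies to lpcm output.
    if fmt == "lpcm" and str(params.get("sampleRateHertz", "48000")) not in ALLOWED_LPCM_RATES:
        raise ValueError("'sampleRateHertz' must be 48000, 16000, or 8000")
    if len(params.get("folderId", "")) > MAX_FOLDER_ID_CHARS:
        raise ValueError("'folderId' exceeds 50 characters")
```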

Response

If synthesis succeeds, the response body contains the binary content of the audio file. The output format depends on the value of the format parameter.

For more information about the response format and codes, see Response status codes.
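The following sketch shows one way to send the request and read the binary response with Python's standard library. It assumes an IAM token passed as `Authorization: Bearer <token>` (the API authentication page covers this and the alternatives) and an `https://` scheme on the endpoint; `build_request` and `synthesize` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint from this article; the https:// scheme is an assumption.
SYNTHESIZE_URL = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"


def build_request(params: dict, iam_token: str) -> urllib.request.Request:
    """Build a POST request with a URL-encoded body and IAM token authorization."""
    body = urlencode(params).encode("ascii")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def synthesize(params: dict, iam_token: str) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(build_request(params, iam_token)) as resp:
        return resp.read()
```

For example, `synthesize({"text": "Привет", "folderId": "<folder ID>"}, iam_token)` would return OggOpus bytes (the default format) that can be written to a `.ogg` file. Per the parameter list, include `folderId` only when authenticating as a user account. With an API key instead of an IAM token, the header value takes the form `Api-Key <key>`.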

Use cases

  • Speech synthesis in WAV format using API v1.
  • Speech synthesis in OggOpus format using API v1.
  • Speech synthesis from SSML text using API v1.
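With format set to lpcm, the response is raw audio samples without a container, so producing a WAV file means adding a WAV header. A minimal sketch using Python's standard `wave` module, assuming the samples are 16-bit signed little-endian mono (an assumption not stated in this article); the `sample_rate` argument must match the sampleRateHertz value sent in the request. The function name `lpcm_to_wav` is hypothetical.

```python
import io
import wave


def lpcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw LPCM samples in a WAV container.

    Assumes 16-bit signed little-endian mono samples; sample_rate must
    equal the sampleRateHertz value used for synthesis.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)     # header sizes are patched on close
    return buf.getvalue()
```

Under the same assumption, the audio duration in seconds is `len(pcm) / (2 * sample_rate)`, so a mismatched rate plays the speech too fast or too slow rather than failing outright.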
