© 2025 Direct Cursus Technology L.L.C.

Speech synthesis from SSML text using API v1

Written by
Yandex Cloud
Updated at February 10, 2025

With API v1, you can synthesize speech from text marked up with SSML and save the result to an OggOpus file.

The example uses the following synthesis parameters:

  • Language: Russian.
  • Voice: jane.
  • Other parameters are left at their defaults.

The text file is read using the cat utility.

A Yandex account or federated account is authenticated using an IAM token. If you use a service account, you do not need to include the folder ID in the request. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.

Bash
  1. Create a file, e.g., text.xml, and add the following SSML text to it:

    <speak>
      Here are some examples of how you can use SSML.
      You can add a custom pause to your text:<break time="2s"/> Ta-daaah!
      Or mark up your text into paragraphs and sentences. Pauses between paragraphs are longer.
      <p><s>Sentence one</s><s>Sentence two</s></p>
      You can also substitute phrases.
      For example, you can use this feature to pronounce abbreviations, <sub alias="et cetera">etc.</sub>
    </speak>
    
  2. Send a request with the text to the server:

    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --data-urlencode "ssml=$(cat text.xml)" \
      --data "lang=ru-RU&voice=jane&folderId=${FOLDER_ID}" \
      "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize" > speech.ogg
    

    Where:

    • FOLDER_ID: Folder ID, passed in the folderId parameter.
    • IAM_TOKEN: IAM token.
    • ssml: Text marked up according to SSML rules (here, the contents of text.xml).
    • lang: Text language.
    • voice: Voice used for synthesis.
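
Since the folder ID is only needed for a Yandex or federated account, the form fields passed to `--data` can be assembled conditionally. The following is a minimal sketch, not part of SpeechKit: the `tts_form_data` function name and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: builds the form fields for curl's --data option.
# Usage: tts_form_data <lang> <voice> [folder_id]
tts_form_data() {
  # Language and voice are always sent.
  data="lang=$1&voice=$2"
  # Append folderId only when a folder ID is supplied
  # (it can be omitted when authenticating as a service account).
  if [ -n "${3:-}" ]; then
    data="${data}&folderId=$3"
  fi
  printf '%s\n' "$data"
}
```

In the request above, this could replace the literal string, e.g., `--data "$(tts_form_data ru-RU jane "${FOLDER_ID}")"`; with an empty or unset FOLDER_ID, only lang and voice are sent.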

The synthesized speech will be written to the speech.ogg file in the directory you ran the command from.
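
Because curl writes whatever body the server returns to speech.ogg, a failed request can leave an error message in that file instead of audio. A quick sanity check is to look for the `OggS` signature that every Ogg file starts with. The sketch below assumes `head -c` is available; the `is_ogg` function name is hypothetical.

```shell
# Hypothetical helper: succeeds if the file begins with the Ogg
# capture pattern "OggS", and fails otherwise (including when the
# file is missing or unreadable).
is_ogg() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "OggS" ]
}
```

For example, `is_ogg speech.ogg || cat speech.ogg` would print the file's contents only when it does not look like Ogg audio, which is usually enough to see the server's error text.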

See also

  • API v1 method description
  • Speech synthesis in WAV format using the API v1
  • Speech synthesis in OggOpus format using the API v1
  • Authentication with the SpeechKit API
