© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

SSML markup

Written by
Yandex Cloud
Updated on April 30, 2025
  • Supported SSML tags
    • break
    • p
    • phoneme
    • speak
    • s
    • sub
  • Use cases

With the Speech Synthesis Markup Language (SSML), you can control how SpeechKit synthesizes speech from text.

Note

SpeechKit is designed for natural speech synthesis. Markup helps you adjust the pronunciation of individual words, phrases, and sentences. However, it is not intended for generating isolated sounds or silence.

The markup in the text will serve as a cue for synthesis, not as a direct instruction.

SSML is supported only when using API v1.

To provide text in SSML format, use the ssml parameter in the request body and wrap the text itself in the <speak> tag:

<speak>
  Here are some examples of how you can use SSML.
  You can add a custom pause to your text:<break time="2s"/> Ta-daaah!
  Or mark up your text into paragraphs and sentences. Pauses between paragraphs are longer.
  <p><s>Sentence one</s><s>Sentence two</s></p>
  You can also substitute phrases.
  For example, you can use this feature to pronounce abbreviations, <sub alias="et cetera">etc.</sub>
</speak>

Example of sending a request.
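
As a rough illustration of passing SSML in the ssml parameter, the sketch below builds (but does not send) a POST request using only the Python standard library. The endpoint URL, the lang and folderId parameters, and the Bearer authorization header are assumptions that this article does not state, and build_ssml_request is a hypothetical helper name; check them against the API v1 reference before relying on them.

```python
import urllib.parse
import urllib.request

# Assumed API v1 synthesis endpoint; not stated in this article.
TTS_URL = "https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize"


def build_ssml_request(ssml, iam_token, folder_id, lang="en-US"):
    """Build a POST request that carries SSML in the `ssml` form parameter.

    `lang`, `folderId`, and the Bearer header are assumptions for illustration.
    """
    # The text must be wrapped in the <speak> root tag.
    if not ssml.lstrip().startswith("<speak"):
        raise ValueError("SSML text must be wrapped in the <speak> tag")
    body = urllib.parse.urlencode(
        {"ssml": ssml, "lang": lang, "folderId": folder_id}
    ).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Authorization": f"Bearer {iam_token}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops short of that so it can be inspected without credentials.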

Supported SSML tags

Currently, SpeechKit supports the following SSML tags:

Description                        Tag
Add a pause                        <break>
Add a pause between paragraphs     <p>
Use phonetic pronunciation         <phoneme>
Root tag for SSML text             <speak>
Add a pause between sentences      <s>
Abbreviation pronunciation         <sub>
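
Because only these six tags are supported, it can be useful to check markup locally before sending it. The following is a minimal sketch, assuming the SSML is plain well-formed XML without namespaces; unsupported_tags is a hypothetical helper, not part of SpeechKit.

```python
import xml.etree.ElementTree as ET

# Tags listed as supported in this article.
SUPPORTED_TAGS = {"break", "p", "phoneme", "speak", "s", "sub"}


def unsupported_tags(ssml):
    """Return the set of tag names in `ssml` that are not in SUPPORTED_TAGS.

    Raises ValueError if the markup is not well-formed XML
    or if the root element is not <speak>.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed XML: {exc}") from exc
    if root.tag != "speak":
        raise ValueError("the root element must be <speak>")
    # root.iter() walks the root and every nested element.
    return {element.tag for element in root.iter()} - SUPPORTED_TAGS
```

An empty result means every tag in the text is one of the supported ones; it says nothing about whether the attributes are valid.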

break

Use the <break> tag to add a pause with a specified duration to the speech. The duration is specified using the strength and time attributes. If these attributes are not set, the strength="medium" value is used by default.

Attribute   Description
strength    Pause duration, which depends on the context. Acceptable values:
            * weak: Short pause of up to 250 milliseconds.
            * medium: Medium pause of up to 400 milliseconds.
            * strong: Equivalent to the pause after a full stop or sentence.
            * x-strong: Equivalent to the pause after a paragraph.
            * none or x-weak: Add no pause; kept for AWS API compatibility.
time        Pause duration in seconds or milliseconds, e.g., 2s or 400ms. The maximum pause duration is five seconds.
            When synthesizing a pause of a specified length, the possible error is 100-200 ms.

<speak>Hey, wait a second<break time="1s"/> What are you doing?</speak>

The <break> tag adds a pause even if it comes after other pause-adding elements, e.g., periods and commas.
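
The attribute rules above (the allowed strength values and the five-second ceiling on time) can be enforced when generating markup. This is a minimal sketch; break_tag is a hypothetical helper, and expressing the duration in milliseconds is a choice made here for simplicity, since the article accepts both seconds and milliseconds.

```python
from typing import Optional

VALID_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 5000  # the maximum pause duration is five seconds


def break_tag(time_ms: Optional[int] = None, strength: Optional[str] = None) -> str:
    """Return a <break> tag, validating the attributes that are set."""
    attrs = []
    if strength is not None:
        if strength not in VALID_STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_PAUSE_MS:
            raise ValueError("time must be between 0 and 5000 ms")
        attrs.append(f'time="{time_ms}ms"')
    # With no attributes, the default strength="medium" applies.
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"
```

For example, break_tag(time_ms=1000) yields the same one-second pause as the sample above, written as time="1000ms".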

p

Use the <p> tag to add a pause between paragraphs. A pause is added after the closing tag.

A pause after a paragraph is longer than a pause after a sentence or period. The duration of a pause depends on the selected voice, emotional tone, speed, and language.

<speak>
  <p>The executioner's argument was that you couldn't cut off something's head unless there was a trunk to sever it from.</p>
  <p>The King's argument was that anything that had a head could be beheaded, and that you weren't to talk nonsense.</p>
  <p>The Queen's argument was that if something wasn't done about it in less than no time, she'd have everyone beheaded all round. It was this last argument that had everyone looking so nervous and uncomfortable.</p>
</speak>

All pauses inside the tag are also taken into account. For example, an additional pause will be added for a period, even if it comes before the closing tag.

phoneme

Use the <phoneme> tag to control proper pronunciation using phonemes. The text specified in the ph attribute will be used for playback. In the alphabet attribute, specify the preferred standard: ipa or x-sampa.

  • International phonetic alphabet (IPA)

    <speak>
          In different regions of Russia, the letter
          <phoneme alphabet="ipa" ph="o">O</phoneme> is pronounced differently in words.
          In some areas, people say <phoneme alphabet="ipa" ph="məlɐko">молоко</phoneme>,
          while in others, <phoneme alphabet="ipa" ph="mələko">молоко</phoneme>.
    </speak>
    
  • Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA)

    <speak>
          In different regions of Russia, the letter
          <phoneme alphabet="x-sampa" ph="o">O</phoneme> is pronounced differently in words.
          In some areas, people say <phoneme alphabet="x-sampa" ph="m@l6ko">молоко</phoneme>,
          while in others, <phoneme alphabet="x-sampa" ph="m@l@ko">молоко</phoneme>.
    </speak>
    

For the list of supported phonemes, see List of SSML supported phonemes.
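
When phoneme tags are generated programmatically, the ph value needs proper attribute quoting, because X-SAMPA strings may contain quote characters. A minimal sketch follows, with phoneme_tag as a hypothetical helper; it validates only the alphabet name, not whether the individual phonemes are supported.

```python
from xml.sax.saxutils import escape, quoteattr

ALPHABETS = {"ipa", "x-sampa"}


def phoneme_tag(text, ph, alphabet="ipa"):
    """Return a <phoneme> tag that pronounces `text` as the phonemes in `ph`."""
    if alphabet not in ALPHABETS:
        raise ValueError("alphabet must be 'ipa' or 'x-sampa'")
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )
```

Calling phoneme_tag("молоко", "m@l6ko", alphabet="x-sampa") reproduces one of the tags from the X-SAMPA sample above.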

speak

<speak> is the root tag. The whole text must be inside this tag.

<speak>My text in SSML format.</speak>

s

Use the <s> tag to add a pause between sentences. A pause after a sentence is the same as a pause after a period. The duration of a pause depends on the selected voice, emotional tone, speed, and language.

<speak>
  <s>First sentence</s>
  <s>Second sentence</s>
</speak>

All pauses inside the tag are also taken into account. For example, an additional pause will be added in place of a period even if it is before the closing tag.
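
Text that is already split into paragraphs and sentences can be turned into p and s markup mechanically. The sketch below assumes the input is a list of paragraphs, each given as a list of sentence strings; paragraphs_to_ssml is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def paragraphs_to_ssml(paragraphs):
    """Wrap each sentence in <s>, each paragraph in <p>, and the whole text in <speak>."""
    parts = ["<speak>"]
    for sentences in paragraphs:
        # escape() protects characters such as & and < inside the text.
        body = "".join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
        parts.append(f"<p>{body}</p>")
    parts.append("</speak>")
    return "".join(parts)
```

The helper leaves punctuation untouched, so a sentence that ends with a period inside the tag will still get the additional pause described above.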

sub

Use the <sub> tag to have substitute text pronounced in place of the written text. For example, you could use it to correctly pronounce an abbreviation or the name of a chemical element.

<speak>
  My favorite chemical element is <sub alias="mercury">Hg</sub> because it is shiny.
</speak>
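
If the same abbreviations recur across many texts, the sub tags can be inserted from a lookup table. This is a minimal sketch, assuming a plain dictionary that maps the written form to its spoken alias; apply_substitutions is a hypothetical helper, and it returns only the body text, which still has to be wrapped in the speak tag.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_substitutions(text, aliases):
    """Wrap every occurrence of a key from `aliases` in a <sub alias="..."> tag."""
    if not aliases:
        return escape(text)
    # Try longer keys first so that a short key never splits a longer one.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))
        alias = quoteattr(aliases[match.group()])
        out.append(f"<sub alias={alias}>{escape(match.group())}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

Matching is by plain substring, so a key such as Hg would also match inside a longer word; a production version would likely need word-boundary handling suited to its own abbreviations.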

Use cases

  • Speech synthesis from SSML text using API v1
