SSML markup
With the Speech Synthesis Markup Language (SSML), you can control how SpeechKit synthesizes speech from text.
Note
SpeechKit is designed for natural speech synthesis. SSML markup helps you fine-tune the pronunciation of individual words, phrases, and sentences. However, it is not intended for generating individual sounds or silence.
The synthesizer treats markup as a hint, not as a strict instruction.
SSML is supported only when using API v1.
To provide text in SSML format, use the ssml parameter in the request body and wrap the text in the <speak> tag:
<speak>
Here are some examples of how you can use SSML.
You can add a custom pause to your text:<break time="2s"/> Ta-daaah!
Or mark up your text into paragraphs and sentences. Pauses between paragraphs are longer.
<p><s>Sentence one</s><s>Sentence two</s></p>
You can also substitute phrases.
For example, you can use this feature to pronounce abbreviations, <sub alias="et cetera">etc.</sub>
</speak>
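If you call the API from code, the markup can be assembled into a request body before it is sent. The Python sketch below assumes a form-encoded (application/x-www-form-urlencoded) body; build_ssml_request_body is a hypothetical name, and any extra parameters you pass (such as a voice) are forwarded unchanged, so check them against the API v1 reference.

```python
from urllib.parse import urlencode


def build_ssml_request_body(ssml: str, **params: str) -> bytes:
    """Hypothetical helper: form-encode SSML markup for an API v1 request.

    Assumes the body is sent as application/x-www-form-urlencoded and that
    the markup goes in the "ssml" field instead of the plain "text" field.
    """
    markup = ssml.strip()
    # The whole text must be wrapped in the <speak> root tag.
    if not (markup.startswith("<speak") and markup.endswith("</speak>")):
        raise ValueError("SSML text must be wrapped in <speak>...</speak>")
    # Assumption: plain text and SSML are sent in separate, alternative fields.
    if "text" in params:
        raise ValueError("pass the markup in 'ssml', not in 'text'")
    return urlencode({"ssml": markup, **params}).encode("utf-8")


if __name__ == "__main__":
    body = build_ssml_request_body('<speak>Ta-daaah!<break time="2s"/></speak>')
    print(body.decode("utf-8"))
```

The helper only prepares the body; sending it (and authenticating) is left to whatever HTTP client you already use.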
Supported SSML tags
Currently, SpeechKit supports the following SSML tags:
Description | Tag |
---|---|
Add a pause | <break> |
Add a pause between paragraphs | <p> |
Specify pronunciation using phonemes | <phoneme> |
Root tag for SSML text | <speak> |
Add a pause between sentences | <s> |
Substitute text for pronunciation, for example, an abbreviation | <sub> |
break
Use the <break> tag to add a pause of a specified duration to the speech. The duration is set with the strength and time attributes. If neither attribute is set, strength="medium" is used by default.
Attribute | Description |
---|---|
strength | Pause duration, which depends on the context. Acceptable values: weak: short pause of up to 250 milliseconds; medium: medium pause of up to 400 milliseconds; strong: same as the pause after a period or sentence; x-strong: same as the pause after a paragraph; none or x-weak: no pause is added (these values are kept for AWS API compatibility). |
time | Pause duration in seconds or milliseconds, for example, 2s or 400ms. The maximum pause duration is 5 seconds. A synthesized pause may differ from the specified length by 100 to 200 ms. |
<speak>Hey, wait a second<break time="1s"/> What are you doing?</speak>
The <break> tag adds a pause even if it comes after other elements that already add a pause, such as periods and commas.
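When pauses are generated programmatically, it can help to validate the attributes before sending the request. The following Python sketch (make_break is a hypothetical helper) accepts only the strength values listed in the table above and rejects time values longer than the documented 5-second maximum; it always writes time in milliseconds.

```python
from typing import Optional

# Values taken from the strength attribute description above.
ALLOWED_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
MAX_PAUSE_MS = 5000  # documented maximum pause duration (5 seconds)


def make_break(time_ms: Optional[int] = None, strength: Optional[str] = None) -> str:
    """Hypothetical helper: build a <break> element with validated attributes."""
    attrs = []
    if strength is not None:
        if strength not in ALLOWED_STRENGTHS:
            raise ValueError(f"unsupported strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_PAUSE_MS:
            raise ValueError(f"time must be between 0 and {MAX_PAUSE_MS} ms")
        attrs.append(f'time="{time_ms}ms"')
    # With no attributes, SpeechKit falls back to strength="medium".
    return "<break" + "".join(" " + a for a in attrs) + "/>"


if __name__ == "__main__":
    print(f"<speak>Hey, wait a second{make_break(time_ms=1000)} What are you doing?</speak>")
```

Remember that the actual pause can still differ from the requested length by 100 to 200 ms.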
p
Use the <p> tag to add a pause between paragraphs. The pause is added after the closing tag.
A pause after a paragraph is longer than a pause after a sentence or period. The duration of a pause depends on the selected voice, emotional tone, speed, and language.
<speak>
<p>The executioner's argument was that you couldn't cut off something's head unless there was a trunk to sever it from.</p>
<p>The King's argument was that anything that had a head could be beheaded, and that you weren't to talk nonsense.</p>
<p>The Queen's argument was that if something wasn't done about it in less than no time, she'd have everyone beheaded all round. It was this last argument that had everyone looking so nervous and uncomfortable.</p>
</speak>
Pauses inside the tag are also taken into account. For example, a period right before the closing tag adds its own pause in addition to the pause added by the tag.
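If your source text already separates paragraphs with blank lines, you can convert it to <p> markup automatically. In this Python sketch, the blank-line convention and the paragraphs_to_ssml name are assumptions; xml.sax.saxutils.escape is used so that characters such as & and < in the text do not break the markup.

```python
import re
from xml.sax.saxutils import escape


def paragraphs_to_ssml(text: str) -> str:
    """Hypothetical helper: wrap blank-line-separated paragraphs in <p> tags."""
    # One or more blank lines mark a paragraph boundary (assumed input format).
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Escape &, < and > so the paragraph text cannot be read as markup.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(paragraphs_to_ssml("The executioner's argument...\n\nThe King's argument..."))
```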
phoneme
Use the <phoneme> tag to specify the correct pronunciation using phonemes. The transcription specified in the ph attribute is used for playback. In the alphabet attribute, specify which phonetic alphabet to use: ipa or x-sampa.
- International Phonetic Alphabet (IPA):
<speak>In different regions of Russia, the letter <phoneme alphabet="ipa" ph="o">O</phoneme> is pronounced differently. In some places they say <phoneme alphabet="ipa" ph="məlɐko">молоко</phoneme>, while in others they say <phoneme alphabet="ipa" ph="mələko">молоко</phoneme>.</speak>
- Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA):
<speak>In different regions of Russia, the letter <phoneme alphabet="x-sampa" ph="o">O</phoneme> is pronounced differently. In some places they say <phoneme alphabet="x-sampa" ph="m@l6ko">молоко</phoneme>, while in others they say <phoneme alphabet="x-sampa" ph="m@l@ko">молоко</phoneme>.</speak>
See the SpeechKit documentation for the list of supported phonemes.
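Building <phoneme> elements by hand makes it easy to break the quoting, because a transcription can contain characters that are special in XML (X-SAMPA, for instance, uses a double quote to mark primary stress). The Python sketch below is a hypothetical helper that only accepts the two alphabets named above and relies on xml.sax.saxutils.quoteattr, which picks a safe quote style for the attribute value. It does not check that each phoneme is on SpeechKit's supported list.

```python
from xml.sax.saxutils import escape, quoteattr

SUPPORTED_ALPHABETS = ("ipa", "x-sampa")


def make_phoneme(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Hypothetical helper: build a <phoneme> element for the given word."""
    if alphabet not in SUPPORTED_ALPHABETS:
        raise ValueError(f"alphabet must be one of {SUPPORTED_ALPHABETS}")
    # quoteattr adds the surrounding quotes and escapes &, < and >;
    # if the value contains a double quote, it switches to single quotes.
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(text)}</phoneme>"


if __name__ == "__main__":
    print(make_phoneme("молоко", "m@l6ko", alphabet="x-sampa"))
```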
speak
<speak> is the root tag. The entire text must be contained inside this tag.
<speak>My text in SSML format.</speak>
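Before sending a request, you can check locally that the markup is well-formed XML with <speak> as its root element. This Python sketch uses the standard xml.etree.ElementTree parser; has_speak_root is a hypothetical name, and the check does not verify that SpeechKit supports every tag used inside the root.

```python
import xml.etree.ElementTree as ET


def has_speak_root(document: str) -> bool:
    """Hypothetical helper: True if the document parses and its root is <speak>."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Unclosed tags, stray '&' and similar problems end up here.
        return False
    # Strip a namespace prefix such as "{http://...}speak" if one is present.
    return root.tag.rsplit("}", 1)[-1] == "speak"


if __name__ == "__main__":
    print(has_speak_root("<speak>My text in SSML format.</speak>"))
```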
s
Use the <s> tag to add a pause between sentences. A pause after a sentence is the same as a pause after a period. The duration of a pause depends on the selected voice, emotional tone, speed, and language.
<speak>
<s>First sentence</s>
<s>Second sentence</s>
</speak>
Pauses inside the tag are also taken into account. For example, a period right before the closing tag adds its own pause in addition to the pause added by the tag.
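The following Python sketch splits plain text into sentences and wraps each one in <s>. The splitting rule (a ., !, or ? followed by whitespace) is a naive assumption that will mis-split text containing abbreviations such as "etc."; sentences_to_ssml is a hypothetical name. Because a period before the closing tag adds its own pause, the sketch drops a trailing period from each sentence and keeps ! and ?.

```python
import re
from xml.sax.saxutils import escape


def sentences_to_ssml(text: str) -> str:
    """Hypothetical helper: wrap each sentence of plain text in <s> tags."""
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    parts = []
    for sentence in sentences:
        # Drop a final period so it does not add a pause on top of the <s> pause.
        if sentence.endswith("."):
            sentence = sentence[:-1]
        parts.append(f"<s>{escape(sentence)}</s>")
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(sentences_to_ssml("First sentence. Second sentence."))
```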
sub
Use the <sub> tag to replace text with different text for pronunciation. For example, you can use it to correctly pronounce an abbreviation or the symbol of a chemical element.
<speak>
My favorite chemical element is <sub alias="Mercury">Hg</sub> because it's shiny.
</speak>
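To apply the same substitutions across a longer text, you can keep a dictionary of aliases and wrap every occurrence automatically. In this Python sketch, mark_up_aliases and the dictionary format are assumptions. Matches are literal substrings with no word-boundary check, and longer keys are tried first so that an abbreviation is not cut short by a shorter overlapping one.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def mark_up_aliases(text: str, aliases: dict) -> str:
    """Hypothetical helper: wrap each known abbreviation in a <sub alias="..."> tag."""
    if not aliases:
        return escape(text)
    # Longest keys first, so the regex alternation prefers the longest match.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(k) for k in keys) + ")")
    out = []
    # With a capturing group, re.split puts the matched keys at odd indices.
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:
            out.append(f"<sub alias={quoteattr(aliases[part])}>{escape(part)}</sub>")
        else:
            out.append(escape(part))
    return "".join(out)


if __name__ == "__main__":
    line = "My favorite chemical element is Hg because it's shiny."
    print("<speak>" + mark_up_aliases(line, {"Hg": "Mercury"}) + "</speak>")
```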