SpeechKit Synthesis Service API v3, REST: Synthesizer.UtteranceSynthesis
Synthesizing text into speech.
HTTP request
POST https://tts.api.ml.yandexcloud.kz/tts/v3/utteranceSynthesis
Body parameters
{
  "model": "string",
  // Includes only one of the fields `text`, `textTemplate`
  "text": "string",
  "textTemplate": {
    "textTemplate": "string",
    "variables": [
      {
        "variableName": "string",
        "variableValue": "string"
      }
    ]
  },
  // end of the list of possible fields
  "hints": [
    {
      // Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`
      "voice": "string",
      "audioTemplate": {
        "audio": {
          // Includes only one of the fields `content`
          "content": "string",
          // end of the list of possible fields
          "audioSpec": {
            // Includes only one of the fields `rawAudio`, `containerAudio`
            "rawAudio": {
              "audioEncoding": "string",
              "sampleRateHertz": "string"
            },
            "containerAudio": {
              "containerAudioType": "string"
            }
            // end of the list of possible fields
          }
        },
        "textTemplate": {
          "textTemplate": "string",
          "variables": [
            {
              "variableName": "string",
              "variableValue": "string"
            }
          ]
        },
        "variables": [
          {
            "variableName": "string",
            "variableStartMs": "string",
            "variableLengthMs": "string"
          }
        ]
      },
      "speed": "string",
      "volume": "string",
      "role": "string",
      "pitchShift": "string",
      "duration": {
        "policy": "string",
        "durationMs": "string"
      }
      // end of the list of possible fields
    }
  ],
  "outputAudioSpec": {
    // Includes only one of the fields `rawAudio`, `containerAudio`
    "rawAudio": {
      "audioEncoding": "string",
      "sampleRateHertz": "string"
    },
    "containerAudio": {
      "containerAudioType": "string"
    }
    // end of the list of possible fields
  },
  "loudnessNormalizationType": "string",
  "unsafeMode": "boolean"
}
| Field | Description |
| --- | --- |
| model | string. The name of the model. |
| text | string. Raw text (e.g. "Hello, Alice"). Includes only one of the fields `text`, `textTemplate`. The text to synthesize, in one of the text synthesis markups. |
| textTemplate | TextTemplate. Text template instance. Includes only one of the fields `text`, `textTemplate`. The text to synthesize, in one of the text synthesis markups. |
| hints[] | Hints. Optional hints for synthesis. |
| outputAudioSpec | AudioFormatOptions. Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header. |
| loudnessNormalizationType | enum (LoudnessNormalizationType). Specifies the type of loudness normalization. |
| unsafeMode | boolean. Optional. Automatically splits long text into several utterances and bills accordingly. Some degradation in service quality is possible. |
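
As a rough illustration of the request described above, the following Python sketch builds (but does not send) a POST request carrying only the `text` field. The helper name `build_request` and the `<token>` placeholder are hypothetical, and the `Authorization: Bearer` header is an assumption: this page does not describe authentication or state which body fields are required.

```python
import json
import urllib.request

ENDPOINT = "https://tts.api.ml.yandexcloud.kz/tts/v3/utteranceSynthesis"


def build_request(text: str, token: str) -> urllib.request.Request:
    """Build, without sending, a POST request that carries raw text."""
    # `text` and `textTemplate` are mutually exclusive; only `text` is set here.
    body = {"text": text}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. The source page does not say.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_request("Hello, Alice", token="<token>")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without credentials.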
TextTemplate
| Field | Description |
| --- | --- |
| textTemplate | string. Template text. |
| variables[] | TextVariable. Defines the variables used in the template text. |
TextVariable
| Field | Description |
| --- | --- |
| variableName | string. The name of the variable. |
| variableValue | string. The text of the variable. |
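
A minimal sketch of assembling a `textTemplate` object from a Python mapping is shown below. The helper `make_text_template` is hypothetical, and the `{username}` placeholder syntax in the template string is an assumption, since the sample template is not shown on this page.

```python
def make_text_template(template: str, values: dict[str, str]) -> dict:
    """Turn a template string and a name-to-value mapping into a TextTemplate object."""
    return {
        "textTemplate": template,
        "variables": [
            {"variableName": name, "variableValue": value}
            for name, value in values.items()
        ],
    }


# Assumption: variables are referenced in the template as {name}.
body = {
    "textTemplate": make_text_template("Hello, {username}", {"username": "Alice"})
}
print(body)
```

Because `text` and `textTemplate` are mutually exclusive, a body built this way should not also set `text`.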
Hints
Each hint object includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. Each field is a hint for the TTS engine that specifies a characteristic of the synthesized audio.

| Field | Description |
| --- | --- |
| voice | string. Name of the speaker to use. |
| audioTemplate | AudioTemplate. Template for synthesizing. |
| speed | string. Hint to change speed. |
| volume | string. Hint to regulate normalization level. |
| role | string. Hint to specify the pronunciation character for the speaker. |
| pitchShift | string. Hint to increase (or decrease) the speaker's pitch, measured in Hz. Valid values are in the range [-1000;1000]; the default value is 0. |
| duration | DurationHint. Hint to limit both minimum and maximum audio duration. |
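
Since each element of `hints[]` carries exactly one hint field, several hints are expressed as several list elements. The sketch below checks that rule on the client side before a request is built. The helper `check_hints` is hypothetical, and the example values (`<voice-name>`, `"1.2"`, `"-50"`) are placeholders rather than values documented here.

```python
HINT_FIELDS = {
    "voice", "audioTemplate", "speed", "volume", "role", "pitchShift", "duration",
}


def check_hints(hints: list[dict]) -> list[dict]:
    """Raise ValueError unless every hint object sets exactly one known field."""
    for i, hint in enumerate(hints):
        known = HINT_FIELDS.intersection(hint)
        if len(hint) != 1 or len(known) != 1:
            raise ValueError(
                f"hints[{i}] must set exactly one of {sorted(HINT_FIELDS)}"
            )
    return hints


# One field per object: three hints become three list elements.
hints = check_hints([
    {"voice": "<voice-name>"},  # placeholder, not a documented voice
    {"speed": "1.2"},           # placeholder value
    {"pitchShift": "-50"},      # within the documented [-1000;1000] range
])
print(hints)
```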
AudioTemplate
| Field | Description |
| --- | --- |
| audio | AudioContent. Audio file. |
| textTemplate | TextTemplate. Template and description of its variables. |
| variables[] | AudioVariable. Describes the variables in the audio. |
AudioContent
| Field | Description |
| --- | --- |
| content | string (bytes). Bytes with audio data. Includes only one of the fields `content`. The audio source to read the data from. |
| audioSpec | AudioFormatOptions. Description of the audio format. |
AudioFormatOptions
| Field | Description |
| --- | --- |
| rawAudio | RawAudio. The audio format specified in request parameters. Includes only one of the fields `rawAudio`, `containerAudio`. |
| containerAudio | ContainerAudio. The audio format specified inside the container metadata. Includes only one of the fields `rawAudio`, `containerAudio`. |
RawAudio
| Field | Description |
| --- | --- |
| audioEncoding | enum (AudioEncoding). Encoding type. |
| sampleRateHertz | string (int64). Sampling frequency of the signal. |
ContainerAudio
| Field | Description |
| --- | --- |
| containerAudioType | enum (ContainerAudioType) |
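
The two audio format variants can be sketched as two small builders, each returning an object with exactly one of `rawAudio` or `containerAudio`. The function names are hypothetical, and the enum strings are left as placeholders because the `AudioEncoding` and `ContainerAudioType` values are not listed on this page. Note that `sampleRateHertz` is typed `string (int64)`, so the integer is serialized as a string.

```python
def raw_audio_spec(encoding: str, sample_rate_hz: int) -> dict:
    """Audio format given explicitly; the int64 sample rate is sent as a string."""
    return {
        "rawAudio": {
            "audioEncoding": encoding,
            "sampleRateHertz": str(sample_rate_hz),
        }
    }


def container_audio_spec(container_type: str) -> dict:
    """Audio format taken from the container metadata."""
    return {"containerAudio": {"containerAudioType": container_type}}


# Placeholders stand in for enum values that this page does not list.
spec = raw_audio_spec("<AudioEncoding value>", 22050)
print(spec)
print(container_audio_spec("<ContainerAudioType value>"))
```

Either result can be used as `outputAudioSpec` in the request body or as `audioSpec` inside an `AudioContent` object.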
AudioVariable
| Field | Description |
| --- | --- |
| variableName | string. The name of the variable. |
| variableStartMs | string (int64). Start time of the variable in milliseconds. |
| variableLengthMs | string (int64). Length of the variable in milliseconds. |
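
The following sketch assembles an `audioTemplate` hint from raw audio bytes and a list of variable positions. The helper `make_audio_template` is hypothetical. Base64-encoding the `content` field is an assumption based on the usual JSON convention for byte fields; this page only gives the type as `string (bytes)`.

```python
import base64


def make_audio_template(
    audio_bytes: bytes,
    audio_spec: dict,
    text_template: dict,
    variables: list[tuple[str, int, int]],
) -> dict:
    """Build an AudioTemplate; `variables` holds (name, start_ms, length_ms) tuples."""
    return {
        "audio": {
            # Assumption: bytes are carried as base64 text in the JSON body.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
            "audioSpec": audio_spec,
        },
        "textTemplate": text_template,
        "variables": [
            {
                "variableName": name,
                "variableStartMs": str(start_ms),    # int64 sent as a string
                "variableLengthMs": str(length_ms),  # int64 sent as a string
            }
            for name, start_ms, length_ms in variables
        ],
    }


template = make_audio_template(
    audio_bytes=b"\x00\x01\x02\x03",  # stand-in bytes, not real audio
    audio_spec={"containerAudio": {"containerAudioType": "<ContainerAudioType value>"}},
    text_template={"textTemplate": "<template text>", "variables": []},
    variables=[("username", 500, 700)],
)
print(template["variables"])
```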
DurationHint
| Field | Description |
| --- | --- |
| policy | enum (DurationHintPolicy). Type of duration constraint. |
| durationMs | string (int64). Constraint on audio duration in milliseconds. |
Response
HTTP Code: 200 - OK
{
  "audioChunk": {
    "data": "string"
  },
  "textChunk": {
    "text": "string"
  },
  "startMs": "string",
  "lengthMs": "string"
}
| Field | Description |
| --- | --- |
| audioChunk | AudioChunk. Part of synthesized audio. |
| textChunk | TextChunk. Part of synthesized text. |
| startMs | string (int64). Start time of the audio chunk in milliseconds. |
| lengthMs | string (int64). Length of the audio chunk in milliseconds. |
AudioChunk
| Field | Description |
| --- | --- |
| data | string (bytes). Sequence of bytes of the synthesized audio, in the format specified in `outputAudioSpec`. |
TextChunk
| Field | Description |
| --- | --- |
| text | string. Synthesized text. |
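
A response object of the shape shown above could be unpacked roughly as follows. The `Chunk` dataclass and `parse_chunk` helper are hypothetical, the sample object is synthetic, and decoding `data` from base64 is an assumption based on the usual JSON convention for byte fields. How multiple chunks are delivered is not described on this page, so the sketch handles a single object.

```python
import base64
from dataclasses import dataclass


@dataclass
class Chunk:
    audio: bytes
    text: str
    start_ms: int
    length_ms: int


def parse_chunk(obj: dict) -> Chunk:
    """Convert one response object into typed Python values."""
    return Chunk(
        # Assumption: `data` holds base64-encoded audio bytes.
        audio=base64.b64decode(obj.get("audioChunk", {}).get("data", "")),
        text=obj.get("textChunk", {}).get("text", ""),
        # int64 fields arrive as strings and are converted back to int.
        start_ms=int(obj.get("startMs", "0")),
        length_ms=int(obj.get("lengthMs", "0")),
    )


# Synthetic sample for illustration only; not real service output.
sample = {
    "audioChunk": {"data": base64.b64encode(b"\x00\x01").decode("ascii")},
    "textChunk": {"text": "Hello"},
    "startMs": "0",
    "lengthMs": "500",
}
chunk = parse_chunk(sample)
print(chunk)
```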