SpeechKit Synthesis Service API v3, REST: Synthesizer.UtteranceSynthesis
Synthesizing text into speech.
HTTP request
POST https://tts.api.cloud.yandex.net/tts/v3/utteranceSynthesis
Body parameters
{
  "model": "string",
  // Includes only one of the fields `text`, `textTemplate`
  "text": "string",
  "textTemplate": {
    "textTemplate": "string",
    "variables": [
      {
        "variableName": "string",
        "variableValue": "string"
      }
    ]
  },
  // end of the list of possible fields
  "hints": [
    {
      // Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`
      "voice": "string",
      "audioTemplate": {
        "audio": {
          // Includes only one of the fields `content`
          "content": "string",
          // end of the list of possible fields
          "audioSpec": {
            // Includes only one of the fields `rawAudio`, `containerAudio`
            "rawAudio": {
              "audioEncoding": "string",
              "sampleRateHertz": "string"
            },
            "containerAudio": {
              "containerAudioType": "string"
            }
            // end of the list of possible fields
          }
        },
        "textTemplate": {
          "textTemplate": "string",
          "variables": [
            {
              "variableName": "string",
              "variableValue": "string"
            }
          ]
        },
        "variables": [
          {
            "variableName": "string",
            "variableStartMs": "string",
            "variableLengthMs": "string"
          }
        ]
      },
      "speed": "string",
      "volume": "string",
      "role": "string",
      "pitchShift": "string",
      "duration": {
        "policy": "string",
        "durationMs": "string"
      }
      // end of the list of possible fields
    }
  ],
  "outputAudioSpec": {
    // Includes only one of the fields `rawAudio`, `containerAudio`
    "rawAudio": {
      "audioEncoding": "string",
      "sampleRateHertz": "string"
    },
    "containerAudio": {
      "containerAudioType": "string"
    }
    // end of the list of possible fields
  },
  "loudnessNormalizationType": "string",
  "unsafeMode": "boolean"
}
Field | Description |
model | string The name of the model. |
text | string Raw text (e.g. "Hello, Alice"). Includes only one of the fields `text`, `textTemplate`. The text to synthesize, given in one of the text synthesis markups. |
textTemplate | Text template instance. Includes only one of the fields `text`, `textTemplate`. The text to synthesize, given in one of the text synthesis markups. |
hints[] | Optional hints for synthesis. |
outputAudioSpec | Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header. |
loudnessNormalizationType | enum (LoudnessNormalizationType) Specifies the type of loudness normalization. |
unsafeMode | boolean Optional. Automatically splits long text into several utterances and bills accordingly. Some degradation in service quality is possible. |
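As a rough illustration of the request described above, the following sketch builds (but does not send) a POST request with a minimal body. The endpoint and field names come from this page; the `Authorization: Bearer` header, the `build_request` helper, and the placeholder token and voice name are assumptions made for the example and are not confirmed by this section.

```python
import json
import urllib.request

ENDPOINT = "https://tts.api.cloud.yandex.net/tts/v3/utteranceSynthesis"


def build_request(text, token, voice=None, unsafe_mode=False):
    """Build (but do not send) a POST request for utteranceSynthesis."""
    # `text` and `textTemplate` are mutually exclusive; this sketch uses `text`.
    body = {"text": text}
    if voice is not None:
        # Each hint object carries exactly one field.
        body["hints"] = [{"voice": voice}]
    if unsafe_mode:
        body["unsafeMode"] = True
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authorization; not stated on this page.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; replace them with a real token and voice name.
req = build_request("Hello, Alice", token="<token>", voice="<voice-name>")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access or credentials.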
TextTemplate
Field | Description |
textTemplate | string Template text. |
variables[] | Definitions of the variables used in the template text. |
TextVariable
Field | Description |
variableName | string The name of the variable. |
variableValue | string The text of the variable. |
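The `TextTemplate` and `TextVariable` structures above can be assembled with a small helper. This is only a sketch: the `{name}` placeholder syntax, the use of bare names (without braces) in `variableName`, and the `make_text_template` helper itself are assumptions, since this section does not show a template sample.

```python
import re


def make_text_template(template, values):
    """Return a `textTemplate` object for the request body.

    Assumption: placeholders are written as {name} in the template text.
    """
    # Collect placeholder names in order of first appearance, without repeats.
    names = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    missing = [n for n in names if n not in values]
    if missing:
        raise ValueError(f"no value for variables: {missing}")
    return {
        "textTemplate": template,
        "variables": [
            {"variableName": n, "variableValue": str(values[n])} for n in names
        ],
    }


template = make_text_template("Hello, {username}", {"username": "Alice"})
print(template)
```

Checking for missing values on the client side is a design choice of the sketch; how the service reacts to an undefined variable is not described here.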
Hints
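Field | Description |
voice | string Name of the speaker to use. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
audioTemplate | Template for synthesis. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
speed | string Hint to change the speech speed. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
volume | string Hint to regulate the normalization level. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
role | string Hint to specify the pronunciation character for the speaker. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
pitchShift | string Hint to increase (or decrease) the speaker's pitch, measured in Hz. Valid values are in the range [-1000;1000]; the default value is 0. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |
duration | Hint to limit both the minimum and maximum audio duration. Includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. A hint for the TTS engine that specifies characteristics of the synthesized audio. |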
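Because each `Hints` object holds only one of the listed fields, several hints are expressed as several objects in the `hints[]` array. The sketch below shows one way to build that array and to check the documented `pitchShift` range before sending. The `make_hint` and `make_hints` helpers are hypothetical, and the sample values are illustrative only.

```python
HINT_FIELDS = {
    "voice", "audioTemplate", "speed", "volume", "role", "pitchShift", "duration",
}


def make_hint(field, value):
    """Return a single hint object holding exactly one field."""
    if field not in HINT_FIELDS:
        raise ValueError(f"unknown hint field: {field}")
    # The range [-1000;1000] for pitchShift is the one stated in the table.
    if field == "pitchShift" and not -1000 <= float(value) <= 1000:
        raise ValueError("pitchShift must be within [-1000;1000] Hz")
    return {field: value}


def make_hints(**fields):
    """Turn keyword arguments into a list with one hint object per field."""
    return [make_hint(name, value) for name, value in fields.items()]


# Values are passed as strings, matching the "string" types in the schema above.
hints = make_hints(voice="<voice-name>", pitchShift="-200")
print(hints)
```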
AudioTemplate
Field | Description |
audio | Audio file. |
textTemplate | Template and description of its variables. |
variables[] | Description of the variables in the audio. |
AudioContent
Field | Description |
content | string (bytes) Bytes with audio data. Includes only one of the fields `content`. The audio source to read the data from. |
audioSpec | Description of the audio format. |
AudioFormatOptions
Field | Description |
rawAudio | The audio format specified in request parameters. Includes only one of the fields `rawAudio`, `containerAudio`. |
containerAudio | The audio format specified inside the container metadata. Includes only one of the fields `rawAudio`, `containerAudio`. |
RawAudio
Field | Description |
audioEncoding | enum (AudioEncoding) Encoding type. |
sampleRateHertz | string (int64) Sampling frequency of the signal. |
ContainerAudio
Field | Description |
containerAudioType | enum (ContainerAudioType) |
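The `AudioFormatOptions` structure accepts either `rawAudio` or `containerAudio`, never both. A minimal sketch of a builder that enforces this is shown below. The `make_audio_spec` helper is hypothetical, and the enum strings `"LINEAR16_PCM"` and `"WAV"` are assumptions, because the enum value lists are not shown in this section.

```python
def make_audio_spec(*, raw=None, container=None):
    """Return an audio spec with exactly one of `rawAudio` / `containerAudio`.

    raw: (audio_encoding, sample_rate_hz) tuple
    container: container audio type name
    """
    if (raw is None) == (container is None):
        raise ValueError("set exactly one of `raw` or `container`")
    if raw is not None:
        encoding, sample_rate_hz = raw
        return {
            "rawAudio": {
                "audioEncoding": encoding,
                # int64 fields are shown as strings in the JSON schema.
                "sampleRateHertz": str(sample_rate_hz),
            }
        }
    return {"containerAudio": {"containerAudioType": container}}


# Assumed enum names; check the service's enum lists before relying on them.
raw_spec = make_audio_spec(raw=("LINEAR16_PCM", 22050))
wav_spec = make_audio_spec(container="WAV")
print(raw_spec)
print(wav_spec)
```

Either result can be used as `outputAudioSpec` in the request body or as `audioSpec` inside `AudioContent`, since both fields share this structure in the schema above.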
AudioVariable
Field | Description |
variableName | string The name of the variable. |
variableStartMs | string (int64) Start time of the variable in milliseconds. |
variableLengthMs | string (int64) Length of the variable in milliseconds. |
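Putting `AudioContent`, its audio format, the text template, and the `AudioVariable` entries together, an `audioTemplate` hint might be assembled as in the sketch below. The `make_audio_template` helper and all sample values are hypothetical. The sketch also assumes that `string (bytes)` content is sent as base64 text, which is the usual JSON convention for byte fields but is not spelled out in this section.

```python
import base64


def make_audio_template(audio_bytes, audio_spec, text_template, variables):
    """Assemble an `audioTemplate` hint from its parts.

    variables: list of (name, start_ms, length_ms) tuples marking where each
    variable is spoken in the reference audio.
    """
    audio_vars = []
    for name, start_ms, length_ms in variables:
        if start_ms < 0 or length_ms <= 0:
            raise ValueError(f"bad interval for variable {name!r}")
        audio_vars.append({
            "variableName": name,
            # int64 fields are shown as strings in the JSON schema.
            "variableStartMs": str(start_ms),
            "variableLengthMs": str(length_ms),
        })
    return {
        "audioTemplate": {
            "audio": {
                # Assumption: bytes fields travel as base64 text in JSON.
                "content": base64.b64encode(audio_bytes).decode("ascii"),
                "audioSpec": audio_spec,
            },
            "textTemplate": text_template,
            "variables": audio_vars,
        }
    }


hint = make_audio_template(
    audio_bytes=b"\x00\x01",  # stand-in for real audio data
    audio_spec={"containerAudio": {"containerAudioType": "WAV"}},  # assumed enum name
    text_template={
        "textTemplate": "Hello, {username}",  # assumed placeholder syntax
        "variables": [{"variableName": "username", "variableValue": "Alice"}],
    },
    variables=[("username", 700, 450)],
)
print(hint["audioTemplate"]["variables"])
```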
DurationHint
Field | Description |
policy | enum (DurationHintPolicy) Type of duration constraint. |
durationMs | string (int64) Constraint on audio duration in milliseconds. |
Response
HTTP Code: 200 - OK
{
  "audioChunk": {
    "data": "string"
  },
  "textChunk": {
    "text": "string"
  },
  "startMs": "string",
  "lengthMs": "string"
}
Field | Description |
audioChunk | Part of synthesized audio. |
textChunk | Part of synthesized text. |
startMs | string (int64) Start time of the audio chunk in milliseconds. |
lengthMs | string (int64) Length of the audio chunk in milliseconds. |
AudioChunk
Field | Description |
data | string (bytes) Sequence of bytes of the synthesized audio in the format specified in `outputAudioSpec`. |
TextChunk
Field | Description |
text | string Synthesized text. |
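A response object of the shape shown above could be decoded as in the following sketch. The `parse_chunk` helper is hypothetical. It assumes that `audioChunk.data` arrives as base64 text and that the int64 fields arrive as decimal strings, following the types listed in the tables. How the service frames multiple chunks in one HTTP response is not covered here, so the sketch handles a single JSON object.

```python
import base64
import json


def parse_chunk(payload):
    """Decode one response object into audio bytes, text, and timing."""
    msg = json.loads(payload)
    return {
        # Assumption: bytes fields are base64-encoded in JSON.
        "audio": base64.b64decode(msg.get("audioChunk", {}).get("data", "")),
        "text": msg.get("textChunk", {}).get("text", ""),
        # int64 fields are shown as strings, so convert them explicitly.
        "start_ms": int(msg.get("startMs", 0)),
        "length_ms": int(msg.get("lengthMs", 0)),
    }


# A made-up payload with the documented field names, for illustration only.
sample = json.dumps({
    "audioChunk": {"data": base64.b64encode(b"RIFF").decode("ascii")},
    "textChunk": {"text": "Hello, Alice"},
    "startMs": "0",
    "lengthMs": "1200",
})
chunk = parse_chunk(sample)
print(chunk["text"], chunk["start_ms"], chunk["length_ms"], len(chunk["audio"]))
```

With the default `outputAudioSpec`, which includes a WAV header, the decoded bytes could be written to a `.wav` file as they are.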