SpeechKit Synthesis Service API v3, REST: Synthesizer.utteranceSynthesis
Synthesizing text into speech.
HTTP request
POST https://tts.api.cloud.yandex.net/tts/v3/utteranceSynthesis
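A minimal sketch of building this request with Python's standard `urllib.request`. It assumes IAM-token authentication through an `Authorization: Bearer ...` header and a folder ID passed in an `x-folder-id` header. Neither header is described in this section, so treat both as assumptions. The body is a text-only placeholder, and the request is only built, not sent, so the snippet runs offline.

```python
import json
import urllib.request

# Endpoint from the "HTTP request" line above.
URL = "https://tts.api.cloud.yandex.net/tts/v3/utteranceSynthesis"


def build_request(body: dict, iam_token: str, folder_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for utteranceSynthesis.

    Assumption: the Authorization and x-folder-id headers are not documented
    in this section; they reflect a common Yandex Cloud setup.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {iam_token}",  # assumed auth scheme
            "x-folder-id": folder_id,                # assumed folder header
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials; a real call would pass req to urllib.request.urlopen.
req = build_request({"text": "Hello, Alice"}, iam_token="<IAM_TOKEN>", folder_id="<FOLDER_ID>")
print(req.get_method(), req.full_url)
```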
Body parameters
{
"model": "string",
"hints": [
{
// `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`
"voice": "string",
"audioTemplate": {
"audio": {
"audioSpec": {
// `hints[].audioTemplate.audio.audioSpec` includes only one of the fields `rawAudio`, `containerAudio`
"rawAudio": {
"audioEncoding": "string",
"sampleRateHertz": "string"
},
"containerAudio": {
"containerAudioType": "string"
},
// end of the list of possible fields `hints[].audioTemplate.audio.audioSpec`
},
"content": "string"
},
"textTemplate": {
"textTemplate": "string",
"variables": [
{
"variableName": "string",
"variableValue": "string"
}
]
},
"variables": [
{
"variableName": "string",
"variableStartMs": "string",
"variableLengthMs": "string"
}
]
},
"speed": "number",
"volume": "number",
"role": "string",
"pitchShift": "number",
"duration": {
"policy": "string",
"durationMs": "string"
},
// end of the list of possible fields `hints[]`
}
],
"outputAudioSpec": {
// `outputAudioSpec` includes only one of the fields `rawAudio`, `containerAudio`
"rawAudio": {
"audioEncoding": "string",
"sampleRateHertz": "string"
},
"containerAudio": {
"containerAudioType": "string"
},
// end of the list of possible fields `outputAudioSpec`
},
"loudnessNormalizationType": "string",
"unsafeMode": true,
// includes only one of the fields `text`, `textTemplate`
"text": "string",
"textTemplate": {
"textTemplate": "string",
"variables": [
{
"variableName": "string",
"variableValue": "string"
}
]
},
// end of the list of possible fields
}
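The `// includes only one of the fields` comments in the schema above describe oneof groups: each `hints[]` element, each `audioSpec`/`outputAudioSpec` object, and the request's `text`/`textTemplate` pair. Below is a hypothetical local pre-flight check (not part of the API). It is a sketch and slightly stricter than the schema: it also rejects an object that sets none of its group's fields, because an empty hint or a request without text has nothing to synthesize. `<VOICE_NAME>` is a placeholder, and the `speed` value is illustrative only.

```python
# Oneof groups taken from the body schema above.
HINT_FIELDS = ("voice", "audioTemplate", "speed", "volume", "role", "pitchShift", "duration")
AUDIO_SPEC_FIELDS = ("rawAudio", "containerAudio")
TEXT_FIELDS = ("text", "textTemplate")


def oneof(obj: dict, fields: tuple, where: str) -> str:
    """Return the single field from `fields` that is set in `obj`; raise otherwise."""
    present = [f for f in fields if f in obj]
    if len(present) != 1:
        raise ValueError(f"{where}: expected exactly one of {list(fields)}, got {present}")
    return present[0]


def check_body(body: dict) -> None:
    """Hypothetical helper: validate the oneof rules before sending a request."""
    oneof(body, TEXT_FIELDS, "request")
    for i, hint in enumerate(body.get("hints", [])):
        field = oneof(hint, HINT_FIELDS, f"hints[{i}]")
        if field == "audioTemplate":
            spec = hint["audioTemplate"].get("audio", {}).get("audioSpec")
            if spec is not None:
                oneof(spec, AUDIO_SPEC_FIELDS, f"hints[{i}].audioTemplate.audio.audioSpec")
    if "outputAudioSpec" in body:
        oneof(body["outputAudioSpec"], AUDIO_SPEC_FIELDS, "outputAudioSpec")


# Valid: one field per hints[] element, so two hints need two elements.
good = {"text": "Hello, Alice", "hints": [{"voice": "<VOICE_NAME>"}, {"speed": 1.1}]}
check_body(good)

# Invalid: two fields packed into a single hints[] element.
bad = {"text": "Hello, Alice", "hints": [{"voice": "<VOICE_NAME>", "speed": 1.1}]}
try:
    check_body(bad)
except ValueError as err:
    print(err)
```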
Field | Description |
---|---|
model | string The name of the model. Specifies basic synthesis functionality. Currently should be empty. Do not use it. |
hints[] | object Optional hints for synthesis. |
hints[].voice | string Name of the speaker to use. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].audioTemplate | object Template for synthesizing. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].audioTemplate.audio | object Audio file. |
hints[].audioTemplate.audio.audioSpec | object Description of the audio format. |
hints[].audioTemplate.audio.audioSpec.rawAudio | object The audio format specified in request parameters. `hints[].audioTemplate.audio.audioSpec` includes only one of the fields `rawAudio`, `containerAudio`. |
hints[].audioTemplate.audio.audioSpec.rawAudio.audioEncoding | string Encoding type. |
hints[].audioTemplate.audio.audioSpec.rawAudio.sampleRateHertz | string (int64) Sampling frequency of the signal, in Hz. |
hints[].audioTemplate.audio.audioSpec.containerAudio | object The audio format specified inside the container metadata. `hints[].audioTemplate.audio.audioSpec` includes only one of the fields `rawAudio`, `containerAudio`. |
hints[].audioTemplate.audio.audioSpec.containerAudio.containerAudioType | string Type of the audio container. |
hints[].audioTemplate.audio.content | string (byte) Bytes with the audio data, base64-encoded in JSON. |
hints[].audioTemplate.textTemplate | object Template and description of its variables. |
hints[].audioTemplate.textTemplate.textTemplate | string Template text, for example `Hello, {username}`. |
hints[].audioTemplate.textTemplate.variables[] | object Definitions of the variables used in the template text. |
hints[].audioTemplate.textTemplate.variables[].variableName | string The name of the variable. |
hints[].audioTemplate.textTemplate.variables[].variableValue | string The text of the variable. |
hints[].audioTemplate.variables[] | object Description of the variables in the audio: where each one starts and how long it lasts. |
hints[].audioTemplate.variables[].variableName | string The name of the variable. |
hints[].audioTemplate.variables[].variableStartMs | string (int64) Start time of the variable in milliseconds. |
hints[].audioTemplate.variables[].variableLengthMs | string (int64) Length of the variable in milliseconds. |
hints[].speed | number (double) Hint to change the speech speed. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].volume | number (double) Hint to regulate the normalization level. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].role | string Hint to specify the pronunciation character (role) of the speaker. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].pitchShift | number (double) Hint to raise or lower the speaker's pitch, in Hz. Valid values are in the range [-1000, 1000]; the default is 0. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].duration | object Hint to limit both the minimum and the maximum audio duration. `hints[]` includes only one of the fields `voice`, `audioTemplate`, `speed`, `volume`, `role`, `pitchShift`, `duration`. |
hints[].duration.policy | string Type of duration constraint. |
hints[].duration.durationMs | string (int64) Constraint on the audio duration, in milliseconds. |
outputAudioSpec | object Optional. Format of the output audio. Default: 22050 Hz, linear 16-bit signed little-endian PCM with a WAV header. |
outputAudioSpec.rawAudio | object The audio format specified in request parameters. `outputAudioSpec` includes only one of the fields `rawAudio`, `containerAudio`. |
outputAudioSpec.rawAudio.audioEncoding | string Encoding type. |
outputAudioSpec.rawAudio.sampleRateHertz | string (int64) Sampling frequency of the signal, in Hz. |
outputAudioSpec.containerAudio | object The audio format specified inside the container metadata. `outputAudioSpec` includes only one of the fields `rawAudio`, `containerAudio`. |
outputAudioSpec.containerAudio.containerAudioType | string Type of the audio container. |
loudnessNormalizationType | string Optional. Type of loudness normalization. Default: LUFS. |
unsafeMode | boolean Optional. Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible. |
text | string Raw text, e.g. "Hello, Alice". The request includes only one of the fields `text`, `textTemplate`. |
textTemplate | object Text template instance, e.g. "Hello, {username}" with username="Alice". The request includes only one of the fields `text`, `textTemplate`. |
textTemplate.textTemplate | string Template text, for example `Hello, {username}`. |
textTemplate.variables[] | object Definitions of the variables used in the template text. |
textTemplate.variables[].variableName | string The name of the variable. |
textTemplate.variables[].variableValue | string The text of the variable. |
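A sketch of helper functions for building individual hints, assuming the usual proto3 JSON conventions that the field types in the table suggest: `string (int64)` fields such as `durationMs` and `variableStartMs` are sent as decimal strings, and `string (byte)` fields such as `audio.content` are base64-encoded. The helper names are hypothetical. The container type and duration policy are passed in by the caller because their allowed values are not listed in this section; `<VOICE_NAME>` and `<POLICY>` are placeholders.

```python
import base64


def pitch_shift_hint(hz: float) -> dict:
    """pitchShift is measured in Hz; the documented valid range is [-1000, 1000]."""
    if not -1000 <= hz <= 1000:
        raise ValueError(f"pitchShift {hz} is outside [-1000, 1000]")
    return {"pitchShift": float(hz)}


def duration_hint(policy: str, duration_ms: int) -> dict:
    """durationMs is an int64 field, so it is serialized as a JSON string.

    The policy value is passed through unchecked (its allowed values are not listed here).
    """
    return {"duration": {"policy": policy, "durationMs": str(int(duration_ms))}}


def audio_template_hint(audio_bytes: bytes, container_type: str, template_text: str,
                        text_vars: dict, audio_vars: list) -> dict:
    """Build an audioTemplate hint (hypothetical helper).

    audio_bytes is base64-encoded because `content` is a bytes field; the format
    is described by containerAudio. audio_vars holds (name, start_ms, length_ms)
    tuples marking where each variable sits in the template audio.
    """
    return {
        "audioTemplate": {
            "audio": {
                "audioSpec": {"containerAudio": {"containerAudioType": container_type}},
                "content": base64.b64encode(audio_bytes).decode("ascii"),
            },
            "textTemplate": {
                "textTemplate": template_text,
                "variables": [
                    {"variableName": name, "variableValue": value}
                    for name, value in text_vars.items()
                ],
            },
            "variables": [
                {
                    "variableName": name,
                    "variableStartMs": str(int(start_ms)),
                    "variableLengthMs": str(int(length_ms)),
                }
                for name, start_ms, length_ms in audio_vars
            ],
        }
    }


# Each helper returns one hints[] element; the caller collects them into a list.
hints = [
    {"voice": "<VOICE_NAME>"},
    pitch_shift_hint(-50),
    duration_hint("<POLICY>", 2500),
]
```

Returning one element per helper keeps the oneof rule for `hints[]` satisfied by construction: every element carries exactly one field.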
Response
HTTP Code: 200 - OK
{
"audioChunk": {
"data": "string"
},
"textChunk": {
"text": "string"
},
"startMs": "string",
"lengthMs": "string"
}
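The response fields use the same JSON encodings as the request body: `audioChunk.data` is base64-encoded bytes, and `startMs`/`lengthMs` are int64 values serialized as strings. Below is a sketch of converting one response object to native Python types. The optional `result` unwrapping is an assumption about how the REST response is framed, not something this section documents.

```python
import base64
import json


def parse_response(obj: dict) -> dict:
    """Convert one response object into native Python types.

    - audioChunk.data: bytes field, base64-encoded in JSON
    - startMs / lengthMs: int64 fields, encoded as JSON strings
    Fields missing from a given object fall back to empty values.
    """
    obj = obj.get("result", obj)  # assumption: objects may be wrapped in "result"
    start = int(obj.get("startMs", "0"))
    length = int(obj.get("lengthMs", "0"))
    return {
        "audio": base64.b64decode(obj.get("audioChunk", {}).get("data", "")),
        "text": obj.get("textChunk", {}).get("text", ""),
        "start_ms": start,
        "end_ms": start + length,
    }


raw = '{"audioChunk": {"data": "UklGRg=="}, "textChunk": {"text": "Hello"}, "startMs": "0", "lengthMs": "480"}'
chunk = parse_response(json.loads(raw))
print(chunk["audio"], chunk["text"], chunk["start_ms"], chunk["end_ms"])
```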
Field | Description |
---|---|
audioChunk | object Part of the synthesized audio. |
audioChunk.data | string (byte) Sequence of bytes of the synthesized audio, in the format specified in `outputAudioSpec` (base64-encoded in JSON). |
textChunk | object Part of the synthesized text. |
textChunk.text | string Synthesized text. |
startMs | string (int64) Start time of the audio chunk in milliseconds. |
lengthMs | string (int64) Length of the audio chunk in milliseconds. |
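Because each response object carries only part of the synthesized audio, a client has to join the chunks. A minimal sketch, assuming the REST response body is a sequence of JSON objects, one per line, each possibly wrapped in a `result` field (both framing details are assumptions, not stated in this section). It concatenates the decoded `audioChunk.data` payloads in arrival order and tracks the end time of the last audio chunk. `assemble_audio` is a hypothetical name.

```python
import base64
import json
from typing import Iterable, Tuple, Union


def assemble_audio(lines: Iterable[Union[str, bytes]]) -> Tuple[bytes, int]:
    """Join audioChunk.data payloads from a streamed response, in arrival order.

    Returns the concatenated audio bytes and the largest startMs + lengthMs seen.
    """
    audio = bytearray()
    end_ms = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        obj = json.loads(line)
        obj = obj.get("result", obj)  # assumed optional wrapper
        chunk = obj.get("audioChunk")
        if chunk is None:
            continue  # e.g. an object that carries only textChunk
        audio += base64.b64decode(chunk["data"])
        end_ms = max(end_ms, int(obj.get("startMs", "0")) + int(obj.get("lengthMs", "0")))
    return bytes(audio), end_ms
```

Iterating an `http.client.HTTPResponse` yields `bytes` lines, and `json.loads` accepts `bytes`, so a response returned by `urllib.request.urlopen` could be passed in directly, provided the line-per-object framing assumption holds.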