SpeechKit Hybrid Synthesis Service API, gRPC: Synthesizer
Updated at January 16, 2024
A set of methods for voice synthesis.
| Call | Description |
|---|---|
| UtteranceSynthesis | Synthesizing text into speech. |
Calls Synthesizer
UtteranceSynthesis
Synthesizing text into speech.
rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)
UtteranceSynthesisRequest
| Field | Description |
|---|---|
| model | string The name of the model. Specifies basic synthesis functionality. Currently should be empty. Do not use it. |
| Utterance | oneof: text or text_template Text to synthesize, given in one of the text synthesis markups. |
| text | string Raw text (e.g. "Hello, Alice"). |
| text_template | TextTemplate Text template instance, e.g. {"Hello, {username}" with username="Alice"}. |
| hints[] | Hints Optional hints for synthesis. |
| output_audio_spec | AudioFormatOptions Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header |
| loudness_normalization_type | enum LoudnessNormalizationType Specifies type of loudness normalization. Optional. Default: LUFS. |
| unsafe_mode | bool Optional. Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible. |
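To make the shape of the request concrete, here is a minimal sketch that assembles the fields above as a plain Python dict. It deliberately avoids the generated protobuf classes, so `build_request` and the voice name `example_voice` are hypothetical and only illustrate the `Utterance` oneof rule (exactly one of `text` or `text_template`).

```python
def build_request(text=None, text_template=None, hints=None, unsafe_mode=False):
    """Build a dict shaped like UtteranceSynthesisRequest (illustration only)."""
    # Utterance is a oneof: exactly one of text / text_template may be set.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    # The model field is left out on purpose: the table says not to use it.
    request = {"hints": list(hints or []), "unsafe_mode": unsafe_mode}
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    return request


# "example_voice" is a placeholder, not a voice name taken from the service.
request = build_request(text="Hello, Alice", hints=[{"voice": "example_voice"}])
```

With real generated stubs the same fields would be set on the request message instead of a dict; the oneof check would then be enforced by protobuf itself.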
TextTemplate
| Field | Description |
|---|---|
| text_template | string Template text. Sample: The {animal} goes to the {place}. |
| variables[] | TextVariable Defining variables in template text. Sample: {animal: cat, place: forest} |
TextVariable
| Field | Description |
|---|---|
| variable_name | string The name of the variable. |
| variable_value | string The text of the variable. |
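The relationship between `text_template` and `variables[]` can be sketched locally as follows. The substitution itself is performed by the service; `render_template` is a hypothetical helper, and the assumption that placeholders are plain `{name}` tokens replaced by `variable_value` is inferred only from the samples in the tables above.

```python
import re


def render_template(text_template, variables):
    """Substitute {name} placeholders using TextVariable-like dicts."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    # Assumption: a placeholder is a word wrapped in curly braces.
    return re.sub(r"\{(\w+)\}", replace, text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
rendered = render_template(template["text_template"], template["variables"])
print(rendered)  # The cat goes to the forest.
```

A check like this can be useful before sending a request, since a placeholder without a matching variable is caught on the client side.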
Hints
| Field | Description |
|---|---|
| Hint | oneof: voice, audio_template, speed, volume, role, pitch_shift or duration The hint for the TTS engine to specify synthesized audio characteristics. |
| voice | string Name of speaker to use. |
| audio_template | AudioTemplate Template for synthesizing. |
| speed | double Hint to change speed. |
| volume | double Hint to regulate normalization level. |
| role | string Hint to specify pronunciation character for the speaker. |
| pitch_shift | double Hint to increase (or decrease) speaker's pitch, measured in Hz. Valid values are in range [-1000;1000], default value is 0. |
| duration | DurationHint Hint to limit both minimum and maximum audio duration. |
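Because each `Hints` message carries exactly one field of the oneof, several hints are passed as several entries in `hints[]`. The sketch below shows a client-side check of that rule and of the documented `pitch_shift` range; `validate_hint` is a hypothetical helper working on plain dicts, not part of the API.

```python
HINT_FIELDS = {
    "voice", "audio_template", "speed", "volume",
    "role", "pitch_shift", "duration",
}


def validate_hint(hint):
    """Check that a dict shaped like Hints sets exactly one oneof field."""
    unknown = [name for name in hint if name not in HINT_FIELDS]
    if unknown:
        raise ValueError(f"unknown hint fields: {unknown}")
    if len(hint) != 1:
        raise ValueError("a hint must set exactly one field of the oneof")
    # Range taken from the pitch_shift row: [-1000; 1000] Hz.
    if "pitch_shift" in hint and not -1000 <= hint["pitch_shift"] <= 1000:
        raise ValueError("pitch_shift must be within [-1000; 1000]")
    return hint


# Two characteristics require two separate hints.
hints = [validate_hint({"speed": 1.2}), validate_hint({"pitch_shift": -50.0})]
```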
AudioTemplate
| Field | Description |
|---|---|
| audio | AudioContent Audio file. |
| text_template | TextTemplate Template and description of its variables. |
| variables[] | AudioVariable Describing variables in audio. |
AudioContent
| Field | Description |
|---|---|
| AudioSource | oneof: content The audio source to read the data from. |
| content | bytes Bytes with audio data. |
| audio_spec | AudioFormatOptions Description of the audio format. |
AudioVariable
| Field | Description |
|---|---|
| variable_name | string The name of the variable. |
| variable_start_ms | int64 Start time of the variable in milliseconds. |
| variable_length_ms | int64 Length of the variable in milliseconds. |
DurationHint
| Field | Description |
|---|---|
| policy | enum DurationHintPolicy Type of duration constraint. |
| duration_ms | int64 Constraint on audio duration in milliseconds. |
AudioFormatOptions
| Field | Description |
|---|---|
| AudioFormat | oneof: raw_audio or container_audio |
| raw_audio | RawAudio The audio format specified in request parameters. |
| container_audio | ContainerAudio The audio format specified inside the container metadata. |
RawAudio
| Field | Description |
|---|---|
| audio_encoding | enum AudioEncoding Encoding type. |
| sample_rate_hertz | int64 Sampling frequency of the signal. |
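For raw PCM output, `sample_rate_hertz` is what ties a byte count to a duration. The following sketch works through that arithmetic for the default 22050 Hz, 16-bit format; the single-channel (mono) layout is an assumption not stated in this reference, and `pcm_duration_ms` is a hypothetical helper.

```python
def pcm_duration_ms(num_bytes, sample_rate_hertz=22050,
                    bytes_per_sample=2, channels=1):
    """Duration in milliseconds of headerless linear PCM audio."""
    # 16-bit samples take 2 bytes each; channels=1 is an assumption (mono).
    frames = num_bytes // (bytes_per_sample * channels)
    return frames * 1000 // sample_rate_hertz


# 44100 bytes of 16-bit mono audio at 22050 Hz is one second.
one_second = pcm_duration_ms(44100)
```

The calculation applies to raw audio only; container formats carry a header and possibly compressed data, so their size does not map to duration this way.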
ContainerAudio
| Field | Description |
|---|---|
| container_audio_type | enum ContainerAudioType |
UtteranceSynthesisResponse
| Field | Description |
|---|---|
| audio_chunk | AudioChunk Part of synthesized audio. |
| text_chunk | TextChunk Part of synthesized text. |
| start_ms | int64 Start time of the audio chunk in milliseconds. |
| length_ms | int64 Length of the audio chunk in milliseconds. |
AudioChunk
| Field | Description |
|---|---|
| data | bytes Sequence of bytes of the synthesized audio in the format specified in output_audio_spec. |
TextChunk
| Field | Description |
|---|---|
| text | string Synthesized text. |
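Since `UtteranceSynthesis` returns a stream, a client typically iterates over the responses and appends each `audio_chunk.data` to a buffer. The sketch below shows that loop against a stand-in stream built with `SimpleNamespace`; `collect_audio` and `fake_stream` are hypothetical, and in a real client the iterable would be the object returned by the generated gRPC stub.

```python
import io
from types import SimpleNamespace


def collect_audio(responses):
    """Concatenate audio_chunk.data from a stream of response-like objects."""
    buffer = io.BytesIO()
    for response in responses:
        buffer.write(response.audio_chunk.data)
    return buffer.getvalue()


def fake_stream():
    """Stand-in for the server stream; yields objects with the same fields."""
    for payload in (b"\x00\x01", b"\x02\x03", b"\x04"):
        yield SimpleNamespace(audio_chunk=SimpleNamespace(data=payload))


audio = collect_audio(fake_stream())
```

Writing the chunks in arrival order keeps the audio contiguous; `start_ms` and `length_ms` can additionally be logged per response to track how far synthesis has progressed.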