SpeechKit Hybrid Synthesis Service API, gRPC: Synthesizer
Updated at January 16, 2024
A set of methods for voice synthesis.
Call | Description |
---|---|
UtteranceSynthesis | Synthesizing text into speech. |
Calls Synthesizer
UtteranceSynthesis
Synthesizing text into speech.
rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)
UtteranceSynthesisRequest
Field | Description |
---|---|
model | string The name of the model. Specifies basic synthesis functionality. Currently should be empty. Do not use it. |
Utterance | oneof: text or text_template Text to synthesize, given in one of the supported text synthesis markups. |
text | string Raw text (e.g. "Hello, Alice"). |
text_template | TextTemplate Text template instance, e.g. {"Hello, {username}" with username="Alice"}. |
hints[] | Hints Optional hints for synthesis. |
output_audio_spec | AudioFormatOptions Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header |
loudness_normalization_type | enum LoudnessNormalizationType Specifies type of loudness normalization. Optional. Default: LUFS. |
unsafe_mode | bool Optional. Automatically splits long text into several utterances and bills accordingly. Some degradation in service quality is possible. |
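As an illustration of the request layout above, the following sketch assembles a plain Python dict shaped like UtteranceSynthesisRequest. The helper `build_request` and the voice name `example_voice` are hypothetical; a real client would normally use the protobuf classes generated from the service's .proto files instead of dicts.

```python
def build_request(text=None, text_template=None, hints=None, unsafe_mode=False):
    """Hypothetical helper: returns a dict shaped like UtteranceSynthesisRequest."""
    # Utterance is a oneof, so exactly one of text / text_template is set here.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    # The model field is left out on purpose: the table says it should stay empty.
    request = {"hints": list(hints or []), "unsafe_mode": unsafe_mode}
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    return request


# "example_voice" is a placeholder, not a real voice name.
request = build_request(text="Hello, Alice", hints=[{"voice": "example_voice"}])
```

Omitting output_audio_spec and loudness_normalization_type leaves the documented defaults in effect.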
TextTemplate
Field | Description |
---|---|
text_template | string Template text. Sample: The {animal} goes to the {place}. |
variables[] | TextVariable Defining variables in template text. Sample: {animal: cat, place: forest} |
TextVariable
Field | Description |
---|---|
variable_name | string The name of the variable. |
variable_value | string The text of the variable. |
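To make the relationship between TextTemplate and TextVariable concrete, the sketch below substitutes variable values into the template text locally. It only illustrates the intended meaning of the fields; how the service itself performs the substitution is not described here, and `render_template` is a hypothetical helper.

```python
import re


def render_template(text_template, variables):
    """Hypothetical helper: fills {name} placeholders from TextVariable-like dicts."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}
    # Each {name} placeholder is replaced by its value; an undefined name raises KeyError.
    return re.sub(r"\{(\w+)\}", lambda m: values[m.group(1)], text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
rendered = render_template(template["text_template"], template["variables"])
# rendered == "The cat goes to the forest."
```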
Hints
Field | Description |
---|---|
Hint | oneof: voice, audio_template, speed, volume, role, pitch_shift or duration A hint for the TTS engine that specifies characteristics of the synthesized audio. |
voice | string Name of speaker to use. |
audio_template | AudioTemplate Template for synthesizing. |
speed | double Hint to change speed. |
volume | double Hint to regulate normalization level. |
role | string Hint to specify pronunciation character for the speaker. |
pitch_shift | double Hint to increase (or decrease) speaker's pitch, measured in Hz. Valid values are in range [-1000;1000], default value is 0. |
duration | DurationHint Hint to limit both minimum and maximum audio duration. |
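Because Hint is a oneof, each Hints entry carries a single hint, and several hints are passed as separate entries of hints[]. The sketch below models that with plain dicts and checks the documented pitch_shift range on the client side. The helper `make_hint` and the voice name `example_voice` are hypothetical, and requiring exactly one field per entry is a simplification of this sketch.

```python
HINT_FIELDS = {
    "voice", "audio_template", "speed", "volume", "role", "pitch_shift", "duration",
}


def make_hint(**kwargs):
    """Hypothetical helper: returns a dict shaped like one Hints message."""
    if len(kwargs) != 1:
        raise ValueError("pass exactly one hint per Hints entry")
    ((name, value),) = kwargs.items()
    if name not in HINT_FIELDS:
        raise ValueError(f"unknown hint: {name}")
    # Range taken from the table above: [-1000;1000] Hz.
    if name == "pitch_shift" and not -1000 <= value <= 1000:
        raise ValueError("pitch_shift must be within [-1000, 1000] Hz")
    return {name: value}


# One entry per hint; "example_voice" is a placeholder voice name.
hints = [make_hint(voice="example_voice"), make_hint(speed=1.2), make_hint(pitch_shift=-50)]
```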
AudioTemplate
Field | Description |
---|---|
audio | AudioContent Audio file. |
text_template | TextTemplate Template and description of its variables. |
variables[] | AudioVariable Describing variables in audio. |
AudioContent
Field | Description |
---|---|
AudioSource | oneof: content The audio source to read the data from. |
content | bytes Bytes with audio data. |
audio_spec | AudioFormatOptions Description of the audio format. |
AudioVariable
Field | Description |
---|---|
variable_name | string The name of the variable. |
variable_start_ms | int64 Start time of the variable in milliseconds. |
variable_length_ms | int64 Length of the variable in milliseconds. |
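Each AudioVariable marks a time interval in the template audio through its start and length. The sketch below converts those fields into (name, start, end) spans and rejects overlapping intervals. The non-overlap check is an assumption made for this example, not a documented service rule, and `variable_spans` is a hypothetical helper.

```python
def variable_spans(variables):
    """Hypothetical helper: returns (name, start_ms, end_ms) tuples sorted by start time."""
    spans = sorted(
        (v["variable_start_ms"], v["variable_start_ms"] + v["variable_length_ms"], v["variable_name"])
        for v in variables
    )
    # Assumption of this sketch: variable intervals should not overlap.
    for (_, prev_end, prev_name), (start, _, name) in zip(spans, spans[1:]):
        if start < prev_end:
            raise ValueError(f"{name} overlaps {prev_name}")
    return [(name, start, end) for start, end, name in spans]


spans = variable_spans([
    {"variable_name": "place", "variable_start_ms": 1500, "variable_length_ms": 600},
    {"variable_name": "animal", "variable_start_ms": 500, "variable_length_ms": 400},
])
# spans == [("animal", 500, 900), ("place", 1500, 2100)]
```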
DurationHint
Field | Description |
---|---|
policy | enum DurationHintPolicy Type of duration constraint. |
duration_ms | int64 Constraint on audio duration in milliseconds. |
AudioFormatOptions
Field | Description |
---|---|
AudioFormat | oneof: raw_audio or container_audio |
raw_audio | RawAudio The audio format specified in request parameters. |
container_audio | ContainerAudio The audio format specified inside the container metadata. |
RawAudio
Field | Description |
---|---|
audio_encoding | enum AudioEncoding Encoding type. |
sample_rate_hertz | int64 Sampling frequency of the signal. |
ContainerAudio
Field | Description |
---|---|
container_audio_type | enum ContainerAudioType |
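For raw linear PCM, the playback duration follows directly from the byte count and the sample rate. The sketch below works this out for the default 22050 Hz, 16-bit format. The mono channel count is an assumption, since the tables above do not state it, and the input is assumed to be sample data only, without a WAV header. `pcm_duration_ms` is a hypothetical helper.

```python
def pcm_duration_ms(num_bytes, sample_rate_hertz=22050, bytes_per_sample=2, channels=1):
    """Hypothetical helper: duration of raw PCM sample data in whole milliseconds."""
    # channels=1 (mono) is an assumption; 2 bytes per sample corresponds to 16-bit PCM.
    frames = num_bytes // (bytes_per_sample * channels)
    return frames * 1000 // sample_rate_hertz


one_second = pcm_duration_ms(44100)  # 22050 frames of 2 bytes each
# one_second == 1000
```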
UtteranceSynthesisResponse
Field | Description |
---|---|
audio_chunk | AudioChunk Part of synthesized audio. |
text_chunk | TextChunk Part of synthesized text. |
start_ms | int64 Start time of the audio chunk in milliseconds. |
length_ms | int64 Length of the audio chunk in milliseconds. |
AudioChunk
Field | Description |
---|---|
data | bytes Sequence of bytes of the synthesized audio in the format specified in output_audio_spec. |
TextChunk
Field | Description |
---|---|
text | string Synthesized text. |
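Since UtteranceSynthesis returns a stream of UtteranceSynthesisResponse messages, a client typically iterates over the stream and accumulates the audio chunks. The sketch below shows that loop using stand-in dataclasses in place of the generated protobuf classes. The classes, `collect_audio`, and the fake stream are hypothetical, and simple byte concatenation of chunks is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class AudioChunk:  # stand-in for the generated AudioChunk message
    data: bytes


@dataclass
class Response:  # stand-in for UtteranceSynthesisResponse (text_chunk omitted)
    audio_chunk: AudioChunk
    start_ms: int
    length_ms: int


def collect_audio(responses):
    """Hypothetical helper: joins chunk bytes and returns (audio, end time in ms)."""
    audio = bytearray()
    end_ms = 0
    for response in responses:
        audio.extend(response.audio_chunk.data)
        end_ms = max(end_ms, response.start_ms + response.length_ms)
    return bytes(audio), end_ms


# A fake stream standing in for the iterator a gRPC stub call would return.
fake_stream = iter([
    Response(AudioChunk(b"\x00\x01"), start_ms=0, length_ms=100),
    Response(AudioChunk(b"\x02\x03"), start_ms=100, length_ms=50),
])
audio, end_ms = collect_audio(fake_stream)
# audio == b"\x00\x01\x02\x03", end_ms == 150
```

With real generated stubs, the same loop would run over the response iterator returned by the UtteranceSynthesis call.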