SpeechKit Hybrid Synthesis Service API, gRPC: Synthesizer

Updated at January 16, 2024

A set of methods for voice synthesis.

Call	Description
UtteranceSynthesis	Synthesizing text into speech.

Calls Synthesizer

UtteranceSynthesis

Synthesizing text into speech.

rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)

UtteranceSynthesisRequest

Field	Description
model	string The name of the model, which specifies basic synthesis functionality. Currently this field should be left empty; do not use it.
Utterance	oneof: `text` or `text_template` Text to synthesize, given in one of the supported text markup forms.
text	string Raw text (e.g. "Hello, Alice").
text_template	TextTemplate Text template instance, e.g. `{"Hello, {username}" with username="Alice"}`.
hints[]	Hints Optional hints for synthesis.
output_audio_spec	AudioFormatOptions Optional. Format of the output audio. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with a WAV header.
loudness_normalization_type	enum LoudnessNormalizationType Specifies the type of loudness normalization. Optional. Default: `LUFS`. `MAX_PEAK`: Peak normalization, in which the gain is adjusted to bring the highest PCM sample value or analog signal peak to a given level. `LUFS`: Normalization based on the EBU R 128 recommendation.
unsafe_mode	bool Optional. Automatically splits long text into several utterances and bills accordingly. Some degradation in service quality is possible.
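
The field rules above can be sketched as a small request builder. This is only an illustration using plain Python dicts, not the generated gRPC message classes; `build_request` and the voice name `example_voice` are hypothetical, and the sketch assumes that exactly one member of the `Utterance` oneof has to be set.

```python
from typing import Any, Dict, List, Optional


def build_request(
    text: Optional[str] = None,
    text_template: Optional[Dict[str, Any]] = None,
    hints: Optional[List[Dict[str, Any]]] = None,
    loudness_normalization_type: str = "LUFS",
    unsafe_mode: bool = False,
) -> Dict[str, Any]:
    """Build a dict shaped like UtteranceSynthesisRequest (hypothetical helper)."""
    # Assumption: exactly one member of the Utterance oneof must be provided.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    if loudness_normalization_type not in ("LUFS", "MAX_PEAK"):
        raise ValueError("unknown loudness_normalization_type")

    request: Dict[str, Any] = {
        # `model` is omitted on purpose: the reference says to leave it empty.
        "hints": list(hints or []),
        "loudness_normalization_type": loudness_normalization_type,
        "unsafe_mode": unsafe_mode,
    }
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    return request


# "example_voice" is a placeholder, not a real speaker name.
request = build_request(text="Hello, Alice", hints=[{"voice": "example_voice"}])
```

`output_audio_spec` is left out here, so the documented default (22050 Hz PCM with a WAV header) would apply.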

TextTemplate

Field	Description
text_template	string Template text. Sample:`The {animal} goes to the {place}.`
variables[]	TextVariable Defining variables in template text. Sample: `{animal: cat, place: forest}`

TextVariable

Field	Description
variable_name	string The name of the variable.
variable_value	string The text of the variable.
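
To preview what text a `TextTemplate` describes, the substitution can be sketched locally. The reference does not say how the service itself parses placeholders, so the `{name}` pattern and the `render_template` helper below are assumptions for illustration only.

```python
import re
from typing import Dict, List


def render_template(text_template: str, variables: List[Dict[str, str]]) -> str:
    """Replace each {name} placeholder with its variable_value (local preview only)."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    # Assumption: placeholders are word characters wrapped in curly braces.
    return re.sub(r"\{(\w+)\}", substitute, text_template)


template = {
    "text_template": "The {animal} goes to the {place}.",
    "variables": [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
}
rendered = render_template(template["text_template"], template["variables"])
```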

Hints

Field	Description
Hint	oneof: `voice`, `audio_template`, `speed`, `volume`, `role`, `pitch_shift` or `duration` A hint for the TTS engine that specifies characteristics of the synthesized audio.
voice	string Name of speaker to use.
audio_template	AudioTemplate Template for synthesizing.
speed	double Hint to change speed.
volume	double Hint to regulate the normalization level. For the `MAX_PEAK` loudness_normalization_type: valid values are in the range (0;1], default value is 0.7. For the `LUFS` loudness_normalization_type: valid values are in the range [-145;0), default value is -19.
role	string Hint to specify pronunciation character for the speaker.
pitch_shift	double Hint to increase (or decrease) speaker's pitch, measured in Hz. Valid values are in range [-1000;1000], default value is 0.
duration	DurationHint Hint to limit both minimum and maximum audio duration.
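
The documented ranges for `volume` and `pitch_shift` can be checked on the client before a request is sent. The helper names below are hypothetical; only the interval bounds come from the table above.

```python
def is_valid_volume(volume: float, loudness_normalization_type: str = "LUFS") -> bool:
    """Check `volume` against the range documented for the normalization type."""
    if loudness_normalization_type == "MAX_PEAK":
        return 0 < volume <= 1        # range (0;1]
    if loudness_normalization_type == "LUFS":
        return -145 <= volume < 0     # range [-145;0)
    raise ValueError("unknown loudness_normalization_type")


def is_valid_pitch_shift(pitch_shift: float) -> bool:
    """Check `pitch_shift` (in Hz) against the documented range [-1000;1000]."""
    return -1000 <= pitch_shift <= 1000


default_peak_ok = is_valid_volume(0.7, "MAX_PEAK")
default_lufs_ok = is_valid_volume(-19, "LUFS")
```

Note that the two ranges do not intersect, so a `volume` value that is valid for one normalization type is always invalid for the other.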

AudioTemplate

Field	Description
audio	AudioContent Audio file.
text_template	TextTemplate Template and description of its variables.
variables[]	AudioVariable Describing variables in audio.

AudioContent

Field	Description
AudioSource	oneof: `content` The audio source to read the data from.
content	bytes Bytes with audio data.
audio_spec	AudioFormatOptions Description of the audio format.

AudioVariable

Field	Description
variable_name	string The name of the variable.
variable_start_ms	int64 Start time of the variable in milliseconds.
variable_length_ms	int64 Length of the variable in milliseconds.
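
Each `AudioVariable` marks a span of the template audio by start time and length. A minimal sketch of turning those fields into `(name, start, end)` spans and checking that they fit inside the audio follows; `variable_spans` is a hypothetical helper, and the requirement that a span lies fully within the audio is an assumption, not something the reference states.

```python
from typing import Dict, List, Tuple, Union


def variable_spans(
    variables: List[Dict[str, Union[str, int]]],
    audio_length_ms: int,
) -> List[Tuple[str, int, int]]:
    """Return (variable_name, start_ms, end_ms) for each AudioVariable-like dict."""
    spans = []
    for variable in variables:
        start = variable["variable_start_ms"]
        length = variable["variable_length_ms"]
        end = start + length
        # Assumption: a variable must cover a non-empty span inside the audio.
        if start < 0 or length <= 0 or end > audio_length_ms:
            raise ValueError(f"span of {variable['variable_name']!r} is out of bounds")
        spans.append((variable["variable_name"], start, end))
    return spans


spans = variable_spans(
    [{"variable_name": "username", "variable_start_ms": 500, "variable_length_ms": 400}],
    audio_length_ms=2000,
)
```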

DurationHint

Field	Description
policy	enum DurationHintPolicy Type of duration constraint. `EXACT_DURATION`: Limit audio duration to exact value. `MIN_DURATION`: Limit the minimum audio duration. `MAX_DURATION`: Limit the maximum audio duration.
duration_ms	int64 Constraint on audio duration in milliseconds.
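
The three policies can be read as simple comparisons against `duration_ms`. The sketch below checks a resulting audio length against a hint; `satisfies_duration_hint` is a hypothetical helper, and whether the service applies any tolerance for `EXACT_DURATION` is not stated in this reference.

```python
def satisfies_duration_hint(actual_ms: int, policy: str, duration_ms: int) -> bool:
    """Check an audio length against a DurationHint-like (policy, duration_ms) pair."""
    if policy == "EXACT_DURATION":
        return actual_ms == duration_ms   # assumes no tolerance
    if policy == "MIN_DURATION":
        return actual_ms >= duration_ms
    if policy == "MAX_DURATION":
        return actual_ms <= duration_ms
    raise ValueError(f"unknown policy: {policy}")


fits = satisfies_duration_hint(2800, "MAX_DURATION", 3000)
```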

AudioFormatOptions

Field	Description
AudioFormat	oneof: `raw_audio` or `container_audio`
raw_audio	RawAudio The audio format specified in request parameters.
container_audio	ContainerAudio The audio format specified inside the container metadata.

RawAudio

Field	Description
audio_encoding	enum AudioEncoding Encoding type. `LINEAR16_PCM`: Audio bit depth 16-bit signed little-endian (Linear PCM).
sample_rate_hertz	int64 Sampling frequency of the signal.
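
For raw `LINEAR16_PCM` audio, each sample takes two bytes, so the playback length follows directly from the byte count and `sample_rate_hertz`. The sketch below assumes single-channel audio, which this reference does not state; `pcm_duration_ms` is a hypothetical helper.

```python
def pcm_duration_ms(num_bytes: int, sample_rate_hertz: int, channels: int = 1) -> float:
    """Duration in milliseconds of raw 16-bit PCM data (mono assumed by default)."""
    bytes_per_sample = 2  # LINEAR16_PCM: 16-bit signed little-endian
    frame_size = bytes_per_sample * channels
    if num_bytes % frame_size != 0:
        raise ValueError("byte count is not a whole number of PCM frames")
    frames = num_bytes // frame_size
    return frames / sample_rate_hertz * 1000


# 44100 bytes of mono audio at 22050 Hz is 22050 samples, i.e. one second.
one_second = pcm_duration_ms(44100, 22050)
```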

ContainerAudio

Field	Description
container_audio_type	enum ContainerAudioType `WAV`: Audio bit depth 16-bit signed little-endian (Linear PCM). `OGG_OPUS`: Data is encoded using the OPUS audio codec and stored in the OGG container format. `MP3`: Data is encoded using MPEG-1/2 Layer III and stored in the MP3 container format.

UtteranceSynthesisResponse

Field	Description
audio_chunk	AudioChunk Part of synthesized audio.
text_chunk	TextChunk Part of synthesized text.
start_ms	int64 Start time of the audio chunk in milliseconds.
length_ms	int64 Length of the audio chunk in milliseconds.

AudioChunk

Field	Description
data	bytes Sequence of bytes of the synthesized audio, in the format specified in `output_audio_spec`.

TextChunk

Field	Description
text	string Synthesized text.
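
Because `UtteranceSynthesis` returns a stream of `UtteranceSynthesisResponse` messages, a client has to assemble the audio from the chunks. The sketch below uses `SimpleNamespace` stand-ins instead of real gRPC responses, so the stream contents are made up; it also assumes that chunks are meant to be concatenated in arrival order. `collect_audio` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def collect_audio(responses: Iterable) -> Tuple[bytes, List[str]]:
    """Join audio_chunk.data in arrival order and gather the text of each chunk."""
    audio_parts = []
    texts = []
    for response in responses:
        audio_parts.append(response.audio_chunk.data)
        texts.append(response.text_chunk.text)
    return b"".join(audio_parts), texts


# Stand-in for the response stream; a real stream would come from the gRPC call.
fake_stream = [
    SimpleNamespace(
        audio_chunk=SimpleNamespace(data=b"\x00\x01"),
        text_chunk=SimpleNamespace(text="Hello,"),
        start_ms=0,
        length_ms=400,
    ),
    SimpleNamespace(
        audio_chunk=SimpleNamespace(data=b"\x02\x03"),
        text_chunk=SimpleNamespace(text="Alice"),
        start_ms=400,
        length_ms=350,
    ),
]
audio, texts = collect_audio(fake_stream)
```

With a real client, the same loop would run over the iterator returned by the streaming call, and the collected bytes could be written to a file in the format requested through `output_audio_spec`.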
