SpeechKit Synthesis Service API v3, gRPC: Synthesizer.StreamSynthesis
Bidirectional streaming RPC for real-time synthesis.
gRPC request
rpc StreamSynthesis (stream StreamSynthesisRequest) returns (stream StreamSynthesisResponse)
StreamSynthesisRequest
{
  // Includes only one of the fields `options`, `synthesis_input`, `force_synthesis`
  "options": {
    "model": "string",
    "voice": "string",
    "role": "string",
    "speed": "double",
    "volume": "double",
    "pitch_shift": "double",
    "output_audio_spec": {
      // Includes only one of the fields `raw_audio`, `container_audio`
      "raw_audio": {
        "audio_encoding": "AudioEncoding",
        "sample_rate_hertz": "int64"
      },
      "container_audio": {
        "container_audio_type": "ContainerAudioType"
      }
      // end of the list of possible fields
    },
    "loudness_normalization_type": "LoudnessNormalizationType"
  },
  "synthesis_input": {
    "text": "string"
  },
  "force_synthesis": "ForceSynthesisEvent"
  // end of the list of possible fields
}
Sent by the client to control the stream or to provide data during streaming synthesis.
Field | Description
options | Synthesis options. Must be provided in the first request of the stream and cannot be updated afterwards. Includes only one of the fields `options`, `synthesis_input`, `force_synthesis`.
synthesis_input | Input to be synthesized. Includes only one of the fields `options`, `synthesis_input`, `force_synthesis`.
force_synthesis | Triggers immediate synthesis of buffered input. Includes only one of the fields `options`, `synthesis_input`, `force_synthesis`.
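The ordering rules in this table (options first and only once, then text, with an optional forced flush) can be sketched as a small request generator. This is a minimal illustration, not the official client: each StreamSynthesisRequest is modeled as a plain Python dict keyed by the single field it sets, `build_requests` and `check_stream` are hypothetical helper names, and `"example-voice"` is a placeholder value. A real client would build the generated protobuf messages instead.

```python
from typing import Dict, Iterable, Iterator, List

# The three mutually exclusive fields of StreamSynthesisRequest.
ONEOF_FIELDS = ("options", "synthesis_input", "force_synthesis")


def build_requests(options: Dict, texts: Iterable[str], flush: bool = True) -> Iterator[Dict]:
    """Yield dict-shaped requests: options first, then one synthesis_input per text."""
    yield {"options": options}
    for text in texts:
        yield {"synthesis_input": {"text": text}}
    if flush:
        # ForceSynthesisEvent has no fields, so an empty dict stands in for it.
        yield {"force_synthesis": {}}


def check_stream(requests: List[Dict]) -> None:
    """Raise ValueError if a request sequence breaks the rules described above."""
    for index, request in enumerate(requests):
        set_fields = [name for name in ONEOF_FIELDS if name in request]
        if len(set_fields) != 1 or len(request) != 1:
            raise ValueError(f"request {index} must set exactly one field")
        if index == 0 and set_fields[0] != "options":
            raise ValueError("the first request must carry options")
        if index > 0 and set_fields[0] == "options":
            raise ValueError("options cannot be sent again after the first request")


# Placeholder voice name; substitute a voice supported by the service.
requests = list(build_requests({"voice": "example-voice"}, ["Hello, ", "world."]))
check_stream(requests)
```

Validating the sequence on the client side is a design convenience only; the service remains the authority on what it accepts.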
SynthesisOptions
Field | Description
model | string. The name of the TTS model to use for synthesis. Currently it should be left empty; do not set it.
voice | string. The voice to use for speech synthesis.
role | string. The role or speaking style. Can be used to specify the character of the speaker's pronunciation.
speed | double. Speed multiplier (default: 1.0).
volume | double. Volume adjustment:
pitch_shift | double. Pitch adjustment, in Hz, range [-1000, 1000], default 0.
output_audio_spec | Specifies the output audio format. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with WAV header.
loudness_normalization_type | enum LoudnessNormalizationType. Loudness normalization type for output (default:
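The defaults and ranges listed in this table can be captured in a small client-side helper. The sketch below is an assumption-laden illustration: `OptionsSketch` is a hypothetical name, it covers only the fields whose defaults or constraints are stated above (`model`, `speed`, `pitch_shift`), and it omits `volume`, `output_audio_spec`, and `loudness_normalization_type` because their allowed values are not spelled out here.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class OptionsSketch:
    """Hypothetical stand-in for a subset of SynthesisOptions."""

    voice: str
    role: str = ""
    model: str = ""           # documented as "should be empty"
    speed: float = 1.0        # documented default
    pitch_shift: float = 0.0  # documented default, in Hz

    def validate(self) -> None:
        """Check only the constraints that the table states explicitly."""
        if self.model:
            raise ValueError("model should currently be left empty")
        if not -1000.0 <= self.pitch_shift <= 1000.0:
            raise ValueError("pitch_shift must be within [-1000, 1000] Hz")

    def to_request(self) -> Dict:
        """Return a dict shaped like the first request of the stream."""
        self.validate()
        return {"options": asdict(self)}


# Placeholder voice name; substitute a voice supported by the service.
first_request = OptionsSketch(voice="example-voice", pitch_shift=50.0).to_request()
```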
AudioFormatOptions
Field | Description
raw_audio | The audio format specified in request parameters. Includes only one of the fields `raw_audio`, `container_audio`.
container_audio | The audio format specified inside the container metadata. Includes only one of the fields `raw_audio`, `container_audio`.
RawAudio
Field | Description
audio_encoding | enum AudioEncoding. Encoding type.
sample_rate_hertz | int64. Sampling frequency of the signal.
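For raw PCM output, the size of the audio data follows directly from `sample_rate_hertz`. The arithmetic below is a sketch under two assumptions that are not stated in this table: samples are 16-bit (2 bytes), matching the linear 16-bit PCM mentioned as the default output, and the audio is single-channel. The function names are hypothetical.

```python
def pcm_byte_count(duration_ms: int, sample_rate_hertz: int,
                   sample_width: int = 2, channels: int = 1) -> int:
    """Bytes needed to hold duration_ms of uncompressed PCM audio."""
    samples = sample_rate_hertz * duration_ms // 1000
    return samples * sample_width * channels


def pcm_duration_ms(byte_count: int, sample_rate_hertz: int,
                    sample_width: int = 2, channels: int = 1) -> int:
    """Duration in milliseconds represented by byte_count bytes of PCM audio."""
    samples = byte_count // (sample_width * channels)
    return samples * 1000 // sample_rate_hertz


# One second at 22050 Hz, 16-bit mono: 22050 samples * 2 bytes = 44100 bytes.
one_second = pcm_byte_count(1000, 22050)
```

This kind of calculation is useful for sizing playback buffers or for sanity-checking received data against reported durations.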
ContainerAudio
Field | Description
container_audio_type | enum ContainerAudioType
SynthesisInput
The input for synthesis.
Field | Description
text | string. The text string to be synthesized.
ForceSynthesisEvent
Event to forcibly trigger synthesis.
This message has no fields.
StreamSynthesisResponse
{
  "audio_chunk": {
    "data": "bytes"
  },
  "text_chunk": {
    "text": "string"
  },
  "start_ms": "int64",
  "length_ms": "int64"
}
Field | Description
audio_chunk | Part of synthesized audio.
text_chunk | Part of synthesized text.
start_ms | int64. Start time of the audio chunk in milliseconds.
length_ms | int64. Length of the audio chunk in milliseconds.
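A client typically accumulates the streamed responses into one audio buffer and one text string. The sketch below models each StreamSynthesisResponse as a plain dict with the field names from the table; `collect_responses` is a hypothetical helper, the sample responses are made-up values for illustration, and it assumes responses arrive in playback order and that text chunks can be joined without a separator.

```python
from typing import Dict, Iterable, List, Tuple


def collect_responses(responses: Iterable[Dict]) -> Tuple[bytes, str, int]:
    """Return (audio bytes, synthesized text, end time in ms) for a response stream."""
    audio = bytearray()
    text_parts: List[str] = []
    end_ms = 0
    for response in responses:
        audio.extend(response.get("audio_chunk", {}).get("data", b""))
        text_parts.append(response.get("text_chunk", {}).get("text", ""))
        # The chunk covers the interval [start_ms, start_ms + length_ms).
        chunk_end = response.get("start_ms", 0) + response.get("length_ms", 0)
        end_ms = max(end_ms, chunk_end)
    return bytes(audio), "".join(text_parts), end_ms


# Made-up responses, for illustration only.
sample_responses = [
    {"audio_chunk": {"data": b"\x00\x01"}, "text_chunk": {"text": "Hello, "},
     "start_ms": 0, "length_ms": 400},
    {"audio_chunk": {"data": b"\x02\x03"}, "text_chunk": {"text": "world."},
     "start_ms": 400, "length_ms": 350},
]
audio_bytes, full_text, total_ms = collect_responses(sample_responses)
```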
AudioChunk
Field | Description
data | bytes. Sequence of bytes of the synthesized audio, in the format specified in output_audio_spec.
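When `raw_audio` is requested, the `data` bytes carry no header, so a player needs the format supplied separately. One option is to wrap the collected bytes in a WAV container with Python's standard `wave` module. This is a sketch, assuming 16-bit mono PCM at the sample rate requested in `output_audio_spec`; `pcm_to_wav` is a hypothetical helper name, and this step is not needed for the default output, which is described as already including a WAV header.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hertz: int = 22050) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumption: single channel
        wav_file.setsampwidth(2)   # assumption: 16-bit samples
        wav_file.setframerate(sample_rate_hertz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 100 samples of silence stand in for concatenated AudioChunk.data bytes.
wav_bytes = pcm_to_wav(b"\x00\x00" * 100)
```

The resulting bytes can be written to a `.wav` file or passed to any player that reads WAV data.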
TextChunk
Field | Description
text | string. Synthesized text.