A Yandex project
© 2025 ТОО «Облачные Сервисы Казахстан»

In this article:

  • Calls Synthesizer
  • UtteranceSynthesis
  • UtteranceSynthesisRequest
  • TextTemplate
  • TextVariable
  • Hints
  • AudioTemplate
  • AudioContent
  • AudioVariable
  • DurationHint
  • AudioFormatOptions
  • RawAudio
  • ContainerAudio
  • UtteranceSynthesisResponse
  • AudioChunk
  • TextChunk

SpeechKit Hybrid Synthesis Service API, gRPC: Synthesizer

Written by Yandex Cloud
Updated January 16, 2024

A set of methods for voice synthesis.

Call Description
UtteranceSynthesis Synthesizing text into speech.

Calls Synthesizer

UtteranceSynthesis

Synthesizing text into speech.

rpc UtteranceSynthesis (UtteranceSynthesisRequest) returns (stream UtteranceSynthesisResponse)
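
Because the call is server-streaming, the client receives a sequence of UtteranceSynthesisResponse messages rather than a single reply. The sketch below shows one way to collect the audio from such a stream. It is a minimal illustration, not the official client: `collect_audio` and `fake_stream` are hypothetical names, and the stream is simulated with plain objects so the example runs without generated gRPC stubs.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_audio(responses: Iterable) -> bytes:
    """Concatenate audio_chunk.data from every streamed response, in arrival order."""
    parts = []
    for response in responses:
        parts.append(response.audio_chunk.data)
    return b"".join(parts)


# Stand-in for the server stream (assumption). A real gRPC call would return
# an iterator of UtteranceSynthesisResponse messages instead.
fake_stream = [
    SimpleNamespace(audio_chunk=SimpleNamespace(data=b"\x01\x02")),
    SimpleNamespace(audio_chunk=SimpleNamespace(data=b"\x03\x04")),
]
audio = collect_audio(fake_stream)
```

With real stubs, the same loop would presumably run over the iterator returned by the UtteranceSynthesis call.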

UtteranceSynthesisRequest

Field Description
model string
The name of the model. Specifies basic synthesis functionality. Must currently be left empty; do not set this field.
Utterance oneof: text or text_template
Text to synthesize, in one of the supported text markups.
  text string
Raw text (e.g. "Hello, Alice").
  text_template TextTemplate
Text template instance, e.g. "Hello, {username}" with username="Alice".
hints[] Hints
Optional hints for synthesis.
output_audio_spec AudioFormatOptions
Output audio format. Optional. Default: 22050 Hz, linear 16-bit signed little-endian PCM, with a WAV header.
loudness_normalization_type enum LoudnessNormalizationType
Specifies the type of loudness normalization. Optional. Default: LUFS.
  • MAX_PEAK: Normalization in which the gain is changed to bring the highest PCM sample value or analog signal peak to a given level.
  • LUFS: Normalization based on the EBU R 128 recommendation.
unsafe_mode bool
Optional. Automatically splits long text into several utterances and bills accordingly. Some degradation in service quality is possible.
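
The Utterance oneof means a request carries either text or text_template, never both. The sketch below mirrors the request as a plain dictionary and enforces that rule on the client side. `build_request` is a hypothetical helper, and the dictionary is only a stand-in for the generated protobuf message, whose Python API is not shown in this reference.

```python
from typing import Optional


def build_request(text: Optional[str] = None,
                  text_template: Optional[dict] = None,
                  unsafe_mode: bool = False) -> dict:
    """Build a dict shaped like UtteranceSynthesisRequest (illustrative only)."""
    # Exactly one member of the Utterance oneof must be set.
    if (text is None) == (text_template is None):
        raise ValueError("set exactly one of text or text_template")
    # The model field is deliberately omitted: the reference says to leave it empty.
    request = {"unsafe_mode": unsafe_mode}
    if text is not None:
        request["text"] = text
    else:
        request["text_template"] = text_template
    return request


request = build_request(text="Hello, Alice")
```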

TextTemplate

Field Description
text_template string
Template text.
Sample: The {animal} goes to the {place}.
variables[] TextVariable
Definitions of the variables used in the template text.
Sample: {animal: cat, place: forest}

TextVariable

Field Description
variable_name string
The name of the variable.
variable_value string
The text of the variable.
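
To make the relationship between text_template and variables[] concrete, the sketch below substitutes TextVariable values into the sample template locally. The service performs its own substitution, and its exact rules are not described here, so treat `render_template` as a hypothetical helper for previewing a template and not as the service's algorithm.

```python
import re


def render_template(text_template: str, variables: list) -> str:
    """Replace each {name} placeholder with the matching variable_value."""
    values = {v["variable_name"]: v["variable_value"] for v in variables}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    return re.sub(r"\{(\w+)\}", substitute, text_template)


rendered = render_template(
    "The {animal} goes to the {place}.",
    [
        {"variable_name": "animal", "variable_value": "cat"},
        {"variable_name": "place", "variable_value": "forest"},
    ],
)
# rendered == "The cat goes to the forest."
```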

Hints

Field Description
Hint oneof: voice, audio_template, speed, volume, role, pitch_shift or duration
A hint for the TTS engine that specifies characteristics of the synthesized audio.
  voice string
Name of the speaker to use.
  audio_template AudioTemplate
Audio template to use for synthesis.
  speed double
Hint to change the speech speed.
  volume double
Hint to regulate the normalization level.
  • For the MAX_PEAK loudness_normalization_type: volume takes values in the range (0;1], the default value is 0.7.
  • For the LUFS loudness_normalization_type: volume takes values in the range [-145;0), the default value is -19.
  role string
Hint to specify the pronunciation style (role) of the speaker.
  pitch_shift double
Hint to increase (or decrease) the speaker's pitch, measured in Hz. Valid values are in the range [-1000;1000], the default value is 0.
  duration DurationHint
Hint to limit both minimum and maximum audio duration.
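
The valid volume range depends on the normalization type, which is easy to get wrong. The sketch below encodes the ranges stated above as client-side checks. `check_volume` and `check_pitch_shift` are hypothetical helper names; the service may validate differently or return its own errors.

```python
def check_volume(volume: float, normalization: str = "LUFS") -> bool:
    """Return True if volume is inside the documented range for the normalization type."""
    if normalization == "MAX_PEAK":
        return 0 < volume <= 1        # range (0;1], default 0.7
    if normalization == "LUFS":
        return -145 <= volume < 0     # range [-145;0), default -19
    raise ValueError(f"unknown normalization type: {normalization}")


def check_pitch_shift(pitch_shift_hz: float) -> bool:
    """Return True if the pitch shift is inside the documented range [-1000;1000] Hz."""
    return -1000 <= pitch_shift_hz <= 1000
```

For instance, the default of 0.7 is valid for MAX_PEAK but would be rejected by this check under LUFS, where values must be negative.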

AudioTemplate

Field Description
audio AudioContent
Audio file.
text_template TextTemplate
Template and description of its variables.
variables[] AudioVariable
Descriptions of the variables in the audio.

AudioContent

Field Description
AudioSource oneof: content
The audio source to read the data from.
  content bytes
Bytes with audio data.
audio_spec AudioFormatOptions
Description of the audio format.

AudioVariable

Field Description
variable_name string
The name of the variable.
variable_start_ms int64
Start time of the variable in milliseconds.
variable_length_ms int64
Length of the variable in milliseconds.
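
An AudioVariable marks a region of the template audio by its start and length in milliseconds. When preparing a template, it can help to see which bytes of a raw recording that region covers. The sketch below converts the millisecond values into byte offsets, assuming headerless LINEAR16_PCM (2 bytes per sample) and mono audio. The channel count is an assumption, since this reference does not state it, and `variable_byte_range` is a hypothetical helper.

```python
def variable_byte_range(variable_start_ms: int,
                        variable_length_ms: int,
                        sample_rate_hertz: int,
                        channels: int = 1) -> tuple:
    """Map an AudioVariable's time span to (start, end) byte offsets in raw 16-bit PCM."""
    bytes_per_frame = 2 * channels  # 16-bit samples; mono unless stated otherwise
    start_frame = variable_start_ms * sample_rate_hertz // 1000
    end_frame = (variable_start_ms + variable_length_ms) * sample_rate_hertz // 1000
    return start_frame * bytes_per_frame, end_frame * bytes_per_frame


# A variable that starts at 1 s and lasts 0.5 s in 22050 Hz audio.
start, end = variable_byte_range(1000, 500, 22050)
# (start, end) == (44100, 66150)
```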

DurationHint

Field Description
policy enum DurationHintPolicy
Type of duration constraint.
  • EXACT_DURATION: Limit audio duration to an exact value.
  • MIN_DURATION: Limit the minimum audio duration.
  • MAX_DURATION: Limit the maximum audio duration.
duration_ms int64
Constraint on audio duration in milliseconds.
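
The three policies can be read as simple comparisons between the synthesized duration and duration_ms. The sketch below expresses that reading as a client-side check. `satisfies_duration_hint` is a hypothetical helper, and the strict equality for EXACT_DURATION is an assumption: this reference does not say how much tolerance the service applies.

```python
def satisfies_duration_hint(policy: str, duration_ms: int, actual_ms: int) -> bool:
    """Check an audio duration against a DurationHint (illustrative interpretation)."""
    if policy == "EXACT_DURATION":
        return actual_ms == duration_ms  # assumption: no tolerance
    if policy == "MIN_DURATION":
        return actual_ms >= duration_ms
    if policy == "MAX_DURATION":
        return actual_ms <= duration_ms
    raise ValueError(f"unknown policy: {policy}")
```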

AudioFormatOptions

Field Description
AudioFormat oneof: raw_audio or container_audio
  raw_audio RawAudio
The audio format specified in request parameters.
  container_audio ContainerAudio
The audio format specified inside the container metadata.

RawAudio

Field Description
audio_encoding enum AudioEncoding
Encoding type.
  • LINEAR16_PCM: Audio bit depth 16-bit signed little-endian (Linear PCM).
sample_rate_hertz int64
Sampling frequency of the signal.
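
Raw LINEAR16_PCM output has no header, so most players cannot open it until it is wrapped in a container. The sketch below wraps raw samples in a WAV file with Python's standard `wave` module. It assumes mono audio, which this reference does not confirm, and `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hertz: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)       # assumption: mono
        wav_file.setsampwidth(2)              # LINEAR16_PCM: 2 bytes per sample
        wav_file.setframerate(sample_rate_hertz)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 100 silent samples at the documented default rate of 22050 Hz.
wav_bytes = pcm_to_wav(b"\x00\x00" * 100, 22050)
```

The sample rate passed here should match the sample_rate_hertz requested in RawAudio; otherwise playback speed and pitch will be wrong.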

ContainerAudio

Field Description
container_audio_type enum ContainerAudioType
  • WAV: Audio bit depth 16-bit signed little-endian (Linear PCM).
  • OGG_OPUS: Data is encoded using the OPUS audio codec and packaged in the OGG container format.
  • MP3: Data is encoded using MPEG-1/2 Layer III and stored in the MP3 format.
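
When saving synthesized audio, it can be useful to confirm that the bytes match the requested container. The sketch below guesses the container from well-known leading signatures: RIFF/WAVE for WAV, OggS plus an OpusHead packet for Ogg Opus, and an ID3 tag or MPEG frame sync for MP3. It is a heuristic written for illustration, `detect_container` is a hypothetical name, and it is not part of the SpeechKit API.

```python
def detect_container(data: bytes) -> str:
    """Guess the ContainerAudioType from the leading bytes (heuristic)."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV"
    # An Ogg Opus stream starts with an Ogg page whose first packet is "OpusHead".
    if data[:4] == b"OggS" and b"OpusHead" in data[:64]:
        return "OGG_OPUS"
    # MP3 data starts with an ID3v2 tag or with an MPEG frame sync (11 set bits).
    if data[:3] == b"ID3":
        return "MP3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "MP3"
    return "UNKNOWN"
```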

UtteranceSynthesisResponse

Field Description
audio_chunk AudioChunk
Part of synthesized audio.
text_chunk TextChunk
Part of synthesized text.
start_ms int64
Start time of the audio chunk in milliseconds.
length_ms int64
Length of the audio chunk in milliseconds.
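
Each response carries its own start_ms and length_ms, so the total duration of the synthesized audio can be derived from the stream without decoding it. The sketch below does this for a simulated stream. `total_duration_ms` and `fake_stream` are hypothetical names, and plain objects stand in for the generated response messages.

```python
from types import SimpleNamespace
from typing import Iterable


def total_duration_ms(responses: Iterable) -> int:
    """Return the end time of the last-ending chunk, or 0 for an empty stream."""
    return max((r.start_ms + r.length_ms for r in responses), default=0)


# Stand-in for streamed UtteranceSynthesisResponse messages (assumption).
fake_stream = [
    SimpleNamespace(start_ms=0, length_ms=400),
    SimpleNamespace(start_ms=400, length_ms=350),
]
duration = total_duration_ms(fake_stream)
# duration == 750
```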

AudioChunk

Field Description
data bytes
Sequence of bytes of the synthesized audio in the format specified in output_audio_spec.

TextChunk

Field Description
text string
Synthesized text.
