A Yandex project
© 2026 Cloud Services Kazakhstan LLP (ТОО «Облачные Сервисы Казахстан»)
Yandex AI Studio

In this article:

  • class yandex_ai_studio_sdk._speechkit.text_to_speech.function.AsyncTextToSpeechFunction
  • class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.AsyncTextToSpeech
  • class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.AsyncTTSBidirectionalStream

Domain

Article created by Yandex Cloud
Updated on January 28, 2026

class yandex_ai_studio_sdk._speechkit.text_to_speech.function.AsyncTextToSpeechFunction

Text-to-speech function that creates a synthesis object, which provides the methods for invoking voice synthesis.

__call__(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Creates a TextToSpeech object, which provides methods for voice synthesis.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

  • loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined) – Specifies type of loudness normalization. Default: LUFS.
  • audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
  • model (str | Undefined) – The name of the TTS model to use for synthesis. Currently should be empty. Do not use it.
  • voice (str | Undefined) – The voice to use for speech synthesis.
  • role (str | Undefined) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
  • speed (float | Undefined) – Speed multiplier (default: 1.0).
  • volume (float | Undefined) – Volume adjustment. For MAX_PEAK: range is (0, 1], default 0.7. For LUFS: range is [-145, 0), default -19.
  • pitch_shift (float | Undefined) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
  • duration_ms (int | Undefined) – Limit audio duration to exact value.
  • duration_min_ms (int | Undefined) – Limit the minimum audio duration.
  • duration_max_ms (int | Undefined) – Limit the maximum audio duration.
  • single_chunk_mode (bool | Undefined) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.

Return type

TextToSpeechTypeT

TTS object
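The volume ranges listed above depend on the chosen loudness normalization type, so it can help to check a value before sending a request. The following is a minimal sketch of such a check, not part of the SDK: the `validate_volume` helper and the `DEFAULT_VOLUME` mapping are hypothetical names introduced for illustration, and only the ranges and defaults come from the parameter descriptions.

```python
# Hypothetical helper (not an SDK function): checks a volume value against
# the ranges documented for each loudness normalization type.
DEFAULT_VOLUME = {"MAX_PEAK": 0.7, "LUFS": -19.0}


def validate_volume(normalization: str, volume: float) -> float:
    """Return `volume` unchanged if it lies in the documented range."""
    if normalization == "MAX_PEAK":
        in_range = 0 < volume <= 1        # documented range (0, 1]
    elif normalization == "LUFS":
        in_range = -145 <= volume < 0     # documented range [-145, 0)
    else:
        raise ValueError(f"unknown normalization type: {normalization!r}")
    if not in_range:
        raise ValueError(
            f"volume {volume} is out of range for {normalization}"
        )
    return volume


# Both defaults fall inside their own ranges.
for kind, default in DEFAULT_VOLUME.items():
    validate_volume(kind, default)
```

Note that the two ranges do not overlap, so a value that is valid for one normalization type is always invalid for the other.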

class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.AsyncTextToSpeech

async run(input, *, timeout=60)

Runs speech synthesis for the given text and returns the joined result.

To change the initial synthesis settings, use the .configure method:

>>> tts = sdk.speechkit.text_to_speech(audio_format='mp3')
>>> tts = tts.configure(audio_format='WAV')

Parameters

  • input (str) – Text to vocalize.
  • timeout (float) – Timeout, or the maximum time to wait for the request to complete, in seconds.

Returns

Synthesis result; the chunks are joined if the synthesis response contains more than one.

Return type

TextToSpeechResult

async run_stream(input, *, timeout=60)

Parameters

  • input (str)
  • timeout (float)

Return type

AsyncIterator[TextToSpeechResult]
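Since run_stream returns an AsyncIterator, its results are consumed with `async for`. The sketch below shows that consumption pattern against a stand-in, so it runs without the SDK or credentials. Everything in it is an assumption made for illustration: `FakeResult`, its `data` field, and `fake_run_stream` are hypothetical and do not describe the actual TextToSpeechResult attributes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeResult:
    # Hypothetical stand-in for TextToSpeechResult; the `data` field is an
    # assumption, not a documented attribute.
    data: bytes


async def fake_run_stream(
    input: str, *, timeout: float = 60
) -> AsyncIterator[FakeResult]:
    # Stand-in for run_stream: yields one fake "audio" chunk per word.
    for word in input.split():
        await asyncio.sleep(0)  # give control back to the event loop
        yield FakeResult(data=word.encode())


async def collect(stream: AsyncIterator[FakeResult]) -> bytes:
    # Consume chunks as they arrive and join them at the end.
    parts = []
    async for chunk in stream:
        parts.append(chunk.data)
    return b"".join(parts)


joined = asyncio.run(collect(fake_run_stream("hello streaming world")))
```

Processing each chunk inside the loop, instead of joining at the end, is what distinguishes streaming use from a plain run call.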

class AudioFormat

classmethod Unknown(name, value)

Parameters

  • name (str)
  • value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

classmethod PCM16(sample_rate_hertz, channels=1)

Audio bit depth 16-bit signed little-endian (Linear PCM).

Parameters

  • sample_rate_hertz (int)
  • channels (int)

Return type

PCM16

MP3 = 3

Data is encoded using MPEG-1/2 Layer III and compressed using the MP3 container format.

WAV = 1

Audio bit depth 16-bit signed little-endian (Linear PCM) packed into WAV container format.

OGG_OPUS = 2

Data is encoded using the OPUS audio codec and compressed using the OGG container format.
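To make the difference between raw PCM16 and the WAV value concrete, the sketch below packs 16-bit signed little-endian samples into a WAV container using only the Python standard library. It does not use the SDK; the `pcm16_to_wav` helper is a hypothetical name, and the 22050 Hz default simply mirrors the default output format mentioned for audio_format.

```python
import io
import struct
import wave


def pcm16_to_wav(
    samples: list, sample_rate_hertz: int = 22050, channels: int = 1
) -> bytes:
    """Wrap 16-bit signed little-endian PCM samples in a WAV container."""
    # "<h" = little-endian signed 16-bit integer, one per sample.
    raw = struct.pack(f"<{len(samples)}h", *samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)                 # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hertz)
        wav.writeframes(raw)
    return buf.getvalue()


wav_bytes = pcm16_to_wav([0, 1000, -1000, 32767, -32768])

# Read the header back to confirm the container parameters.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    params = (
        reader.getnchannels(),
        reader.getsampwidth(),
        reader.getframerate(),
        reader.getnframes(),
    )
```

Raw PCM carries no such header, which is why the PCM16 classmethod takes the sample rate and channel count explicitly.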

__init__(*args, **kwds)

class LoudnessNormalization

classmethod Unknown(name, value)

Parameters

  • name (str)
  • value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

MAX_PEAK = 1

The type of normalization, wherein the gain is changed to bring the highest PCM sample value or analog signal peak to a given level.

LUFS = 2

The type of normalization based on the EBU R 128 recommendation.
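As an illustration of what MAX_PEAK normalization means, the sketch below scales a list of samples so that the largest absolute value lands on a target level. It is a simplified model, not the service's implementation: `normalize_max_peak` is a hypothetical helper and samples are assumed to be floats in [-1, 1]. LUFS is not sketched here because it requires the loudness measurement defined in EBU R 128.

```python
def normalize_max_peak(samples: list, target_peak: float = 0.7) -> list:
    """Scale samples so the highest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or empty input) has no peak to scale; return a copy.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


normalized = normalize_max_peak([0.1, -0.35, 0.2])
new_peak = max(abs(s) for s in normalized)
```

The same gain is applied to every sample, so the waveform shape is preserved and only its overall level changes.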

__init__(*args, **kwds)

__init__(*, sdk, uri, config=None, owner=None)

Parameters

  • sdk (yandex_ai_studio_sdk._sdk.BaseSDK)
  • uri (str)
  • config (ConfigTypeT | None)
  • owner (str | None)

property config: ConfigTypeT

configure(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Returns a new object with the config fields overridden by the passed values.

To reset a value to its default, pass None.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

  • loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined | None) – Specifies type of loudness normalization. Default: LUFS.
  • audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined | None) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
  • model (str | Undefined | None) – The name of the TTS model to use for synthesis. Currently should be empty. Do not use it.
  • voice (str | Undefined | None) – The voice to use for speech synthesis.
  • role (str | Undefined | None) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
  • speed (float | Undefined | None) – Speed multiplier (default: 1.0).
  • volume (float | Undefined | None) – Volume adjustment. For MAX_PEAK: range is (0, 1], default 0.7. For LUFS: range is [-145, 0), default -19.
  • pitch_shift (float | Undefined | None) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
  • duration_ms (int | Undefined | None) – Limit audio duration to exact value.
  • duration_min_ms (int | Undefined | None) – Limit the minimum audio duration.
  • duration_max_ms (int | Undefined | None) – Limit the maximum audio duration.
  • single_chunk_mode (bool | Undefined | None) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.

Return type

Self
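The configure semantics described above distinguish three cases: an omitted argument (Undefined) leaves the field alone, None resets it to the default, and any other value overrides it, always on a new object. The sketch below models that behavior with plain dataclasses. It is an assumption-laden illustration, not the SDK's code: `FakeConfig`, `FakeTTS`, the local `Undefined` sentinel, and the voice name are all hypothetical, and None is taken to mean "use the service default".

```python
from dataclasses import dataclass, field, replace
from typing import Optional


class _UndefinedType:
    """Local sentinel meaning 'argument was not passed'."""

    def __repr__(self) -> str:
        return "Undefined"


Undefined = _UndefinedType()


@dataclass(frozen=True)
class FakeConfig:
    # In this sketch, None stands for "use the service default".
    voice: Optional[str] = None
    speed: Optional[float] = None


@dataclass(frozen=True)
class FakeTTS:
    config: FakeConfig = field(default_factory=FakeConfig)

    def configure(self, *, voice=Undefined, speed=Undefined) -> "FakeTTS":
        passed = {"voice": voice, "speed": speed}
        # Keep only arguments the caller actually supplied.
        updates = {k: v for k, v in passed.items() if v is not Undefined}
        # Build a new object; the original stays untouched.
        return replace(self, config=replace(self.config, **updates))


base = FakeTTS().configure(voice="example-voice", speed=1.2)
reset = base.configure(speed=None)  # speed back to default, voice kept
```

Because each call returns a new object, a shared base configuration can be specialized per request without one caller affecting another.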

create_bistream(*, timeout=600)

Creates a bidirectional stream object for using Yandex SpeechKit Streaming synthesis.

Parameters

timeout (float) – gRPC timeout in seconds that defines the maximum lifetime of the entire stream. The timeout countdown begins at the first interaction with the stream.

Return type

TTSBidirectionalStreamTypeT

property fine_tuned: bool | None

property name: str | None

property owner: str | None

property uri: str

property version: str | None

class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.AsyncTTSBidirectionalStream

Bidirectional SpeechKit TTS API that lets you write requests and read the synthesized result in real time.

async write(input)

Writes an input to be synthesized.

Parameters

input (str)

Return type

None

async read()

Read chunk of synthesized result.

Returns None in case of closed stream.

Return type

TextToSpeechResult | None

async gen()

Returns a generator over all parts of the synthesized result.

Return type

AsyncGenerator[TextToSpeechResult]

async done_writing()

Closes the stream to tell the server that you are done writing.

Closing the stream will allow any iteration over this stream to exit.

It is very important to close the stream to properly release resources.

Return type

None

async flush()

Sends a message to the server to force synthesis of the input given so far.

Return type

None
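The methods above suggest a producer/consumer pattern: one task writes text and finally calls done_writing, while another reads chunks until read returns None. The sketch below demonstrates that flow against an in-memory stand-in so that it runs without the SDK. `FakeBistream` is hypothetical and only mirrors the documented method names; the way it batches text and emits bytes is an assumption made for illustration, not the service's behavior.

```python
import asyncio


class FakeBistream:
    """Hypothetical in-memory stand-in with the documented method names."""

    def __init__(self) -> None:
        self._pending = []            # text written but not yet "synthesized"
        self._out = asyncio.Queue()   # finished chunks; None marks the end

    async def write(self, input: str) -> None:
        self._pending.append(input)

    async def flush(self) -> None:
        # Force "synthesis" of everything written so far.
        if self._pending:
            await self._out.put(" ".join(self._pending).encode())
            self._pending.clear()

    async def done_writing(self) -> None:
        await self.flush()
        await self._out.put(None)     # signal that the stream is closed

    async def read(self):
        return await self._out.get()


async def main() -> list:
    stream = FakeBistream()  # created inside the running event loop

    async def producer() -> None:
        await stream.write("Hello,")
        await stream.write("world.")
        await stream.flush()
        await stream.write("Goodbye.")
        await stream.done_writing()   # without this the consumer never exits

    async def consumer() -> list:
        received = []
        while (chunk := await stream.read()) is not None:
            received.append(chunk)
        return received

    _, received = await asyncio.gather(producer(), consumer())
    return received


chunks = asyncio.run(main())
```

Running the writer and reader concurrently matters: if the consumer only started after the producer finished, nothing would be played back until all text had been submitted.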
