© 2026 Direct Cursus Technology L.L.C.

In this article:

  • class yandex_ai_studio_sdk._speechkit.text_to_speech.function.TextToSpeechFunction
  • class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.TextToSpeech
  • class yandex_ai_studio_sdk._speechkit.text_to_speech.bistream.TTSBidirectionalStream

Domain

Written by
Yandex Cloud
Updated at January 28, 2026

class yandex_ai_studio_sdk._speechkit.text_to_speech.function.TextToSpeechFunction

Text to Speech function that creates a synthesis object, which provides methods for invoking voice synthesis.

__call__(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Creates a TextToSpeech object, which provides methods for voice synthesis.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

  • loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined) – Specifies type of loudness normalization. Default: LUFS.
  • audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
  • model (str | Undefined) – The name of the TTS model to use for synthesis. Currently should be empty. Do not use it.
  • voice (str | Undefined) – The voice to use for speech synthesis.
  • role (str | Undefined) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
  • speed (float | Undefined) – Speed multiplier (default: 1.0).
  • volume (float | Undefined) – Volume adjustment. For MAX_PEAK: range is (0, 1], default 0.7. For LUFS: range is [-145, 0), default -19.
  • pitch_shift (float | Undefined) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
  • duration_ms (int | Undefined) – Limit audio duration to an exact value.
  • duration_min_ms (int | Undefined) – Limit the minimum audio duration.
  • duration_max_ms (int | Undefined) – Limit the maximum audio duration.
  • single_chunk_mode (bool | Undefined) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.

Return type

TextToSpeechTypeT

TTS object
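
The value ranges listed above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: the function name check_tts_params is hypothetical, and only the ranges and defaults are taken from the parameter list.

```python
from typing import Optional


def check_tts_params(
    loudness_normalization: str = "LUFS",
    volume: Optional[float] = None,
    pitch_shift: float = 0.0,
) -> float:
    """Hypothetical helper: return the effective volume or raise ValueError.

    Ranges and defaults are copied from the parameter list above.
    """
    if not -1000 <= pitch_shift <= 1000:
        raise ValueError("pitch_shift must be within [-1000, 1000] Hz")

    if loudness_normalization == "MAX_PEAK":
        # Documented default for MAX_PEAK is 0.7, range (0, 1].
        volume = 0.7 if volume is None else volume
        if not 0 < volume <= 1:
            raise ValueError("MAX_PEAK volume must be within (0, 1]")
    elif loudness_normalization == "LUFS":
        # Documented default for LUFS is -19, range [-145, 0).
        volume = -19.0 if volume is None else volume
        if not -145 <= volume < 0:
            raise ValueError("LUFS volume must be within [-145, 0)")
    else:
        raise ValueError(f"unknown normalization: {loudness_normalization!r}")
    return volume
```

Calling check_tts_params("MAX_PEAK", volume=1.5) raises ValueError, because the upper bound for MAX_PEAK is 1.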

class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.TextToSpeech

Text to Speech class that provides concrete methods for working with the SpeechKit TTS API and encapsulates synthesis settings.

run(input, *, timeout=60)

Runs speech synthesis for the given text and returns the joined result.

To change the initial synthesis settings, use the .configure method:

>>> tts = sdk.speechkit.text_to_speech(audio_format='mp3')
>>> tts = tts.configure(audio_format='WAV')

Parameters

  • input (str) – Text to vocalize.
  • timeout (float) – Timeout, or the maximum time to wait for the request to complete, in seconds.

Returns

The synthesis result; chunks are joined if the synthesis response contains more than one.

run_stream(input, *, timeout=60)

Runs speech synthesis for the given input text; the method returns an iterator.

To change the initial synthesis settings, use the .configure method:

>>> tts = sdk.speechkit.text_to_speech(audio_format='mp3')
>>> tts = tts.configure(audio_format='WAV')

Parameters

  • input (str) – Text to vocalize.
  • timeout (float) – Timeout, or the maximum time to wait for the request to complete, in seconds.

Returns

An iterator over the chunks of the synthesis result.

Return type

Iterator[TextToSpeechResult]
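
The relationship between run and run_stream can be pictured as joining the streamed chunks in arrival order. The sketch below runs without the SDK and makes several assumptions: FakeChunk, fake_run_stream, and join_chunks are hypothetical names, and the data attribute on a chunk is assumed rather than taken from this page.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Stand-in for TextToSpeechResult; the `data` attribute is an assumption."""

    data: bytes


def fake_run_stream(text: str) -> Iterator[FakeChunk]:
    """Hypothetical replacement for run_stream: yields one chunk per word."""
    for word in text.split():
        yield FakeChunk(data=word.encode("utf-8"))


def join_chunks(chunks: Iterable[FakeChunk]) -> bytes:
    """Concatenate chunk payloads in arrival order."""
    return b"".join(chunk.data for chunk in chunks)


# The payload here is plain text bytes, not real audio.
audio = join_chunks(fake_run_stream("hello streaming world"))
```

With a real iterator, the same loop could write each chunk to a file or player as soon as it arrives instead of collecting everything first.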

class AudioFormat

classmethod Unknown(name, value)

Parameters

  • name (str)
  • value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

classmethod PCM16(sample_rate_hertz, channels=1)

Audio bit depth 16-bit signed little-endian (Linear PCM).

Parameters

  • sample_rate_hertz (int)
  • channels (int)

Return type

PCM16

MP3 = 3

Data is encoded using MPEG-1/2 Layer III and compressed using the MP3 container format.

WAV = 1

Audio bit depth 16-bit signed little-endian (Linear PCM) packed into the WAV container format.

OGG_OPUS = 2

Data is encoded using the OPUS audio codec and compressed using the OGG container format.

__init__(*args, **kwds)
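
To confirm that a WAV payload matches the documented default layout (22050 Hz, 16-bit linear PCM with a WAV header), its header can be inspected with the standard wave module. This is a minimal sketch: describe_wav is a hypothetical helper, the clip is synthetic silence built locally rather than service output, and the single channel is an assumption.

```python
import io
import wave


def describe_wav(payload: bytes) -> dict:
    """Read the WAV header and report the PCM parameters it declares."""
    with wave.open(io.BytesIO(payload), "rb") as reader:
        return {
            "sample_rate_hertz": reader.getframerate(),
            "channels": reader.getnchannels(),
            "bits_per_sample": reader.getsampwidth() * 8,
            "frames": reader.getnframes(),
        }


# Build one second of silence so the example runs without calling the service.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)       # mono is an assumption
    writer.setsampwidth(2)       # 2 bytes = 16 bits per sample
    writer.setframerate(22050)   # documented default sample rate
    writer.writeframes(b"\x00\x00" * 22050)

info = describe_wav(buffer.getvalue())
```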

class LoudnessNormalization

classmethod Unknown(name, value)

Parameters

  • name (str)
  • value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

MAX_PEAK = 1

The type of normalization in which the gain is changed to bring the highest PCM sample value or analog signal peak to a given level.

LUFS = 2

The type of normalization based on the EBU R 128 recommendation.

__init__(*args, **kwds)
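
The idea behind MAX_PEAK normalization can be illustrated on raw 16-bit samples: a single gain is applied so that the loudest sample lands on the requested level. The sketch below only illustrates the general technique; normalize_max_peak is a hypothetical function and does not describe how the service implements normalization.

```python
from typing import List, Sequence


def normalize_max_peak(
    samples: Sequence[int],
    target_level: float = 0.7,
    full_scale: int = 32767,
) -> List[int]:
    """Scale 16-bit PCM samples so the loudest one reaches target_level of full scale."""
    if not 0 < target_level <= 1:
        raise ValueError("target_level must be within (0, 1]")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence stays silence
    gain = target_level * full_scale / peak
    return [round(s * gain) for s in samples]


scaled = normalize_max_peak([1000, -2000, 500], target_level=0.5)
```

After scaling, the largest absolute sample sits at about half of full scale, and the ratios between samples are preserved up to rounding.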

__init__(*, sdk, uri, config=None, owner=None)

Parameters

  • sdk (yandex_ai_studio_sdk._sdk.BaseSDK)
  • uri (str)
  • config (ConfigTypeT | None)
  • owner (str | None)

property config: ConfigTypeT

configure(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Returns a new object with the config fields overridden by the passed values.

To reset a value to its default, pass None.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

  • loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined | None) – Specifies type of loudness normalization. Default: LUFS.
  • audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined | None) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
  • model (str | Undefined | None) – The name of the TTS model to use for synthesis. Currently should be empty. Do not use it.
  • voice (str | Undefined | None) – The voice to use for speech synthesis.
  • role (str | Undefined | None) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
  • speed (float | Undefined | None) – Speed multiplier (default: 1.0).
  • volume (float | Undefined | None) – Volume adjustment. For MAX_PEAK: range is (0, 1], default 0.7. For LUFS: range is [-145, 0), default -19.
  • pitch_shift (float | Undefined | None) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
  • duration_ms (int | Undefined | None) – Limit audio duration to an exact value.
  • duration_min_ms (int | Undefined | None) – Limit the minimum audio duration.
  • duration_max_ms (int | Undefined | None) – Limit the maximum audio duration.
  • single_chunk_mode (bool | Undefined | None) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.

Return type

Self
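
The configure contract (return a new object, override only the passed fields, treat None as "back to default") can be sketched with a frozen dataclass and a sentinel. Everything below is illustrative: SketchConfig and this Undefined sentinel are hypothetical stand-ins, not the SDK's classes, and the voice name is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Any


class _Undefined:
    """Hypothetical sentinel meaning "argument not passed"."""

    def __repr__(self) -> str:
        return "Undefined"


Undefined = _Undefined()


@dataclass(frozen=True)
class SketchConfig:
    """Illustrative stand-in; None means "use the service default"."""

    voice: Any = None
    speed: Any = None

    def configure(self, *, voice: Any = Undefined, speed: Any = Undefined) -> "SketchConfig":
        passed = {"voice": voice, "speed": speed}
        # Keep only the arguments the caller actually supplied.
        changes = {key: value for key, value in passed.items() if value is not Undefined}
        # replace() builds a new instance; the original is left untouched.
        return replace(self, **changes)


base = SketchConfig().configure(voice="some-voice", speed=1.5)
reset = base.configure(speed=None)  # speed goes back to the default, voice is kept
```

Because each call returns a new object, base keeps its speed of 1.5 after reset is created.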

create_bistream(*, timeout=600)

Creates a bidirectional stream object for using Yandex SpeechKit Streaming synthesis.

Parameters

timeout (float) – gRPC timeout in seconds that defines the maximum lifetime of the entire stream. The countdown begins at the first interaction with the stream.

Return type

TTSBidirectionalStreamTypeT

property fine_tuned: bool | None

property name: str | None

property owner: str | None

property uri: str

property version: str | None

class yandex_ai_studio_sdk._speechkit.text_to_speech.bistream.TTSBidirectionalStream

Bidirectional SpeechKit TTS API that lets you write requests and read synthesized results in real time.

write(input)

Writes an input to be synthesized.

Parameters

input (str)

Return type

None

read()

Reads a chunk of the synthesized result.

Returns None in case of closed stream.

Return type

TextToSpeechResult | None

gen()

Returns a generator over all parts of the synthesized result.

Return type

Generator[TextToSpeechResult]

done_writing()

Closes the stream to tell the server that you are done writing.

Closing the stream will allow any iteration over this stream to exit.

It is very important to close the stream to properly release resources.

Return type

None

flush()

Sends a message to the server to force synthesis of the input written so far.

Return type

None
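
The call order for a bidirectional stream (write, optionally flush, always done_writing, then read until None) can be rehearsed against an in-memory stand-in. This is a sketch under stated assumptions: FakeBistream and drain are hypothetical, the stand-in returns bytes rather than TextToSpeechResult, and its read never blocks, unlike a network stream.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional


class FakeBistream:
    """In-memory stand-in that mimics only the documented method names."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self._ready: Deque[bytes] = deque()
        self._closed = False

    def write(self, input: str) -> None:
        if self._closed:
            raise RuntimeError("stream is closed for writing")
        self._pending.append(input)

    def flush(self) -> None:
        # Pretend the server synthesizes everything written so far.
        if self._pending:
            self._ready.append(" ".join(self._pending).encode("utf-8"))
            self._pending.clear()

    def done_writing(self) -> None:
        self.flush()
        self._closed = True

    def read(self) -> Optional[bytes]:
        # Simplification: returns None as soon as nothing is queued.
        return self._ready.popleft() if self._ready else None


def drain(stream: FakeBistream) -> Iterator[bytes]:
    """Yield chunks until read() returns None."""
    while (chunk := stream.read()) is not None:
        yield chunk


stream = FakeBistream()
try:
    stream.write("first sentence.")
    stream.flush()
    stream.write("second")
    stream.write("sentence.")
finally:
    stream.done_writing()  # always signal the end of input

chunks = list(drain(stream))
```

Placing done_writing in a finally block mirrors the advice above that the stream must be closed to release resources, even if a write fails.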
