Updated at January 28, 2026

class yandex_ai_studio_sdk._speechkit.text_to_speech.function.TextToSpeechFunction
class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.TextToSpeech
class yandex_ai_studio_sdk._speechkit.text_to_speech.bistream.TTSBidirectionalStream

class yandex_ai_studio_sdk._speechkit.text_to_speech.function.TextToSpeechFunction

Text-to-speech function that creates a synthesis object, which in turn provides the methods for running voice synthesis.

__call__(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Creates a TextToSpeech object, which provides methods for voice synthesis.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined) – Specifies type of loudness normalization. Default: LUFS.
audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
model (str | Undefined) – The name of the TTS model to use for synthesis. Currently this must be left empty; do not set it.
voice (str | Undefined) – The voice to use for speech synthesis.
role (str | Undefined) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
speed (float | Undefined) – Speed multiplier (default: 1.0).
volume (float | Undefined) – Volume adjustment. For MAX_PEAK: range (0, 1], default 0.7. For LUFS: range [-145, 0), default -19.
pitch_shift (float | Undefined) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
duration_ms (int | Undefined) – Limit audio duration to exact value.
duration_min_ms (int | Undefined) – Limit the minimum audio duration.
duration_max_ms (int | Undefined) – Limit the maximum audio duration.
single_chunk_mode (bool | Undefined) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.
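
The valid range for volume depends on the normalization type, so a value that is valid for MAX_PEAK is invalid for LUFS and vice versa. A minimal sketch of a client-side check follows. It assumes a hypothetical helper, check_volume, that is not part of the SDK and that takes the normalization type as a plain string:

```python
def check_volume(volume: float, normalization: str = "LUFS") -> float:
    """Return `volume` if it lies in the range documented for `normalization`.

    Hypothetical helper, not part of the SDK; the ranges are copied from the
    description of the `volume` parameter.
    """
    if normalization == "MAX_PEAK":
        in_range = 0 < volume <= 1        # (0, 1]
    elif normalization == "LUFS":
        in_range = -145 <= volume < 0     # [-145, 0)
    else:
        raise ValueError(f"unknown normalization type: {normalization!r}")
    if not in_range:
        raise ValueError(f"volume {volume} is out of range for {normalization}")
    return volume
```

The checked value can then be passed as volume together with the matching loudness_normalization.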

Return type

TextToSpeechTypeT

TTS object

class yandex_ai_studio_sdk._speechkit.text_to_speech.tts.TextToSpeech

Text-to-speech class that provides concrete methods for working with the SpeechKit TTS API and encapsulates the synthesis settings.

run(input, *, timeout=60)

Runs speech synthesis for the given text and returns the joined result.

To change the initial synthesis settings, use the .configure method:

>>> tts = sdk.speechkit.text_to_speech(audio_format='mp3')
>>> tts = tts.configure(audio_format='WAV')

Parameters

input (str) – Text to vocalize.
timeout (float) – Timeout: the maximum time, in seconds, to wait for the request to complete.

Returns

The synthesis result. If the synthesis response contains more than one chunk, the chunks are joined.
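
A minimal sketch of calling run and storing the audio. The helper save_speech is hypothetical, and the sketch assumes that the returned result exposes the audio bytes through a data attribute, which this page does not confirm:

```python
from typing import BinaryIO


def save_speech(tts, text: str, sink: BinaryIO, timeout: float = 60) -> int:
    """Synthesize `text` with `tts.run` and write the audio bytes to `sink`.

    `tts` is expected to be a configured TextToSpeech object.
    Assumption: the result exposes the audio as bytes in a `data` attribute.
    """
    result = tts.run(text, timeout=timeout)
    audio = result.data  # assumed attribute name, see the docstring
    sink.write(audio)
    return len(audio)
```

With a file opened in binary mode, for example open('hello.wav', 'wb'), as the sink, the helper returns the number of bytes written.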

run_stream(input, *, timeout=60)

Runs speech synthesis for the given input text and returns an iterator over the results.

To change the initial synthesis settings, use the .configure method:

>>> tts = sdk.speechkit.text_to_speech(audio_format='mp3')
>>> tts = tts.configure(audio_format='WAV')

Parameters

input (str) – Text to vocalize.
timeout (float) – Timeout: the maximum time, in seconds, to wait for the request to complete.

Returns

An iterator over synthesis results, one per chunk of the synthesis response.

Return type

Iterator[TextToSpeechResult]
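
A minimal sketch of consuming run_stream chunk by chunk instead of waiting for the joined result. The helper stream_speech is hypothetical, and the data attribute on each yielded result is an assumption:

```python
from typing import BinaryIO


def stream_speech(tts, text: str, sink: BinaryIO, timeout: float = 60) -> int:
    """Write each synthesized chunk to `sink` as soon as it is yielded.

    Assumption: every item yielded by `tts.run_stream` exposes its audio
    bytes in a `data` attribute.
    """
    total = 0
    for chunk in tts.run_stream(text, timeout=timeout):
        audio = chunk.data  # assumed attribute name
        sink.write(audio)
        total += len(audio)
    return total
```

Writing chunks as they arrive keeps memory use flat for long texts, at the cost of handling a partially written sink if the request fails midway.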

class AudioFormat

classmethod Unknown(name, value)

Parameters

name (str)
value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

classmethod PCM16(sample_rate_hertz, channels=1)

Audio bit depth 16-bit signed little-endian (Linear PCM).

Parameters

sample_rate_hertz (int)
channels (int)

Return type

PCM16
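
Because each 16-bit sample occupies 2 bytes, the duration of raw PCM16 audio follows directly from its size, sample rate, and channel count. A small worked sketch, where pcm16_duration_ms is a hypothetical helper and not part of the SDK:

```python
def pcm16_duration_ms(num_bytes: int, sample_rate_hertz: int, channels: int = 1) -> float:
    """Duration in milliseconds of headerless 16-bit linear PCM audio."""
    bytes_per_frame = 2 * channels          # 2 bytes per sample, one sample per channel
    frames = num_bytes // bytes_per_frame   # an incomplete trailing frame is ignored
    return frames * 1000 / sample_rate_hertz
```

For example, 44100 bytes of mono audio at 22050 Hz contain 22050 frames, which is exactly one second.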

MP3 = 3

Data is encoded using MPEG-1/2 Layer III and packed into the MP3 container format.

WAV = 1

Audio bit depth 16-bit signed little-endian (Linear PCM), packed into the WAV container format.

OGG_OPUS = 2

Data is encoded using the OPUS audio codec and packed into the OGG container format.
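
For WAV output, the container header can be inspected with Python's standard wave module to confirm the sample rate and bit depth. A minimal sketch, where describe_wav is a hypothetical helper that works on any complete WAV byte string and makes no assumption about the SDK's result type:

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from `data` and return its basic parameters."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 for 16-bit PCM
            "frames": wav.getnframes(),
        }
```

For audio in the default format described above, sample_rate_hertz would be expected to be 22050 and sample_width_bytes to be 2.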

__init__(*args, **kwds)

class LoudnessNormalization

classmethod Unknown(name, value)

Parameters

name (str)
value (int)

__new__(value)

conjugate()

Returns self, the complex conjugate of any int.

bit_length()

Number of bits necessary to represent self in binary.

>>> bin(37)
'0b100101'
>>> (37).bit_length()
6

bit_count()

Number of ones in the binary representation of the absolute value of self.

Also known as the population count.

>>> bin(13)
'0b1101'
>>> (13).bit_count()
3

as_integer_ratio()

Return a pair of integers, whose ratio is equal to the original int.

The ratio is in lowest terms and has a positive denominator.

>>> (10).as_integer_ratio()
(10, 1)
>>> (-10).as_integer_ratio()
(-10, 1)
>>> (0).as_integer_ratio()
(0, 1)

is_integer()

Returns True. Exists for duck type compatibility with float.is_integer.

real

the real part of a complex number

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

denominator

the denominator of a rational number in lowest terms

MAX_PEAK = 1

The type of normalization in which the gain is changed to bring the highest PCM sample value or analog signal peak to a given level.

LUFS = 2

The type of normalization based on the EBU R 128 recommendation.
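
To illustrate what peak normalization means, the sketch below scales a list of samples so that the largest absolute value lands on a target level. It is a conceptual illustration only, not the service's implementation; normalize_peak is a hypothetical helper, and samples are assumed to be floats in the range [-1, 1]:

```python
def normalize_peak(samples, target_peak: float = 0.7):
    """Scale `samples` so that the largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

LUFS normalization differs in that it targets perceived loudness measured over time rather than a single peak value.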

__init__(*args, **kwds)

__init__(*, sdk, uri, config=None, owner=None)

Parameters

sdk (yandex_ai_studio_sdk._sdk.BaseSDK)
uri (str)
config (ConfigTypeT | None)
owner (str | None)

property config: ConfigTypeT

configure(*, loudness_normalization=Undefined, audio_format=Undefined, model=Undefined, voice=Undefined, role=Undefined, speed=Undefined, volume=Undefined, pitch_shift=Undefined, duration_ms=Undefined, duration_min_ms=Undefined, duration_max_ms=Undefined, single_chunk_mode=Undefined)

Returns a new object with the config fields overridden by the passed values.

To reset a value to its default, pass None.

To learn more about the parameters, their formats, and possible values, refer to the TTS documentation.

Parameters

loudness_normalization (LoudnessNormalization | UnknownEnumValue[LoudnessNormalization] | str | int | Undefined | None) – Specifies type of loudness normalization. Default: LUFS.
audio_format (AudioFormat | UnknownEnumValue[AudioFormat] | str | int | Undefined | None) – Specifies output audio format. Default: 22050Hz, linear 16-bit signed little-endian PCM, with WAV header.
model (str | Undefined | None) – The name of the TTS model to use for synthesis. Currently this must be left empty; do not set it.
voice (str | Undefined | None) – The voice to use for speech synthesis.
role (str | Undefined | None) – The role or speaking style. Can be used to specify pronunciation character for the speaker.
speed (float | Undefined | None) – Speed multiplier (default: 1.0).
volume (float | Undefined | None) – Volume adjustment. For MAX_PEAK: range (0, 1], default 0.7. For LUFS: range [-145, 0), default -19.
pitch_shift (float | Undefined | None) – Pitch adjustment, in Hz, range [-1000, 1000], default 0.
duration_ms (int | Undefined | None) – Limit audio duration to exact value.
duration_min_ms (int | Undefined | None) – Limit the minimum audio duration.
duration_max_ms (int | Undefined | None) – Limit the maximum audio duration.
single_chunk_mode (bool | Undefined | None) – Automatically split long text into several utterances and bill accordingly. Some degradation in service quality is possible.

Return type

Self
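
The distinction between an omitted argument, which leaves the field unchanged, and None, which resets it to the default, can be illustrated with a small stand-in class. This is a sketch of the described semantics only, not the SDK's implementation; SketchConfig, UNDEFINED, and the two fields shown are hypothetical:

```python
from dataclasses import dataclass, replace
from typing import Optional


class _Undefined:
    """Sentinel type meaning 'argument not passed'; stands in for Undefined."""


UNDEFINED = _Undefined()


@dataclass(frozen=True)
class SketchConfig:
    # None means "no override, use the default".
    audio_format: Optional[str] = None
    speed: Optional[float] = None

    def configure(self, *, audio_format=UNDEFINED, speed=UNDEFINED) -> "SketchConfig":
        """Return a new config; only explicitly passed fields are changed."""
        passed = {"audio_format": audio_format, "speed": speed}
        changes = {k: v for k, v in passed.items() if v is not UNDEFINED}
        return replace(self, **changes)
```

In this sketch, SketchConfig(audio_format='mp3').configure(speed=1.5) keeps audio_format, and a later configure(speed=None) clears the speed override while the original object stays untouched.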

create_bistream(*, timeout=600)

Creates a bidirectional stream object for using Yandex SpeechKit Streaming synthesis.

Parameters

timeout (float) – gRPC timeout in seconds that defines the maximum lifetime of the entire stream. The timeout countdown begins at the moment of the first stream interaction.

Return type

TTSBidirectionalStreamTypeT

property fine_tuned: bool | None

property name: str | None

property owner: str | None

property uri: str

property version: str | None

class yandex_ai_studio_sdk._speechkit.text_to_speech.bistream.TTSBidirectionalStream

Bidirectional SpeechKit TTS API stream that lets you write requests and read the synthesized result in real time.

write(input)

Writes an input to be synthesized.

Parameters

input (str)

Return type

None

read()

Reads a chunk of the synthesized result.

Returns None if the stream is closed.

Return type

TextToSpeechResult | None

gen()

Returns a generator over all parts of the synthesized result.

Return type

Generator[TextToSpeechResult]

done_writing()

Closes the stream to tell the server that you are done writing.

Closing the stream will allow any iteration over this stream to exit.

It is very important to close the stream to properly release resources.

Return type

None

flush()

Sends a message to the server to force synthesis of the input written so far.

Return type

None
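
The methods above combine into a write, done_writing, read cycle. A minimal sketch follows, assuming a hypothetical helper synthesize_stream, a stream object obtained from create_bistream, and that results written earlier can still be read after done_writing has been called:

```python
def synthesize_stream(stream, pieces, flush_each: bool = False) -> list:
    """Write text pieces to a bidirectional TTS stream and collect the results.

    `stream` is expected to provide write, flush, done_writing, and read
    as documented for TTSBidirectionalStream.
    """
    try:
        for piece in pieces:
            stream.write(piece)
            if flush_each:
                stream.flush()  # force synthesis of the input written so far
    finally:
        stream.done_writing()   # always signal the end of input

    results = []
    while True:
        chunk = stream.read()
        if chunk is None:       # None means the stream is closed
            break
        results.append(chunk)
    return results
```

For real-time playback, reads would usually be interleaved with writes rather than deferred to the end; the loop over read can also be replaced with iteration over gen().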
