Yandex project
© 2025 Yandex.Cloud LLC

Yandex SpeechKit release notes: Speech synthesis

Written by
Yandex Cloud
Updated at April 24, 2025
  • Release as of 28/03/25
  • Release as of 24/01/25
  • Release as of 18/11/24
  • Release as of 10/10/24
  • Release as of 20/09/24
  • Release as of 09/09/24
  • Release as of 11/07/24
  • Release as of 15/04/24
  • Release as of 09/04/24
  • Release as of 03/04/24
  • Release as of 20/02/24
  • Release as of 06/02/24
  • Release as of 10/01/24
  • Release as of 05/12/23
  • Release as of 23/10/23
  • Release as of 27/07/23
  • Release as of 19/06/23
  • Release as of 08/06/23
  • Release as of 18/04/23
  • Release as of 21/03/23
  • Release as of 07/03/23
  • Release as of 07/10/22
  • Release as of 09/06/22
  • Release as of 19/05/22
  • Release as of 30/03/22
  • Release as of 17/03/22
  • Release as of 24/01/22
  • Release as of 16/12/21
  • Release as of 18/11/21

SpeechKit provides updates based on the system model and version.

For more information about voice models, see About technology.

Release as of 28/03/25

Added Russian equivalents for the lola and yulduz voices, lola_ru and yulduz_ru, with multiple roles. Additional roles are now available for the following voices: saule, saule_ru, zhanar, zhanar_ru, and yulduz. For the full list of available voices, see List of voices.

Release as of 24/01/25

Added new voices. For synthesis in Kazakh, the zhanar female voice is now available. For synthesis in Uzbek, added the female voices lola and yulduz.

Release as of 18/11/24

Fixed the pronunciation of tenge for synthesis in Russian. Now the model pronounces it with a soft (palatalized) t: [tʲɪnˈɡʲe].

Release as of 10/10/24

  1. Now there is a new female voice for synthesis in Kazakh (saule) and its Russian counterpart (saule_ru).
  2. The madirus Russian voice was renamed to madi_ru. The voice is still available by its old name, but please use the new one where possible.

Release as of 20/09/24

  • A quality update for the filipp, ermil, and zahar voices.
  • Optimized the normalizer for the Kazakh and Uzbek languages.

Release as of 09/09/24

Improved question intonation and overall synthesis quality for all publicly available Russian voices.

Release as of 11/07/24

For synthesis in Russian:

  • Reduced ambient noise level.
  • Fixed accentuation in certain words.

Release as of 15/04/24

Fixed a bug that caused speech to be synthesized at too fast a rate.

Release as of 09/04/24

In API v1, marina is now the default voice.

Release as of 03/04/24

Changed the default voice in API v3. All synthesis requests without an explicitly specified voice will now use the marina voice.

Release as of 20/02/24

A quality update for the masha, marina, anton, alexander, dasha, and julia voices.

Release as of 06/02/24

Added support for REST API v3.

Release as of 10/01/24

  1. Added support for cardinal number normalization (English). Normalization works for positive integers only. Ordinal numbers are not supported.
  2. Added DurationHint to the API, which you can use to specify the minimum and maximum duration of the synthesized audio.
  3. Added the text_chunk, start_ms, and length_ms fields to the UtteranceSynthesisResponse message. These fields store the text of the fragment, as well as the start time and length of the audio that came with it.
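
As a rough illustration of how the timing fields relate to each other, the sketch below derives the end time of each fragment as start_ms + length_ms. The Chunk dataclass is a hypothetical stand-in for the three fields named above, not the actual generated UtteranceSynthesisResponse class, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for the timing fields of a synthesis response."""
    text_chunk: str
    start_ms: int
    length_ms: int


def end_ms(chunk: Chunk) -> int:
    # Derive the end time of the fragment from its start time and length.
    return chunk.start_ms + chunk.length_ms


def timeline(chunks):
    # Build (text, start, end) tuples, e.g. for subtitles or highlighting.
    return [(c.text_chunk, c.start_ms, end_ms(c)) for c in chunks]


if __name__ == "__main__":
    sample = [Chunk("Hello.", 0, 800), Chunk("How are you?", 800, 1200)]
    for text, start, end in timeline(sample):
        print(f"{start:>5} ms - {end:>5} ms  {text}")
```

A timeline like this could be used to align displayed text with audio playback.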

Release as of 05/12/23

Improved the quality of speech synthesis for all languages except Russian.

Release as of 23/10/23

  1. A new voice, masha, is now available in three roles.
  2. Additional roles are now available for Russian-language voices.
  3. Optimized the normalizer for the Kazakh language.
  4. Improved the pronunciation quality of "SMS" for Kazakh and Uzbek.

Release as of 27/07/23

  1. Added the pitch_shift parameter to API v3. You can use it to increase the pitch contour of the entire synthesized audio by a fixed value in Hz. Shifting the contour makes a voice sound more lively.
  2. Seven new voices are now available for speech synthesis in Russian: dasha, julia, lera, marina, alexander, kirill, and anton.

Release as of 19/06/23

Improved the quality of pronunciation of car brands for Uzbek.

Release as of 08/06/23

  1. Added normalization for cardinal numbers written in Arabic numerals for Uzbek.
  2. Improved the quality of speech synthesis for Uzbek. The changes primarily enhance the synthesis of short texts.

Release as of 18/04/23

  1. Speech synthesis for Uzbek now supports a phoneme-based format for transcribing text (see the list of supported phonemes). In addition, the Uzbek model can now automatically replace apostrophes. However, for efficient speech synthesis, you should only use the straight (ʼ) and reversed (ʻ) typographic apostrophes.
  2. For pattern-based synthesis, the default volume normalization has been changed. Now, if the normalization type is not set explicitly, the volume of variables is normalized using the initial pattern.

Release as of 21/03/23

  1. A normalizer has been added for the Kazakh language. Now the model can pronounce numbers written in Arabic numerals.

  2. Added two types of apostrophes for Uzbek: straight typographic apostrophe (ʼ) and reversed typographic apostrophe (ʻ). Now you can synthesize phrases in Uzbek written in Latin script with these apostrophes.

    Yaʼni mana shu beret kiygan notanish odamni.
    Soʻng yana pastga qarab ketiladi.

    Warning

    Use only these options for apostrophes. The model does not support automatic replacement, and the synthesis quality strongly depends on the input quality.
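
Input text could be cleaned on the client side before it is sent for synthesis. The sketch below is a heuristic, not part of SpeechKit: it assumes the two supported characters are U+02BC (ʼ) and U+02BB (ʻ), and it relies on the Uzbek Latin convention that the reversed apostrophe follows o and g (as in Soʻng) while the straight one is used elsewhere (as in Yaʼni). The function name and the set of look-alike characters are illustrative choices.

```python
STRAIGHT = "\u02bc"  # straight typographic apostrophe, as in "Yaʼni"
REVERSED = "\u02bb"  # reversed typographic apostrophe, as in "Soʻng"

# Look-alike characters often typed instead of the two above (an assumption).
LOOKALIKES = "'`\u2018\u2019"


def normalize_apostrophes(text: str) -> str:
    """Replace look-alike apostrophes with the two supported characters."""
    out = []
    for ch in text:
        if ch in LOOKALIKES:
            prev = out[-1] if out else ""
            # After o/g the letter is part of oʻ/gʻ; otherwise use the straight form.
            out.append(REVERSED if prev in ("o", "O", "g", "G") else STRAIGHT)
        else:
            # Already supported apostrophes and all other characters pass through.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(normalize_apostrophes("Ya'ni mana shu beret kiygan notanish odamni."))
    print(normalize_apostrophes("So'ng yana pastga qarab ketiladi."))
```

The heuristic leaves correctly typed apostrophes untouched, so it is safe to run on text that is already clean; rare words that need a straight apostrophe right after o or g would have to be handled separately.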

Release as of 07/03/23

  1. Significantly revised the SpeechKit Brand Voice technology for creating custom voices.
  2. Added support for pauses in all languages in test mode when using TTS markup. Please report any pausing errors by submitting a request to the support team. Your feedback will help us improve the functionality in future releases.

Release as of 07/10/22

The general branch has new voices and languages available for testing:

  • lea female voice: German.
  • madi male voice: Kazakh.
  • madirus male voice: Russian.
  • nigora female voice: Uzbek.

The general branch now has these new voices: amira and john.

Release as of 09/06/22

  1. Intonations and emphasis have been improved in all voices.

  2. More pausing features were added:

    • Fixed an error where pauses shorter than 1200 milliseconds were ignored in SSML markup. Note that pauses shorter than 700 milliseconds are considered a synthesis cue and do not allow accurate control of the duration of a pause between words.
    • SSML pauses with the x-weak, weak, and medium values have a greater impact on the synthesized text.
    • You can now apply pauses when using TTS markup. Use the <[small]> tag to set the pause length in the synthesized text, e.g., Hello, <[small]>. The possible pause lengths are: tiny, small, medium, large, or huge.
  3. Support for filipp:deprecated was discontinued. filipp:deprecated and filipp now sound the same.
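
The pause tags described above can be generated programmatically. The following is a minimal sketch, assuming the tag syntax <[size]> and the five sizes listed in this release; the helper names pause and join_with_pauses are hypothetical and not part of any SpeechKit SDK.

```python
# Pause lengths listed in the release note.
PAUSE_SIZES = ("tiny", "small", "medium", "large", "huge")


def pause(size: str = "small") -> str:
    """Return a TTS markup pause tag, e.g. '<[small]>'."""
    if size not in PAUSE_SIZES:
        raise ValueError(f"unknown pause size {size!r}; expected one of {PAUSE_SIZES}")
    return f"<[{size}]>"


def join_with_pauses(parts, size: str = "small") -> str:
    """Join text fragments, placing a pause tag between each pair."""
    return f" {pause(size)} ".join(parts)


if __name__ == "__main__":
    print(join_with_pauses(["Hello,", "how are you?"], "medium"))
```

Validating the size up front keeps typos such as "smal" from reaching the service as literal text.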

Release as of 19/05/22

  1. Deprecated voices will no longer be supported starting May 31, 2022.

  2. The rc branch has new voices and languages available for testing:

    • amira female voice: Kazakh.
    • john male voice: English.

    The voices are only available in API v3 and require the x-service-branch:rc header.

Release as of 30/03/22

  1. The standard voices are currently only available through the :deprecated tag and will be supported until May 31, 2022.

  2. Fixed intonation issues and rare artifacts in texts with many numbers, as reported to technical support (ticket CLOUDSUPPORT-138703).

Release as of 17/03/22

  1. Added the ability to synthesize audio files in MP3 format. This feature is available in API v3 and when using premium voices in API v1.

  2. New voices now support roles (extended emotional tones). See the emotion parameter in API v1 and role in API v3 for details. Different roles are available for different voices. For a complete list of values, see List of voices. If an incorrect role is selected, the service will return an error.

  3. Fixed a regression in emphasis placement quality for the alena and filipp voices. Improved emphasis placement and subjective perception for all voices.

  4. Started a major update of standard voices: oksana, ermil, jane, omazh, and zahar will be replaced with oksana:rc, ermil:rc, jane:rc, omazh:rc, and zahar:rc, respectively. The update will not affect the cost of the regular voices. The existing oksana, ermil, jane, omazh, and zahar voices are available in the :deprecated branch.

Release as of 24/01/22

  1. Updated the generative model. The new version improves the way numbers and abbreviations from the finance domain are pronounced.

  2. You can now add emphasis using markup: Are you **happy** to see me?

  3. Processing of SSML pauses and SIL tags has been made consistent to support integration with Yandex.Dialogs. Pauses in text in SSML or SIL notation are considered the end of an utterance, causing intonation representing the end of an utterance to replace the tag in the generated text. SSML pauses and SIL tags are supported when generating both short and long speech segments.

Release as of 16/12/21

  1. Limits for API v3 requests have been increased: length of a synthesized phrase is 250 characters or 24 seconds of audio. Important: request costs remain unchanged for the time being but may increase.

  2. The unsafe_mode option in API v3 enables you to automatically split long segments of text submitted for synthesis into separate phrases.

  3. The silence after the last synthesized word is now much shorter. The audio ends almost immediately after the final word.
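
If you prefer to stay within the 250-character phrase limit without relying on unsafe_mode, text can also be split on the client side. The sketch below is one possible approach and does not reproduce how the service itself splits text: it groups whole sentences into chunks and hard-wraps any single sentence that is still too long. The function name split_text and the sentence-boundary regular expression are assumptions made for illustration.

```python
import re

MAX_CHARS = 250  # phrase length limit quoted in the release note


def split_text(text: str, limit: int = MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit (may split a word).
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("First sentence. Second one! A third? " * 10):
        print(len(chunk), chunk)
```

Each resulting chunk can then be submitted as a separate synthesis request and the audio concatenated in order.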

Release as of 18/11/21

  1. Introduced some stabilizing fixes for the alena premium voice. It now sounds consistent.
  2. Fixed pronunciation errors for alena.
  3. Pausing in REST API has been improved.
  4. New premium voices have been added in test mode:
    • oksana:rc
    • ermil:rc
    • jane:rc
    • omazh:rc
    • zahar:rc
