Yandex SpeechKit release notes: Speech synthesis
SpeechKit provides updates based on the system model and version.
For more information about voice models, see About technology.
Release as of 20/09/24
- A quality update for the filipp, ermil, and zahar voices.
- Optimized the normalizer for the Kazakh and Uzbek languages.
Release as of 09/09/24
Improved question intonation and overall synthesis quality for all publicly available Russian voices.
Release as of 11/07/24
For synthesis in Russian:
- Reduced ambient noise level.
- Fixed accentuation in certain words.
Release as of 15/04/24
Fixed a bug that caused speech to be synthesized at too fast a rate.
Release as of 09/04/24
In API v1, marina is now the default voice.
Release as of 03/04/24
Changed the default voice in API v3. All synthesis projects without an explicitly specified voice will now use the marina voice.
Release as of 20/02/24
A quality update for the masha, marina, anton, alexander, dasha, and julia voices.
Release as of 06/02/24
Added REST API v3 support.
Release as of 10/01/24
- Added support for cardinal number normalization (English). Normalization works for positive integers only. Ordinal numbers are not supported.
- Added DurationHint to the API which you can use to specify minimum and maximum time spent on synthesizing the text.
- Added the text_chunk, start_ms, and length_ms fields to the UtteranceSynthesisResponse message. These fields store the info on the text, as well as the start and end time of the audio that came with the fragment.
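As a rough client-side illustration of the timing fields, the sketch below derives an end offset for each fragment. The Chunk class is a hypothetical stand-in for the response message, not the generated API type: only the three field names come from the note above, text_chunk is simplified to a plain string, and the relation end = start_ms + length_ms is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for one UtteranceSynthesisResponse message."""
    text_chunk: str  # text covered by this audio fragment (simplified to a string)
    start_ms: int    # offset of the fragment from the start of the audio, in ms
    length_ms: int   # duration of the fragment, in ms


def timeline(chunks):
    """Return (text, start_ms, end_ms) tuples, assuming end = start + length."""
    return [(c.text_chunk, c.start_ms, c.start_ms + c.length_ms) for c in chunks]


if __name__ == "__main__":
    demo = [Chunk("Hello,", 0, 600), Chunk("world.", 600, 750)]
    for text, start, end in timeline(demo):
        print(f"{start:>5} ms .. {end:>5} ms  {text}")
```

A timeline like this could be used, for example, to highlight text in sync with audio playback.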
Release as of 05/12/23
Improved the quality of speech synthesis for all languages except Russian.
Release as of 23/10/23
- A new voice, masha, is now available in three roles.
- Additional roles are now available for Russian-language voices.
- Optimized the normalizer for the Kazakh language.
- Improved the pronunciation quality of "SMS" for Kazakh and Uzbek.
Release as of 27/07/23
- Added the pitch_shift parameter to API v3. You can use it to increase the pitch contour of the entire synthesized audio by a fixed value in Hz. Shifting the contour makes a voice sound more lively.
- Seven new voices are now available for speech synthesis in Russian: dasha, julia, lera, marina, alexander, kirill, and anton.
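To make the idea of shifting a pitch contour by a fixed value more concrete, here is a purely conceptual sketch. It is not an API call and does not describe how the service applies pitch_shift internally. The function name is hypothetical, and treating 0 Hz frames as unvoiced is a convention assumed for this example.

```python
def shift_pitch_contour(f0_hz, shift_hz):
    """Add a fixed offset in Hz to every voiced frame of a pitch contour.

    Frames with a value of 0 are treated as unvoiced and left unchanged
    (an assumption made for this sketch).
    """
    return [f + shift_hz if f > 0 else f for f in f0_hz]


if __name__ == "__main__":
    contour = [0.0, 180.0, 195.5, 210.0, 0.0]
    print(shift_pitch_contour(contour, 50.0))
```

Every voiced frame moves by the same amount, so the shape of the contour is preserved.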
Release as of 19/06/23
Improved the quality of pronunciation of car brands for Uzbek.
Release as of 08/06/23
- Added normalization for cardinal numbers written in Arabic numerals for Uzbek.
- Improved the quality of speech synthesis for Uzbek. The changes primarily enhance the synthesis of short texts.
Release as of 18/04/23
- Speech synthesis for Uzbek now supports a phoneme-based format for transcribing text (see the list of supported phonemes here). In addition, the Uzbek model can now automatically replace apostrophes. However, for efficient speech synthesis, you should only use the straight (ʼ) and reversed (ʻ) typographic apostrophes.
- For pattern-based synthesis, the default volume normalization has been changed. Now, if the normalization type is not set explicitly, the volume of variables is normalized using the initial pattern.
Release as of 21/03/23
- A normalizer has been added for the Kazakh language. Now the model can pronounce numbers written in Arabic numerals.
- Added two types of apostrophes for Uzbek: straight typographic apostrophe (ʼ) and reversed typographic apostrophe (ʻ). Now you can synthesize phrases in Uzbek written in Latin script with these apostrophes.
  Yaʼni mana shu beret kiygan notanish odamni.
  Soʻng yana pastga qarab ketiladi.
  Warning: use only these options for apostrophes. The model does not support automatic replacement, and the synthesis quality strongly depends on the input quality.
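If input text may contain other apostrophe characters, a client-side cleanup step along the following lines could help. This is a heuristic sketch, not part of SpeechKit. The function name and the list of look-alike characters are assumptions, and so is the rule that an apostrophe after o or g becomes ʻ (as in the letters oʻ and gʻ) while any other apostrophe becomes ʼ. Words where a glottal stop follows o or g would need manual review.

```python
import re

MODIFIER_APOSTROPHE = "\u02bc"  # ʼ, straight typographic apostrophe
TURNED_COMMA = "\u02bb"         # ʻ, reversed typographic apostrophe

# Apostrophe look-alikes that often appear in typed text (assumed list):
# ASCII apostrophe, backtick, left/right single quotes, acute accent.
_LOOKALIKES = "'`\u2018\u2019\u00b4"

_AFTER_O_OR_G = re.compile(f"(?<=[oOgG])[{_LOOKALIKES}]")
_ANY_LOOKALIKE = re.compile(f"[{_LOOKALIKES}]")


def normalize_uzbek_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with the two supported characters."""
    # First handle the letters o' and g', then everything that is left.
    text = _AFTER_O_OR_G.sub(TURNED_COMMA, text)
    return _ANY_LOOKALIKE.sub(MODIFIER_APOSTROPHE, text)


if __name__ == "__main__":
    print(normalize_uzbek_apostrophes("Ya'ni mana shu beret kiygan notanish odamni."))
    print(normalize_uzbek_apostrophes("So'ng yana pastga qarab ketiladi."))
```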
Release as of 07/03/23
- Significantly revised the SpeechKit Brand Voice technology for creating custom voices.
- Added support for pauses in all languages in test mode when using TTS markup. Please report any pausing errors by submitting a request to the support team. Your feedback will help us improve the functionality in future releases.
Release as of 07/10/22
The general branch has new voices and languages available for testing:
- lea, female voice: German.
- madi, male voice: Kazakh.
- madirus, male voice: Russian.
- nigora, female voice: Uzbek.
The general branch now has these new voices: amira and john.
Release as of 09/06/22
- Intonations and emphasis have been improved in all voices.
- More pausing features were added:
  - Fixed the error where pauses shorter than 1200 milliseconds were not taken into account in SSML markup. Note that pauses shorter than 700 milliseconds are considered a synthesis cue and do not allow accurate control of the duration of a pause between words.
  - SSML pauses with the x-weak, weak, and medium values have a greater impact on the synthesized text.
  - You can now apply pauses when using TTS markup. Use the <[small]> tag to set the pause length in the synthesized text, e.g., Hello, <[small]>. The possible pause lengths are: tiny, small, medium, large, or huge.
- Support for filipp:deprecated was discontinued. filipp:deprecated and filipp now sound the same.
Release as of 19/05/22
- Deprecated voices will no longer be supported starting May 31, 2022.
- The rc branch has new voices and languages available for testing:
  - amira, female voice: Kazakh.
  - john, male voice: English.
  The voices are only available in API v3 and require the x-service-branch:rc header.
Release as of 30/03/22
- The standard voices are currently only available through the :deprecated tag and will be supported until May 31, 2022.
- Intonations and issues with rare artifacts in texts with many numbers have been fixed following a technical support request (issue CLOUDSUPPORT-138703).
Release as of 17/03/22
- Added the ability to synthesize audio files in MP3 format. This feature is available in API v3 and when using premium voices in API v1.
- New voices now support roles (extended emotional tones). See the emotion parameter in API v1 and role in API v3 for details. Different roles are available for different voices. For a complete list of values, see List of voices. If an incorrect role is selected, the service will return an error.
- Fixed the emphasis placement quality regression issue for the alena and filipp voices. Improved emphasis placement and subjective perception for all voices.
- Started a major update of standard voices: oksana, ermil, jane, omazh, and zahar will be replaced with oksana:rc, ermil:rc, jane:rc, omazh:rc, and zahar:rc, respectively. The update will not affect the cost of the regular voices. The existing oksana, ermil, jane, omazh, and zahar voices are available in the :deprecated branch.
Release as of 24/01/22
- Updated the generative model. The new version improves the way numbers and abbreviations from the finance domain are pronounced.
- You can now add emphasis using markup: Are you **happy** to see me?
- Processing of SSML pauses and SIL tags has been made consistent to support integration with Yandex.Dialogs. Pauses in text in SSML or SIL notation are considered the end of an utterance, causing intonation representing the end of an utterance to replace the tag in the generated text. SSML pauses and SIL tags are supported when generating both short and long speech segments.
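The emphasis markup from this release wraps a word in double asterisks. A small helper for adding it might look like the sketch below. The function name is hypothetical and the whole-word matching rule is an assumption of this example. Only the ** syntax comes from the note above.

```python
import re


def emphasize(text: str, word: str) -> str:
    """Wrap every whole-word occurrence of `word` in ** markers."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)


if __name__ == "__main__":
    print(emphasize("Are you happy to see me?", "happy"))
```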
Release as of 16/12/21
- Limits for API v3 requests have been increased: the maximum length of a synthesized phrase is now 250 characters or 24 seconds of audio. Important: request costs remain unchanged for the time being but may increase.
- The unsafe_mode option in API v3 enables you to automatically split long segments of text submitted for synthesis into separate phrases.
- The silence after the last synthesized word is now much shorter. The audio ends almost immediately after the final word.
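When unsafe_mode is not used, long text can be split on the client before synthesis. The sketch below is one possible approach, not a description of how the service splits text: it greedily packs sentences into phrases that stay within a character limit. The function name is hypothetical, the sentence-boundary rule is a simplification, and the 250-character default is taken from this release note and may differ from current limits.

```python
import re
import textwrap

MAX_CHARS = 250  # per-phrase character limit quoted in this release note


def split_into_phrases(text, max_chars=MAX_CHARS):
    """Greedily pack sentences into phrases of at most max_chars characters.

    Sentences longer than max_chars are wrapped on word boundaries first.
    """
    phrases, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for piece in textwrap.wrap(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                phrases.append(current)
                current = piece
    if current:
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    for phrase in split_into_phrases("One. Two. Three.", max_chars=9):
        print(phrase)
```

Each resulting phrase can then be sent as a separate synthesis request and the audio concatenated in order.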
Release as of 18/11/21
- Introduced some stabilizing fixes for the alena premium voice. It now sounds consistent.
- Fixed pronunciation errors for alena.
- Pausing in REST API has been improved.
- New premium voices have been added in test mode: oksana:rc, ermil:rc, jane:rc, omazh:rc, and zahar:rc.