Yandex SpeechKit release notes: Archive
SpeechKit provides updates based on the system model and version.
For recognition
For a detailed description of the available versions, see Recognition models.
For synthesis
In speech synthesis, the service provides two types of voices: standard and premium. Premium voices use new speech synthesis technology.
For more information about voice models, see About technology.
Current version
For information about synthesis model updates, see Yandex SpeechKit release notes: Speech synthesis.
For information about recognition model updates, see Yandex SpeechKit release notes: Speech recognition.
Previous versions
Release on 30/09/21
Major upgrade of premium voices available in the REST API. The updated voices are available under the tags alena:rc and filipp:rc.
Various improvements in synthesis quality, including the synthesis of questions. Fixed a rare problem with looping synthesis.
For testing purposes, a function for adding stress to specific words is available. It gives you better control over intonation, especially when synthesizing questions. To emphasize a word, add <[accented]> after it. For example, in Are you glad <[accented]> to see me?, the word glad is emphasized.
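As an illustration of the marker placement described above, here is a minimal sketch that prepares synthesis text on the client side. The helper name add_accent_marker is hypothetical and not part of SpeechKit; the code only builds the string and makes no API call.

```python
import re

# Marker text as given in the release note.
ACCENT_MARKER = "<[accented]>"


def add_accent_marker(text: str, word: str) -> str:
    """Insert the marker after the first whole-word occurrence of `word`.

    Hypothetical helper: it only edits the input string and assumes the
    marker is separated from the emphasized word by a single space.
    """
    match = re.search(rf"\b{re.escape(word)}\b", text)
    if match is None:
        raise ValueError(f"word {word!r} not found in text")
    end = match.end()
    return f"{text[:end]} {ACCENT_MARKER}{text[end:]}"


if __name__ == "__main__":
    print(add_accent_marker("Are you glad to see me?", "glad"))
```

For the example sentence from the note, the helper places the marker directly after glad.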
Release on 09/03/21
A new model version, Demosthenes, is now available under the general:rc tag in streaming recognition, transcription, and short audio recognition. It improves basic recognition quality and recognizes names of healthcare professions and words related to jewelry.
We invite you to join in testing the version. Any feedback will be appreciated.
Version availability by tags
In transcription only:
- hqa: Amati version.
In streaming, transcription, and short audio recognition:
- general: Galen version.
- general:rc: Demosthenes version.
- general:deprecated: Zeno version.
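The tag availability for this release can also be written as a small lookup table, which is handy when a client has to check whether a tag is usable in a given recognition mode. This is only a sketch: the mode labels, the AVAILABILITY table layout, and the resolve_version function are assumptions made for illustration and do not reflect any SpeechKit API.

```python
# Hypothetical mode labels, chosen for this example only.
ALL_MODES = frozenset({"streaming", "transcription", "short_audio"})

# Tag -> (version name, modes where the tag is available),
# copied from the availability list of the 09/03/21 release.
AVAILABILITY = {
    "hqa": ("Amati", frozenset({"transcription"})),
    "general": ("Galen", ALL_MODES),
    "general:rc": ("Demosthenes", ALL_MODES),
    "general:deprecated": ("Zeno", ALL_MODES),
}


def resolve_version(tag: str, mode: str) -> str:
    """Return the version name behind `tag`, or raise if it is unavailable."""
    if mode not in ALL_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if tag not in AVAILABILITY:
        raise KeyError(f"unknown tag: {tag!r}")
    version, modes = AVAILABILITY[tag]
    if mode not in modes:
        raise ValueError(f"tag {tag!r} is not available in {mode!r}")
    return version


if __name__ == "__main__":
    print(resolve_version("general:rc", "streaming"))
```

Keeping one such table per release makes it easy to compare which version each tag pointed to over time.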
Release on 26/02/21
In transcription, a new version named Guarneri is now available under the hqa model tag. It features greatly improved recognition quality.
Version availability by tags
In transcription only:
- hqa: Guarneri version.
In streaming, transcription, and short audio recognition:
- general: Galen version.
- general:rc: Galen version.
- general:deprecated: Zeno version.
Release on 03/02/21
The Galen version of the basic recognition model was tested successfully and is the main version of the recognition model as of February 3.
Version availability by tags
In transcription only:
- hqa: Amati version.
In streaming, transcription, and short audio recognition:
- general: Galen version.
- general:rc: Galen version.
- general:deprecated: Zeno version.
Release on 14/12/20
In transcription, a new version named Amati is now available under the hqa model tag. It fixes cases where silence was recognized instead of speech and improves recognition of texts in the news and medical domains.
Version availability by tags
In transcription only:
- hqa: Amati version.
In streaming, transcription, and short audio recognition:
- general: Zeno version.
- general:rc: Galen version.
- general:deprecated: Anaximander version.
Release on 01/12/20
A new model version, Galen, is now available under the general:rc tag in streaming, transcription, and short audio recognition. It provides significantly better basic recognition quality and recognizes words related to COVID-19.
Version availability by tags
In transcription only:
- hqa: Stradivarius version.
In streaming, transcription, and short audio recognition:
- general: Zeno version.
- general:rc: Galen version.
- general:deprecated: Anaximander version.
Release on 24/11/20
After successful testing, the Zeno version is now the main released version of the general model in streaming, transcription, and short audio recognition.
Version availability by tags
In transcription only:
- hqa: Stradivarius version.
In streaming, transcription, and short audio recognition:
- general and general:rc: Zeno version.
- general:deprecated: Anaximander version.
Release on 17/11/20
Numerous corrections to the pronunciation of individual words thanks to improved normalization. Fixed the declension of numerals. A new version of the alena premium voice is now available under the alena tag.
Version availability by tags
No changes.
Release on 26/10/20
A next-generation recognition model, hqa, is available in transcription. This model has a richer vocabulary, so recognition results are much better and easier to read. The difference is especially noticeable with long audio recognition.
Version availability by tags
In transcription:
- hqa: Stradivarius version.
- general: Anaximander version.
- general:rc: Zeno version.
- general:deprecated: Marcus Aurelius version.
In streaming and short audio recognition: no changes.
Release on 12/10/20
A new version of the general model is now available in streaming, transcription, and short audio recognition. It provides significantly better basic recognition quality.
Version availability by tags
- general: Anaximander version.
- general:rc: Zeno version.
- general:deprecated: Marcus Aurelius version.
Release on 18/08/20
Update for transcription in the Anaximander version:
- Improved handling of dense speech with no detectable pauses for more than 30 seconds.
- Timing fixed.
- Fixed an error with partial recognition results arriving after the final result.
The acoustic and language properties of the model have not changed.
Version availability by tags
These versions are available for streaming recognition, transcription, and short audio recognition:
- general: Anaximander version.
- general:rc: Anaximander version (updated).
- general:deprecated: Marcus Aurelius version.
Release on 21/07/20
Anaximander is now the main operating version for streaming recognition, transcription, and short audio recognition.
Version availability by tags
- general and general:rc: Anaximander version.
- general:deprecated: Marcus Aurelius version.
Release on 27/05/20
New versions of the general model are now available in transcription and short audio recognition.
Version availability by tags
Available versions by tag:
- general:rc: Anaximander version.
- general and general:deprecated: Marcus Aurelius version.
Versions of the general model available for streaming recognition:
- general: Marcus Aurelius version.
- general:rc: Anaximander version.
- general:deprecated: Diogenes version.
Release on 15/05/20
For streaming speech recognition, a new model version, Anaximander, is now available with the general:rc tag.
Version availability by tags
- general: Marcus Aurelius version.
- general:rc: Anaximander version.
- general:deprecated: Diogenes version.
The versions for short and long audio recognition remain unchanged.
Release on 16/04/20
For streaming speech recognition, a new model version, Marcus Aurelius, is now available with the general tag.
Version availability by tags
- general and general:rc: Marcus Aurelius version.
- general:deprecated: Diogenes version.
The versions for short and long audio recognition available with the general tag remain unchanged.