Yandex SpeechKit release notes: Speech recognition
SpeechKit provides updates based on the system model and version.
For more information about speech recognition methods, see About technology.
Current version
Release as of 31/10/24
Improved recognition quality for Uzbek and Turkish in general:rc.
Previous versions
Release as of 09/08/24
Updates to general:rc:
- Improved recognition quality for Uzbek and Kazakh.
- You can now restrict recognition languages by specifying multiple values in the language_restriction field.
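As a rough illustration of restricting recognition to several languages, the sketch below builds a request payload as a plain Python dictionary. Only the language_restriction field name comes from the note above. The surrounding keys (recognition_model, restriction_type, language_code), the WHITELIST value, the language codes, and the helper name are assumptions made for illustration, so check the API reference for the exact message layout.

```python
# Sketch only: every name except "language_restriction" is an assumption.
from typing import Dict, List


def build_model_options(languages: List[str], model: str = "general:rc") -> Dict:
    """Return a hypothetical options payload restricting recognition to `languages`."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Drop duplicates while preserving the caller's order.
    unique = list(dict.fromkeys(languages))
    return {
        "recognition_model": {
            "model": model,
            "language_restriction": {
                "restriction_type": "WHITELIST",  # assumed enum value
                "language_code": unique,          # several values at once
            },
        }
    }


options = build_model_options(["uz-UZ", "kk-KZ", "uz-UZ"])
print(options["recognition_model"]["language_restriction"]["language_code"])
```

Building the payload in one helper keeps the list of allowed languages in a single place, which makes it easier to adjust when a release adds or improves a language.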
Release as of 26/06/24
The general:rc updates of June 3 are now available in the general model.
Improved recognition quality for Uzbek in general:rc.
Release as of 03/06/24
As requested by users, general:rc recognition quality was improved for abbreviations and medical terms in Russian.
Release as of 23/04/24
The general:rc updates of April 9 are now available in the general model.
Release as of 09/04/24
Changed the format of classifiers in the general:rc model. The formal_greeting, informal_greeting, formal_farewell, informal_farewell, insult, and profanity classifiers now return results as a probability of positives. The answerphone and negative classifiers now return only the probability of positives instead of the probability of belonging to two classes.
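Client code that has to handle results from both before and after this change could normalize the two shapes to a single probability. The sketch below assumes a hypothetical result shape: the old format as a mapping of two class labels to probabilities (for example answerphone and not_answerphone) and the new format as a bare number. The actual response messages may be structured differently, and the function name is illustrative.

```python
from collections.abc import Mapping
from typing import Union

# Hypothetical shapes: a bare probability (new) or a two-class mapping (old).
ClassifierResult = Union[float, Mapping]


def positive_probability(name: str, result: ClassifierResult) -> float:
    """Return the probability of the positive class for classifier `name`."""
    if isinstance(result, Mapping):
        # Old style: {"negative": 0.2, "not_negative": 0.8}
        if name not in result:
            raise KeyError(f"no positive class {name!r} in {sorted(result)}")
        value = float(result[name])
    else:
        # New style: the positive probability is returned directly.
        value = float(result)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"probability out of range: {value}")
    return value


old_style = {"answerphone": 0.9, "not_answerphone": 0.1}
new_style = 0.9
print(positive_probability("answerphone", old_style))
print(positive_probability("answerphone", new_style))
```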
Release as of 27/03/24
All general:rc updates of February 28 are now available in the general model.
Updates to general:rc:
- Improved recognition quality for Uzbek.
- Improved speaker labeling in recognition results.
Release as of 28/02/24
Updates to general:rc:
- Improved recognition quality for Uzbek.
- As requested by users, recognition quality was improved for medications, car models, and tobacco products in Russian.
Release as of 27/02/24
All general:rc updates are now available in the general model.
Release as of 12/01/24
Added support for speaker labeling in recognition results in general:rc.
Release as of 12/01/24
Improved recognition quality for Uzbek in general:rc.
Release as of 29/12/23
Updates to general:rc:
- Fixed normalization errors for certain number representations (e.g., fifteen hundred ⟶ 1500).
- Added support for the following classifiers:
  - gender. Returns probability values for the male and female classes.
  - negative. Returns probability values for the negative and not_negative classes.
  - answerphone. Returns probability values for the answerphone and not_answerphone classes.
- Added classifier positives for partial recognition results (ON_PARTIAL event).
Release as of 22/11/23
All general:rc updates are now available in the general model.
Release as of 10/11/23
Updates to general:rc:
- Updated the Russian speech recognition model.
- As requested by users, recognition quality was improved for names of cities in Kazakhstan.
- Improved normalization of speech recognition results for Kazakh.
- Fixed internal server errors that occurred when processing small audio fragments.
Release as of 06/09/23
Updates to general:rc:
- Fixed the issue with English words appearing in recognition results for Russian.
- Improved the general quality of recognition for Russian.
- Improved recognition quality for the Russian model as per user requests.
- Improved the general quality of recognition for Uzbek.
Audio classifiers added to general:rc in the August 15, 2023 release are now available in the general model.
Release as of 15/08/23
Added support for audio classifiers in general:rc.
Release as of 20/07/23
The resampling fix and the new dialog metrics are now available in the general model.
Release as of 07/07/23
Updates to general:rc:
- Fixed a two-channel audio resampling bug in API v3.
- Dialog metrics can now be calculated for speech analytics. Metric calculation is set up using the speech_analysis option in the StreamingOptions message.
Release as of 13/06/23
Fixed switching to English during Russian speech recognition in general:rc.
Release as of 07/06/23
Updates to general:rc:
- Improved the recognition accuracy for Uzbek, German, French, Dutch, Italian, Polish, and Hebrew.
- Added number normalization for Uzbek.
- Added support for splitting text into phrases using eou_update in FullData mode.
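One way to picture phrase splitting is a client that accumulates recognized text and closes a phrase each time an end-of-utterance event arrives. The sketch below works on a hypothetical, simplified event stream of (event_type, text) tuples. The "final" event name and the tuple shape are assumptions made for illustration. Only the eou_update name is taken from the note above, and the real response messages carry more fields.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (event_type, text); hypothetical simplified shape


def split_into_phrases(events: Iterable[Event]) -> List[str]:
    """Group finalized text fragments into phrases, closing one on each eou_update."""
    phrases: List[str] = []
    current: List[str] = []
    for event_type, text in events:
        if event_type == "final" and text:
            current.append(text)
        elif event_type == "eou_update":
            # End of utterance: flush whatever has accumulated so far.
            if current:
                phrases.append(" ".join(current))
                current = []
    if current:
        # Trailing text without a closing eou_update still forms a phrase.
        phrases.append(" ".join(current))
    return phrases


events = [
    ("final", "hello"),
    ("final", "how are you"),
    ("eou_update", ""),
    ("final", "fine thanks"),
    ("eou_update", ""),
]
print(split_into_phrases(events))
```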
Release as of 25/05/23
Updates from the May 17 release are now available in the general model.
Release as of 17/05/23
Updates to general:rc:
- Improved the general quality of recognition for Russian.
- Improved recognition quality for the Russian model as per user requests.
- Improved recognition quality for Uzbek, German, French, Dutch, Italian, and Polish.
- Added support for a new recognition language: Hebrew (he-HE).
Release as of 14/04/23
Improved recognition quality for abbreviations in Russian based on client scenarios for the general:rc model.
Release as of 16/03/23
Updates from the March 7 release are now available in the general model.
Release as of 07/03/23
For the general:rc model:
- Improved recognition quality for Uzbek.
- Added support for number normalization when recognizing speech in English, German, French, Italian, Spanish, and Turkish. Number normalization is also available for Kazakh speech recognition in test mode.
Release as of 08/02/23
- The first version of Uzbek speech recognition is now available in the general:rc model for all API versions. Under some acoustic conditions, Uzbek can be recognized as Kazakh. The issue will be fixed in future model releases.
- To access the general:rc model in API v3, you can now specify this value in the model parameter.
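Since release-candidate changes are later promoted to the stable model, it can help to keep the model string in one place and validate it before sending a request. The sketch below splits a value such as general:rc into its name and tag and wraps it in a hypothetical options dictionary. Only the model parameter name and the general and general:rc values come from these notes. The helper names and the dictionary shape are assumptions.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    name: str  # e.g. "general"
    tag: str   # "rc" for the release candidate, "" for the stable version


def parse_model(value: str) -> ModelRef:
    """Split a model string such as "general:rc" into its name and tag."""
    name, _, tag = value.partition(":")
    if not name:
        raise ValueError(f"empty model name in {value!r}")
    return ModelRef(name=name, tag=tag)


def streaming_options(model: str) -> dict:
    """Hypothetical request options; only the "model" key is from the notes."""
    parse_model(model)  # validate before building the request
    return {"model": model}


print(parse_model("general:rc"))
print(streaming_options("general"))
```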
Release as of 20/12/22
For the general:rc model:
- Based on user requests, we improved recognition quality for the names of medications and first, last, and middle names.
- Slightly improved recognition quality for Kazakh and Turkish.
Release as of 20/10/22
For the general:rc model:
- Added recognition of Brazilian Portuguese (language code: pt-BR).
- Improved speech recognition quality for all languages in auto recognition mode.
- Slightly improved recognition quality for Russian and Kazakh.
Release as of 05/10/22
Updates from the September 20 release are now available in the general model.
Release as of 20/09/22
For the general:rc model:
- Improved recognition quality for Moscow neighborhoods and medications in Russian.
- Added language classification in auto recognition mode.
The fixes are available for testing.
Release as of 29/06/22
- The general version of the multi-language model is now available.
- In the general:rc and general versions, the multi-language model can accept hints as to which languages are present in the speech.
- The June 7 updates to general:rc are now available in the general model for Russian.
Release as of 07/06/22
- Improved punctuation placement and recognition of last names in the general:rc model.
- The April 25 release updates are now available in the general model.
Release as of 25/04/22
Updates to the general:rc model:
- Improved recognition of such words as gasification and regasification.
- Added service feedback for OGG-OPUS processing. If a stream is not valid audio in OPUS format, the service returns Invalid_Argument.
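A client can catch obviously wrong input before it reaches the service and triggers Invalid_Argument. The sketch below is a cheap local check based on the Ogg container layout: every Ogg page starts with the capture pattern OggS, and in an Ogg Opus stream the first page carries an OpusHead identification header. The function name is illustrative, and the check is far looser than real decoding, so the service may still reject a stream that passes it.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Return True if `data` starts with an Ogg page holding an OpusHead header."""
    # Every Ogg page begins with the 4-byte capture pattern "OggS".
    if not data.startswith(b"OggS"):
        return False
    # The fixed part of an Ogg page header is 27 bytes long; byte 26 holds
    # the number of entries in the segment table that follows it.
    if len(data) < 27:
        return False
    payload_start = 27 + data[26]
    # In Ogg Opus, the first packet starts with the magic string "OpusHead".
    return data[payload_start:payload_start + 8] == b"OpusHead"


# Minimal synthetic first page: header, one-entry segment table, OpusHead packet.
fake_page = b"OggS" + bytes(22) + bytes([1]) + bytes([19]) + b"OpusHead" + bytes(11)
print(looks_like_ogg_opus(fake_page))
print(looks_like_ogg_opus(b"ID3" + bytes(64)))
```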
Release as of 19/04/22
- Added Turkish language to the multi-language speech recognition model.
- A new API version, API v3, is available for Yandex SpeechKit streaming recognition. The old interface will still be supported, but all new features will only be available in API v3.
Release as of 14/03/22
The March 2, 2022 general:rc version is available under the general tag.
Release as of 02/03/22
The general model now offers improved recognition of names, addresses, and terms as well as punctuation placement in long sentences and texts with numbers.
The general:rc model has undergone further upgrades based on user data.
Release as of 17/02/22
The current release improved the quality of the Russian-language general:rc model in the following areas:
- Recognition of last and first names, patronymics, and addresses.
- Recognition of customer-specific terms. The model was enhanced with data from a user request dated February 1, 2022, and corrected based on user data from November 9, 2021.
- Punctuation in long sentences and texts with numbers.
Release as of 03/02/22
- The general:rc model now supports the universal mode ("auto" language). In this mode, the model can recognize speech in one of the following languages:
  - Russian
  - Kazakh
  - English
  - German
  - French
  - Finnish
  - Swedish
  - Dutch
  - Polish
  - Portuguese
  - Italian
  - Spanish
- New languages are also available under their own codes. If a language is specified explicitly, the general:rc model uses it as a hint to improve recognition quality. Currently, a hint only affects recognition quality for Russian.
When using general:rc, we recommend that you enable autotuning.
Known problems: in universal mode, recognition quality may deteriorate in the case of continuous speech without pauses.
Release as of 26/01/22
- The general and general:rc recognition models for the Kazakh language are available in streaming and delayed recognition modes.
- The general:rc model now supports a punctuator in streaming and delayed recognition modes.
- In delayed recognition mode, you can now work with MP3 format.