Yandex SpeechKit release notes: Speech recognition
SpeechKit provides updates based on the system model and version.
For more information about speech recognition methods, see About technology.
Current version
Release on August 9, 2024
Changes to general:rc:
- Improved recognition quality for Uzbek and Kazakh.
- You can now restrict recognition languages by specifying multiple values in the language_restriction field.
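As a rough illustration of the language restriction change, the sketch below builds a request fragment that limits recognition to several languages. Only the language_restriction field name comes from these notes. The nested keys (restriction_type, language_code), the WHITELIST value, the sample language codes, and the helper build_language_restriction are assumptions made for illustration and should be checked against the API reference.

```python
def build_language_restriction(language_codes, restriction_type="WHITELIST"):
    """Return a dict for a hypothetical language_restriction request fragment."""
    # Drop duplicates while keeping the caller's order.
    unique_codes = list(dict.fromkeys(language_codes))
    if not unique_codes:
        raise ValueError("at least one language code is required")
    return {
        "language_restriction": {
            "restriction_type": restriction_type,  # assumed key and value
            "language_code": unique_codes,  # assumed key; holds multiple values
        }
    }


# Sample codes are illustrative; the duplicate is removed by the helper.
options = build_language_restriction(["uz-UZ", "kk-KZ", "uz-UZ"])
print(options)
```

Keeping the codes in a list makes it straightforward to pass one or many languages through the same helper.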
Previous versions
Release on June 26, 2024
The general:rc updates from June 3 are now available in the general model.
Improved recognition quality for Uzbek in general:rc.
Release on June 3, 2024
Based on user requests, recognition quality in general:rc has been improved for abbreviations and medical terms in Russian.
Release on April 23, 2024
The general:rc updates of April 9 are now available in the general model.
Release on April 9, 2024
Changed the format of classifiers in the general:rc model. The formal_greeting, informal_greeting, formal_farewell, informal_farewell, insult, and profanity classifiers now return results as a triggering probability. The answerphone and negative classifiers now return only the triggering probability rather than the probability of belonging to two classes.
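Client code that reads classifier output may need to handle both the earlier two-class results and the single triggering probability. The sketch below is one way to do that, assuming results arrive as a plain mapping from classifier name to either a float or a dict of class probabilities. The data shapes, the 0.5 threshold, and the helper triggered_classifiers are hypothetical and do not reflect the actual response format.

```python
def triggered_classifiers(results, threshold=0.5):
    """Return sorted names of classifiers whose triggering probability meets the threshold."""
    names = []
    for name, value in results.items():
        if isinstance(value, dict):
            # Earlier two-class shape (assumed), e.g.
            # {"answerphone": 0.9, "not_answerphone": 0.1}.
            probability = value[name]
        else:
            # Newer shape (assumed): a single triggering probability.
            probability = value
        if probability >= threshold:
            names.append(name)
    return sorted(names)


sample = {
    "insult": 0.02,
    "profanity": 0.71,
    "answerphone": {"answerphone": 0.9, "not_answerphone": 0.1},
    "negative": 0.4,
}
print(triggered_classifiers(sample))  # ['answerphone', 'profanity']
```

The threshold is a client-side choice; a suitable value depends on how costly false triggers are in a given scenario.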
Release on March 27, 2024
All general:rc updates of February 28 are now available in the general model.
Updates to general:rc:
- Improved recognition quality for Uzbek.
- Improved speaker labeling in recognition results.
Release on February 28, 2024
Updates to general:rc:
- Improved recognition quality for Uzbek.
- As requested by users, improved recognition quality for medications, car models, and tobacco products in Russian.
Release on February 27, 2024
All changes to the general:rc model are now available in the general model.
Release on January 12, 2024
Added support for speaker labeling in recognition results in general:rc.
Release on January 12, 2024
Improved recognition quality for Uzbek in general:rc.
Release on December 29, 2023
Updates to general:rc:
- Fixed normalization errors for certain number representations (e.g., fifteen hundred ⟶ 1500).
- Added support for the following classifiers:
  - gender classifier: The classification returns probability values for the male and female classes.
  - negative classifier: The classification returns probability values for the negative and not_negative classes.
  - answerphone classifier: The classification returns probability values for the answerphone and not_answerphone classes.
- Enabled triggering classifiers for partial recognition results (the ON_PARTIAL event).
Release on November 22, 2023
All changes to the general:rc model are now available in the general model.
Release on November 10, 2023
Changes to general:rc:
- The Russian speech recognition model has been updated.
- The quality of recognizing names of cities in the Republic of Kazakhstan has been improved based on user requests.
- The quality of normalization of speech recognition results in the Kazakh language has been enhanced.
- Internal server errors that occur when working with small audio fragments have been fixed.
Release on September 6, 2023
Changes to general:rc:
- Fixed an issue where English words appeared in recognition results for Russian speech.
- Improved the general quality of recognition for Russian.
- Improved recognition quality for the Russian model as per user requests.
- Improved the general quality of recognition for Uzbek.
Audio classifiers added to general:rc in the service release on August 15, 2023 are now available in the general model.
Release on August 15, 2023
The general:rc model now supports audio classifiers.
Release on July 20, 2023
The resampling fix and the new dialog metrics are now available in the general model.
Release on July 7, 2023
Changes to general:rc:
- Fixed a two-channel audio resampling bug in API v3.
- Dialog metrics can now be calculated for speech analytics. Metric calculation is set up using the speech_analysis option in the StreamingOptions message.
Release on June 13, 2023
Fixed switching to English during Russian speech recognition in general:rc.
Release on June 7, 2023
Changes to general:rc:
- Improved the recognition accuracy for Uzbek, German, French, Dutch, Italian, Polish, and Hebrew.
- Added number normalization for Uzbek.
- Added support for splitting text into phrases using eou_update in FullData mode.
Release on May 25, 2023
Updates from the May 17, 2023 release are now available in the general model.
Release on May 17, 2023
Changes to general:rc:
- Improved the general quality of recognition for Russian.
- Improved recognition quality for the Russian model as per user requests.
- Improved recognition quality for Uzbek, German, French, Dutch, Italian, and Polish.
- Added support for a new recognition language: Hebrew (he-HE).
Release on April 14, 2023
Improved recognition quality for Russian abbreviations in customer scenarios for the general:rc model.
Release on March 16, 2023
Updates from the March 7, 2023 release are now available in the general model.
Release on March 7, 2023
For the general:rc model:
- Improved recognition quality for Uzbek.
- Added support for number normalization when recognizing speech in English, German, French, Italian, Spanish, and Turkish. Number normalization is also available for Kazakh speech recognition in test mode.
Release on February 8, 2023
- The first version of Uzbek speech recognition is now available in the general:rc model for all API versions. Under some acoustic conditions, Uzbek can be recognized as Kazakh. The issue will be fixed in future model releases.
- To access the general:rc model in API v3, you can now specify this value in the model parameter.
Release on December 20, 2022
In the general:rc model:
- Based on user requests, we improved recognition quality for the names of medications and first, last, and middle names.
- Slightly improved recognition quality for Kazakh and Turkish.
Release on October 20, 2022
In the general:rc model:
- Added recognition of Brazilian Portuguese (language code: pt-BR).
- Improved speech recognition quality for all languages in auto recognition mode.
- Slightly improved recognition quality for Russian and Kazakh.
Release on October 5, 2022
Updates from the September 20 release are available in the general model.
Release on September 20, 2022
In the general:rc model:
- Improved recognition quality for Moscow neighborhoods and medications in Russian.
- Added language classification in auto recognition mode.
The fixes are available for testing.
Release on June 29, 2022
- The general version of the multi-language model is available.
- In the general:rc and general versions, the multi-language model can accept hints about which languages are contained in the speech.
- Updates to general:rc from June 7 are available in the general model for Russian.
Release on June 7, 2022
- Improved punctuation placement and recognition of last names in the general:rc model.
- Updates from the April 25 release are available in the general model.
Release on April 25, 2022
Changes to the general:rc model:
- Improved recognition of such words as gasification and regasification.
- Added service feedback for OGG-OPUS processing. If a stream is not valid audio in OPUS format, the service returns Invalid_Argument.
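Because the service rejects streams that are not valid OPUS audio, a client may want a cheap local check before sending data. The sketch below is a heuristic, not the service's own validation: it only confirms that the data starts with an Ogg page (capture pattern OggS) whose first packet begins with the Opus identification header (OpusHead). It does not verify checksums or later pages, and the helper name looks_like_ogg_opus is hypothetical.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristically check that data starts with an Ogg page carrying an Opus header."""
    # An Ogg page header is 27 bytes and starts with the capture pattern "OggS".
    if len(data) < 27 or data[:4] != b"OggS":
        return False
    # Byte 26 holds the number of entries in the segment table that follows.
    segment_count = data[26]
    payload_start = 27 + segment_count
    # The first packet of an Opus stream begins with the magic "OpusHead".
    return data[payload_start:payload_start + 8] == b"OpusHead"


# Minimal fake first page for demonstration: zeroed header fields,
# one segment table entry, and a 19-byte payload starting with "OpusHead".
header = b"OggS" + bytes(22) + bytes([1]) + bytes([19])
page = header + b"OpusHead" + bytes(11)
print(looks_like_ogg_opus(page))                 # True
print(looks_like_ogg_opus(b"RIFF" + bytes(40)))  # False
```

A check like this only catches obviously wrong containers, such as WAV data sent with an OGG-OPUS setting; the service response remains the authoritative signal.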
Release on April 19, 2022
- Added Turkish language to the multi-language speech recognition model.
- A new API version is available for Yandex SpeechKit streaming recognition. The old interface will also be supported, but all new features will only be available in API v3.
Release on March 14, 2022
The general:rc version from March 2, 2022 is available under the general tag.
Release on March 2, 2022
Improved recognition of names, addresses, and terms as well as punctuation placement in long sentences and texts with numbers is now available in the general model.
The general:rc model has undergone further upgrades based on user data.
Release on February 17, 2022
The current release improved the quality of the Russian-language general:rc model in the following areas:
- Recognition of last and first names, patronymics, and addresses.
- Recognition of customer-specific terms. The model was enhanced with data from a user request dated February 1, 2022, and corrected based on user data from November 9, 2021.
- Punctuation in long sentences and texts with numbers.
Release on February 3, 2022
- In general:rc, a universal mode ("auto" language) is available. In this mode, the model can recognize speech in one of the following languages:
  - Russian
  - Kazakh
  - English
  - German
  - French
  - Finnish
  - Swedish
  - Dutch
  - Polish
  - Portuguese
  - Italian
  - Spanish
- New languages are also available under their own codes. If the language is specified explicitly, the general:rc model uses it as a hint to improve recognition quality. Currently, a hint only affects the quality of recognition of Russian.
When using general:rc, we recommend enabling auto-tuning.
Known problems: in universal mode, recognition quality may deteriorate in the case of continuous speech without pauses.
Release on January 26, 2022
- The general and general:rc recognition models for the Kazakh language are available in streaming and delayed recognition modes.
- The general:rc model now supports a punctuator in streaming and delayed recognition modes.
- In delayed recognition mode, you can now work with MP3 format.