
In this article:

  • Auto-tuning based on logged data
  • Using audio to improve quality
  • Fine-tuning

Extending a speech recognition model

Written by
Yandex Cloud
Updated on June 20, 2025

SpeechKit provides multiple ways to improve speech recognition:

  • Auto-tuning based on logged data
  • Using audio to improve quality
  • Fine-tuning

Auto-tuning based on logged data

By default, SpeechKit does not save data provided by users. However, the most effective way to improve a speech recognition model is to train it on real user data.

To improve the quality of speech recognition, use model auto-tuning. It allows SpeechKit to save the data transmitted in your requests and use it for further training. Auto-tuning improves recognition quality while the model is running, with no additional data collection on your part.

Auto-tuning is a good option under these conditions:

  • The script currently used for your tasks fails to recognize some of the vocabulary.
  • The vocabulary targeted by auto-tuning is easy to perceive by ear and transcribe. For example, drug names are not suitable, as these terms come from a narrow domain; to recognize domain-specific terms, use model tuning.

To send data for auto-tuning, provide the x-data-logging-enabled: true header in your API requests. For an example with logging enabled, see Request headers for troubleshooting in Yandex SpeechKit. Then contact support to request model fine-tuning on the provided data.
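As a minimal sketch of opting in to logging, the snippet below builds the headers for a SpeechKit recognition request with `x-data-logging-enabled: true` set. The endpoint URL and `Api-Key` authorization scheme follow the SpeechKit REST API, but treat the parameter names and file paths as illustrative assumptions; check the API reference for your recognition method.

```python
import os

# SpeechKit short-audio recognition endpoint (v1 REST API); adjust if you
# use the streaming or asynchronous recognition API instead.
STT_URL = "https://stt.api.cloud.yandex.net/speech/v1/stt:recognize"

def build_recognize_headers(api_key: str, enable_logging: bool = True) -> dict:
    """Build request headers, opting in to data logging for auto-tuning."""
    headers = {"Authorization": f"Api-Key {api_key}"}
    if enable_logging:
        # Allows SpeechKit to save the submitted audio for model auto-tuning.
        headers["x-data-logging-enabled"] = "true"
    return headers

headers = build_recognize_headers(os.environ.get("YC_API_KEY", "<api-key>"))
# The actual request (requires the `requests` package and a real API key):
# resp = requests.post(STT_URL, params={"lang": "ru-RU"},
#                      headers=headers, data=open("speech.ogg", "rb"))
```

Without the header, the request is processed normally but the audio is not saved, so it cannot contribute to auto-tuning.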

Auto-tuning tips:

  • You will need at least 10 hours of audio in Russian to achieve a noticeable difference in recognition quality. Models for other languages may require more data. The recommended volume is 50 hours or more.
  • Training a recognition model for Russian takes about three months: during this time, the team checks and validates the data, adds it to the training dataset, and trains the model. For other languages, contact your account manager.

Using audio to improve quality

You can improve speech recognition by submitting audio files to the SpeechKit team. This method is similar to auto-tuning but uses audio you prepare yourself instead of data provided in API requests. Submit the files to the support team as a ZIP archive. You can also attach transcripts of the messages, but this is optional.

For the recommended data size, refer to the auto-tuning restrictions.
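A ZIP archive for submission can be assembled with the standard library; this is only a sketch of the packaging step, and the file extensions and naming convention (a `.txt` transcript alongside each audio file) are assumptions, not a format mandated by the support team.

```python
import zipfile
from pathlib import Path

def package_audio(audio_dir: str, archive_path: str) -> int:
    """Pack audio files (and optional plain-text transcripts) into a ZIP
    archive for submission to the support team. Returns the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(audio_dir).iterdir()):
            # OggOpus and WAV are among SpeechKit's supported formats;
            # transcripts here are .txt files with matching base names.
            if path.suffix.lower() in {".ogg", ".wav", ".txt"}:
                zf.write(path, arcname=path.name)
                count += 1
    return count
```

Keeping transcripts next to their audio files in one flat archive makes it easy for reviewers to match recordings to text, but confirm the expected layout with support before sending large volumes.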

Fine-tuning

The basic speech recognition model is designed to work with everyday language, but it may not be sufficient to recognize specific vocabulary. By tuning, you can train the model to recognize domain-specific terms from different fields:

  • Medicine: Diagnoses, biological terms, drug names.
  • Business: Company names.
  • Trade: Product ranges (jewelry, electronics, and so on).
  • Finance: Banking terms and names of banking products.

For tuning, you will need a list of terms (words or phrases) and at least three free-form text examples for each term.
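The exact submission format is not specified here, so the sketch below only models the stated requirement: every term must come with at least three free-form examples. The term names and sentences are hypothetical; use your own domain vocabulary.

```python
def validate_term_list(terms: dict[str, list[str]]) -> list[str]:
    """Return the terms that do not yet have the required
    three free-form usage examples for fine-tuning."""
    return [term for term, examples in terms.items() if len(examples) < 3]

# Hypothetical medical-domain terms with example sentences (illustrative only):
terms = {
    "acetylsalicylic acid": [
        "The doctor prescribed acetylsalicylic acid after the checkup.",
        "Acetylsalicylic acid is sold over the counter.",
        "She takes acetylsalicylic acid daily.",
    ],
    "troponin": ["The troponin test came back elevated."],
}

print(validate_term_list(terms))  # → ['troponin']
```

Running a check like this before sending the archive avoids a round trip with the support team over incomplete entries.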

Tuning is available for the Russian language only.

Tuning takes about two months from when you submit the data archive to the support team.

© 2025 Direct Cursus Technology L.L.C.