© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

Detecting the end of utterance

Written by
Yandex Cloud
Updated on April 11, 2025

EOU (end of utterance) is a flag indicating that an utterance has ended in streaming recognition. During streaming recognition, the SpeechKit server returns recognition results for parts of the utterance rather than for the whole utterance at once:

  • Intermediate: Results marked with the partial flag; this part of the utterance may still change.
  • Final: Results marked with the final flag; this part of the utterance is fixed.
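The two result types above can be sketched with a small client-side helper. This is an illustration only: the `(text, is_final)` tuples stand in for the actual streaming API response messages, which have a different shape.

```python
from typing import Iterable, List, Tuple

def collect_utterance(results: Iterable[Tuple[str, bool]]) -> str:
    """Assemble the recognized utterance from streaming results.

    Each item is (text, is_final). Intermediate (`partial`) results
    may still be revised by the server, so only final fragments are
    accumulated into the utterance.
    """
    fixed_parts: List[str] = []  # fragments confirmed by a `final` result
    for text, is_final in results:
        if is_final:
            fixed_parts.append(text)
        # intermediate results could be shown to the user here,
        # but they are not stored, since they may change
    return " ".join(fixed_parts)

stream = [
    ("hello wor", False),   # partial: may change
    ("hello world", True),  # final: fixed
    ("how are", False),
    ("how are you", True),
]
print(collect_utterance(stream))  # hello world how are you
```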

SpeechKit returns a complete utterance only after detecting the EOU. Precise EOU detection makes it possible to listen to a speaker to the end without interruptions and recognize their speech, and it helps a voice assistant respond more naturally (with a reply or a request for clarification).

EOU occurs in the following cases:

  • The gRPC session is terminated.

  • Silence is recognized in the last speech fragment. Silence can be represented by one of these two parameters:

    • chunk: Sound recognized as silence.
    • silence_chunk: Silence duration in milliseconds. This parameter allows you to reduce the audio packet size by excluding silence that does not require recognition.
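The choice between the two parameters above can be sketched as a hypothetical helper that decides how to package each audio frame. The dictionary field names (`duration_ms`, `data`) are assumptions for illustration; only the parameter names `chunk` and `silence_chunk` come from the documentation.

```python
def frame_to_message(frame: bytes, is_silence: bool, frame_ms: int) -> dict:
    """Decide how to send one audio frame in a streaming session.

    If the frame is silence, send only its duration (`silence_chunk`)
    instead of the raw audio (`chunk`). This reduces the audio packet
    size, since silence does not need to be recognized.
    """
    if is_silence:
        return {"silence_chunk": {"duration_ms": frame_ms}}
    return {"chunk": {"data": frame}}

# Speech frame: raw audio is sent for recognition.
print(frame_to_message(b"\x01\x02", is_silence=False, frame_ms=20))
# Silent frame: only the duration is sent.
print(frame_to_message(b"\x00\x00", is_silence=True, frame_ms=20))
```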

To influence EOU detection, configure your API v3 requests as follows:

  • Set the max_pause_between_words_hint_ms parameter, which controls the expected duration (in milliseconds) of pauses between words within an utterance. With this parameter, you can prevent incorrect EOU detection when a speaker dictates numbers slowly, or tune how quickly the voice assistant responds once speech ends.
  • Set the type parameter in eou_classifier_options=default_classifier, which defines the sensitivity of the EOU detection method:
    • DEFAULT: Default method.
    • HIGH: Compared to DEFAULT, detects EOU faster (shorter server response time), but false positives are possible (lower detection precision).
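A minimal sketch of session options combining the two settings above. The nesting and surrounding structure are assumptions made for illustration; only the parameter names from this page (max_pause_between_words_hint_ms, eou_classifier_options, default_classifier, type, DEFAULT, HIGH) are taken as given, and the real API v3 request layout may differ.

```python
def make_recognition_options(pause_hint_ms: int = 1000,
                             sensitivity: str = "DEFAULT") -> dict:
    """Build a sketch of API v3 session options that affect EOU detection."""
    assert sensitivity in ("DEFAULT", "HIGH")
    return {
        "recognition_model": {
            # With a larger hint, slow dictation (e.g. reading out digits)
            # will not trigger EOU prematurely.
            "max_pause_between_words_hint_ms": pause_hint_ms,
        },
        "eou_classifier_options": {
            # HIGH reacts faster but allows false positives.
            "default_classifier": {"type": sensitivity},
        },
    }

opts = make_recognition_options(pause_hint_ms=1500, sensitivity="HIGH")
print(opts)
```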

You can also detect the EOU on your own based on data from SpeechKit (parts of utterances, recognition statistics, etc.):

  1. In your API request that initiates a recognition session, set the eou_classifier_options=external_classifier parameter.
  2. If EOU is detected within the session, add the eou parameter to the request to the SpeechKit server (leave it blank).

SpeechKit will treat it as an EOU marker and return a complete utterance in response.
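The external-classifier flow above can be sketched as a simple client-side pause heuristic: track when recognition results last arrived, and once no new text has come in for a chosen pause, send the empty eou message. The threshold value, timing logic, and class shape are all assumptions for illustration, not part of the SpeechKit API.

```python
class PauseEouDetector:
    """Minimal client-side EOU heuristic for external_classifier mode.

    With eou_classifier_options=external_classifier, the server no
    longer decides where utterances end. The client watches recognition
    activity and, once no new result has arrived for `pause_s` seconds,
    signals that an empty `eou` message should be sent.
    """

    def __init__(self, pause_s: float = 1.0):
        self.pause_s = pause_s
        self.last_activity = None  # timestamp of the latest result

    def on_result(self, now: float) -> None:
        # Call whenever a partial or final result arrives.
        self.last_activity = now

    def should_send_eou(self, now: float) -> bool:
        return (self.last_activity is not None
                and now - self.last_activity >= self.pause_s)

det = PauseEouDetector(pause_s=1.0)
det.on_result(now=10.0)
print(det.should_send_eou(now=10.5))  # False: the speaker may continue
print(det.should_send_eou(now=11.5))  # True: send the empty `eou` message
```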
