Detecting the end of utterance
EOU (End-of-Utterance) is a flag indicating the end of an utterance in streaming recognition. During streaming recognition, the SpeechKit server returns recognition results for parts of the utterance rather than the whole utterance:
- Intermediate: Results with the partial flag. This part of the utterance may change.
- Final: Results with the final flag. This part of the utterance is fixed.
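A minimal sketch of how a client might track these two kinds of results. It assumes each result has already been reduced to a flag ("partial" or "final") and a text string; the Transcript class and its methods are hypothetical helpers, not part of the SpeechKit API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transcript:
    """Hypothetical client-side holder for streaming recognition results."""
    committed: List[str] = field(default_factory=list)  # text from final results
    tentative: str = ""                                 # latest partial result

    def apply(self, flag: str, text: str) -> None:
        if flag == "partial":
            # A partial result may still change, so it only replaces the tentative tail.
            self.tentative = text
        elif flag == "final":
            # A final result is fixed, so it is committed and the tentative tail is cleared.
            self.committed.append(text)
            self.tentative = ""
        else:
            raise ValueError(f"unexpected flag: {flag!r}")

    def render(self) -> str:
        parts = self.committed + ([self.tentative] if self.tentative else [])
        return " ".join(parts)
```

With this approach, text shown to the user can be updated on every partial result without ever rewriting the parts that were already marked as final.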
SpeechKit only returns a complete utterance after detecting the EOU. Precise EOU detection lets the service hear a speaker out without interrupting them and recognize their full speech, and it helps a voice assistant respond more naturally (with a reply or a request for clarification).
EOU occurs in the following cases:
- The gRPC session is terminated.
- Silence is recognized in the last speech fragment. Silence can be represented by one of these two parameters:
  - chunk: Sound recognized as silence.
  - silence_chunk: Silence duration in milliseconds. This parameter allows you to reduce the audio packet size by excluding silence that does not require recognition.
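The difference between the two representations can be sketched as follows. This example assumes 16-bit little-endian PCM frames of a fixed duration and uses a simple peak-amplitude gate to decide what counts as silence on the client side. The gate, its threshold, and the dictionary layout (including the data and duration_ms keys) are illustrative assumptions rather than the actual SpeechKit message format.

```python
import struct
from typing import Iterable, Iterator


def is_silent(frame: bytes, threshold: int = 500) -> bool:
    """Treat a 16-bit little-endian PCM frame as silence if its peak is below threshold."""
    n = len(frame) // 2
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return max((abs(s) for s in samples), default=0) < threshold


def to_events(frames: Iterable[bytes], frame_ms: int = 20) -> Iterator[dict]:
    """Send audible frames as chunk events and merge silent runs into silence_chunk events."""
    pending_silence_ms = 0
    for frame in frames:
        if is_silent(frame):
            # Accumulate silence instead of sending the audio bytes.
            pending_silence_ms += frame_ms
            continue
        if pending_silence_ms:
            yield {"silence_chunk": {"duration_ms": pending_silence_ms}}
            pending_silence_ms = 0
        yield {"chunk": {"data": frame}}
    if pending_silence_ms:
        # Flush trailing silence so the server still sees the final pause.
        yield {"silence_chunk": {"duration_ms": pending_silence_ms}}
```

Merging consecutive silent frames into one event is a design choice of this sketch: the server still learns how long the pause lasted, while the silent audio itself is never transmitted.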
To influence EOU detection, configure the following API v3 parameters:
- Set the max_pause_between_words_hint_ms parameter that controls the expected duration (in milliseconds) of pauses between words within an utterance. With this parameter, you can avoid incorrect EOU detection when a speaker is dictating numbers slowly or set up how fast the voice assistant should respond to the end of speech.
- Set the type parameter in eou_classifier_options=default_classifier, which defines the EOU detection method sensitivity:
  - DEFAULT: Default method.
  - HIGH: Compared to DEFAULT, detects EOU faster (shorter server response time) but false positives are possible (lower detection precision).
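The two settings above could be assembled roughly as shown below. This is a sketch using plain dictionaries: the default_eou_options helper is hypothetical, and nesting max_pause_between_words_hint_ms inside default_classifier is an assumption about the request layout that should be checked against the API v3 reference. The values 2000 and 500 are illustrative only.

```python
from typing import Optional

ALLOWED_TYPES = ("DEFAULT", "HIGH")


def default_eou_options(
    eou_type: str = "DEFAULT",
    max_pause_between_words_hint_ms: Optional[int] = None,
) -> dict:
    """Build an illustrative options dictionary for the server-side EOU classifier."""
    if eou_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}, got {eou_type!r}")
    classifier = {"type": eou_type}
    if max_pause_between_words_hint_ms is not None:
        if max_pause_between_words_hint_ms <= 0:
            raise ValueError("max_pause_between_words_hint_ms must be positive")
        classifier["max_pause_between_words_hint_ms"] = max_pause_between_words_hint_ms
    # Assumed layout: the classifier settings live under eou_classifier_options.
    return {"eou_classifier_options": {"default_classifier": classifier}}


# Slow dictation of numbers: tolerate longer pauses between words.
dictation = default_eou_options("DEFAULT", max_pause_between_words_hint_ms=2000)

# Fast-reacting voice assistant: quicker EOU at the cost of possible false positives.
assistant = default_eou_options("HIGH", max_pause_between_words_hint_ms=500)
```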
You can also detect the EOU on your own based on data from SpeechKit (parts of utterances, recognition statistics, etc.):
- In your API request that initiates a recognition session, set the eou_classifier_options=external_classifier parameter.
- If EOU is detected within the session, add the eou parameter to the SpeechKit server request (leave it blank).
SpeechKit will treat it as the EOU marker and return the complete utterance in response.
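The external-classifier flow might look like the following request generator. It is a sketch with plain dictionaries standing in for the real request messages: the session_options wrapper, the data key, and the eou_detected hook are assumptions made for illustration. The hook represents whatever application logic inspects the partial results or recognition statistics and decides that the utterance has ended.

```python
from typing import Callable, Iterable, Iterator


def external_eou_requests(
    chunks: Iterable[bytes],
    eou_detected: Callable[[], bool],
) -> Iterator[dict]:
    """Yield an illustrative request sequence for client-side EOU detection."""
    # The first request starts the session and selects the external classifier.
    yield {"session_options": {"eou_classifier_options": {"external_classifier": {}}}}
    for data in chunks:
        yield {"chunk": {"data": data}}
        # Hypothetical hook: the application decides the utterance is over.
        if eou_detected():
            # The eou parameter is sent blank; it only marks the end of the utterance.
            yield {"eou": {}}
```

Because the generator keeps consuming chunks after sending the marker, the same session can carry several utterances, each closed by its own blank eou request.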