Detecting the end of utterance
EOU (End-of-Utterance) is a flag that indicates where an utterance ends during streaming recognition. During streaming recognition, the SpeechKit server returns the results of recognizing parts of an utterance rather than the entire utterance:
- Intermediate: Results with the partial flag. This part of the utterance may still change.
- Final: Results with the final flag. This part of the utterance is fixed.
SpeechKit returns a complete utterance only after detecting the EOU. Precise EOU detection lets the service hear a speaker out without interrupting them and recognize their speech in full. It also helps a voice assistant respond more naturally (with a reply or a request for clarification).
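The difference between intermediate results, final results, and the EOU can be sketched on the client side as follows. This is a minimal illustration, not the SpeechKit response format: the event dictionaries and their kind and text keys are hypothetical stand-ins for whatever the server actually sends.

```python
from typing import Iterable, Iterator, List, Tuple


def track_utterance(events: Iterable[dict]) -> Iterator[Tuple[bool, str]]:
    """Yield (is_complete, text) after every event.

    Each event is a hypothetical dict such as {"kind": "partial", "text": "..."}.
    The "kind" values mirror the partial and final flags and the EOU marker.
    """
    fixed: List[str] = []  # parts marked final: these no longer change
    for event in events:
        kind = event["kind"]
        if kind == "partial":
            # A partial result only overlays the fixed parts; it may be replaced.
            yield False, " ".join(fixed + [event["text"]])
        elif kind == "final":
            fixed.append(event["text"])
            yield False, " ".join(fixed)
        elif kind == "eou":
            # The utterance is complete only once the EOU arrives.
            yield True, " ".join(fixed)
            fixed = []
```

A caller would typically redraw the on-screen text on every yielded value and hand the text to downstream logic only when is_complete is true.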
EOU occurs in the following cases:
- The gRPC session has terminated.
- Silence has been recognized in the last speech fragment. To pass silence, use one of these two parameters:
  - chunk: Sound segment recognized as silence.
  - silence_chunk: Silence duration in milliseconds. This parameter allows you to reduce the audio packet size by excluding silence that does not require recognition.
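As a rough sketch of how silence_chunk can shrink the outgoing stream, the function below collapses runs of silent frames into a single duration value. Plain dictionaries stand in for the real request messages; the data and duration_ms field names and the all-zero silence test are assumptions made for this illustration (real audio would need an energy threshold).

```python
from typing import Iterable, Iterator


def pack_audio(frames: Iterable[bytes], frame_ms: int) -> Iterator[dict]:
    """Turn fixed-length PCM frames into chunk / silence_chunk messages.

    frame_ms is the duration of one frame in milliseconds.
    """
    silent_ms = 0
    for frame in frames:
        if not any(frame):  # every byte is zero: treat as digital silence
            silent_ms += frame_ms
            continue
        if silent_ms:
            # Flush the accumulated silence as a duration instead of audio bytes.
            yield {"silence_chunk": {"duration_ms": silent_ms}}
            silent_ms = 0
        yield {"chunk": {"data": frame}}
    if silent_ms:
        yield {"silence_chunk": {"duration_ms": silent_ms}}
```

Consecutive silent frames are merged, so a long pause costs one small message rather than many audio packets.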
To influence EOU detection, configure the following API v3 settings:
- Set the max_pause_between_words_hint_ms parameter, which controls the expected duration (in milliseconds) of pauses between words within an utterance. This parameter lets you avoid premature EOU detection when a speaker dictates numbers slowly, or tune how quickly the voice assistant responds to the end of speech.
- Specify the type parameter in eou_classifier_options=default_classifier. This parameter sets the sensitivity of the EOU detection method:
  - DEFAULT: Default method.
  - HIGH: Compared to the DEFAULT method, the EOU is detected faster (reduced server response time). However, detection is less accurate, which may cause false positives.
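The two settings above could be assembled into session options along these lines. This is a sketch only: a plain dictionary stands in for the real API v3 message, the helper name default_eou_options is hypothetical, and the exact nesting of the fields is an assumption based on the parameter names in this section.

```python
from typing import Optional

SENSITIVITIES = ("DEFAULT", "HIGH")  # values named in this section


def default_eou_options(sensitivity: str = "DEFAULT",
                        max_pause_ms: Optional[int] = None) -> dict:
    """Build a dict mirroring eou_classifier_options=default_classifier."""
    if sensitivity not in SENSITIVITIES:
        raise ValueError(f"unknown sensitivity: {sensitivity!r}")
    classifier = {"type": sensitivity}
    if max_pause_ms is not None:
        if max_pause_ms <= 0:
            raise ValueError("max_pause_ms must be a positive number of milliseconds")
        classifier["max_pause_between_words_hint_ms"] = max_pause_ms
    return {"eou_classifier_options": {"default_classifier": classifier}}
```

For example, a dictation scenario might call default_eou_options("DEFAULT", 2000) to tolerate long pauses, while a voice assistant might prefer default_eou_options("HIGH", 300) for faster replies. Both numbers are illustrative, not recommended values.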
You can also detect the EOU on your own based on data from SpeechKit (parts of utterances, recognition statistics, etc.):
- In your API request that initiates a recognition session, set the eou_classifier_options=external_classifier parameter.
- If EOU is detected within the session, add the eou parameter to a SpeechKit server request (leave it blank).
SpeechKit will use it as a pointer to EOU and return a complete utterance in response.
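The external-classifier flow described above can be sketched as a request generator. Plain dictionaries again stand in for the real messages: the session_options wrapper, the data field, and the should_mark_eou callback are assumptions for illustration. In a real client the callback would be driven by the data received from SpeechKit.

```python
from typing import Callable, Iterable, Iterator


def external_eou_requests(frames: Iterable[bytes],
                          should_mark_eou: Callable[[], bool]) -> Iterator[dict]:
    """Yield session options, then audio chunks with client-side EOU markers."""
    # The first request opens the session and selects the external classifier.
    yield {"session_options": {"eou_classifier_options": {"external_classifier": {}}}}
    for frame in frames:
        yield {"chunk": {"data": frame}}
        if should_mark_eou():
            # A blank eou parameter tells the server the utterance has ended.
            yield {"eou": {}}
```

Polling the callback after each chunk keeps the sketch simple. A production client might instead react to incoming responses on a separate thread or task.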