
In this article:

  • Audio classifiers
  • Audio statistics

Analyzing recognition results

Written by Yandex Cloud
Updated December 11, 2024

SpeechKit API v3 can analyze audio during recognition and return additional information along with the recognition results: the presence and start/end timestamps of particular words or phrases, the durations of utterances and pauses, speech rate, word counts of utterances, and other speech analysis labels and metrics.
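
Both of the option blocks shown in this article are sent in the first message of a streaming recognition session. For context, here is a minimal sketch of such a session using the Python gRPC bindings generated from the Yandex Cloud API protos; the endpoint and Api-Key authorization follow the SpeechKit documentation, while request_iterator and the audio source are illustrative helpers, not part of the API.

Python
import grpc

from yandex.cloud.ai.stt.v3 import stt_pb2, stt_service_pb2_grpc

def request_iterator(first_request, audio_chunks):
    # The first request carries the session options;
    # the following requests carry raw audio data.
    yield first_request
    for data in audio_chunks:
        yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))

def recognize_streaming(api_key, first_request, audio_chunks):
    cred = grpc.ssl_channel_credentials()
    with grpc.secure_channel('stt.api.cloud.yandex.net:443', cred) as channel:
        stub = stt_service_pb2_grpc.RecognizerStub(channel)
        # Api-Key authorization; an IAM token ('Bearer <token>') also works
        responses = stub.RecognizeStreaming(
            request_iterator(first_request, audio_chunks),
            metadata=(('authorization', f'Api-Key {api_key}'),),
        )
        for response in responses:
            yield response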

Audio classifiers

Note

Audio classifiers are only supported for Russian speech.

You can apply classifiers to both intermediate and final recognition results. To enable a classifier, set the recognition_classifier parameter in the session options. Classifier verdicts arrive in a separate message immediately after the events specified in the classifier settings: events of the partial, eou_update, or final type.

SpeechKit supports the following classifiers:

| Classifier | Description | Result |
|---|---|---|
| formal_greeting | Formal greeting like "good afternoon" or "good morning" | Probability of the phrase belonging to the formal greeting class |
| informal_greeting | Informal greeting like "hi" or "hey there" | Probability of the phrase belonging to the informal greeting class |
| formal_farewell | Formal farewell like "goodbye" or "have a nice day" | Probability of the phrase belonging to the formal farewell class |
| informal_farewell | Informal farewell like "bye-bye" or "adios" | Probability of the phrase belonging to the informal farewell class |
| insult | Insults like "idiot" or "jerk" | Probability of the phrase belonging to the insult class |
| profanity | Profanity | Probability of the phrase belonging to the profanity class |
| gender | Speaker gender | Probability values for male and female voices |
| negative | Negative sentiment | Probability of the recognized phrase being negative |
| answerphone | Answering machine or voice bot response | Probability of the phrase belonging to a voice bot or answering machine |
Python
from yandex.cloud.ai.stt.v3 import stt_pb2

session_options = stt_pb2.StreamingRequest(
  session_options=stt_pb2.StreamingOptions(
    recognition_model=stt_pb2.RecognitionModelOptions(model="general"),

    # Classifier settings
    recognition_classifier=stt_pb2.RecognitionClassifierOptions(
      classifiers=[
        # Detecting insults in utterances
        stt_pb2.RecognitionClassifier(
          classifier="insult",
          triggers=[stt_pb2.RecognitionClassifier.ON_UTTERANCE]
        ),
        # Detecting profanity in utterances
        stt_pb2.RecognitionClassifier(
          classifier="profanity",
          triggers=[stt_pb2.RecognitionClassifier.ON_UTTERANCE]
        ),
      ]
    )
  )
)
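
Classifier verdicts then show up in the response stream as classifier_update events. Below is a minimal sketch of reading them, assuming the responses iterator from the session sketch above; the classifier_result, labels, label, and confidence field names are taken from the API v3 stt.proto and should be checked against the proto version you build with.

Python
# Reading classifier verdicts from the response stream
for response in responses:
    if response.WhichOneof('Event') == 'classifier_update':
        result = response.classifier_update.classifier_result
        for item in result.labels:
            # For example: insult = 0.02, profanity = 0.01
            print(f'{result.classifier}: {item.label} = {item.confidence:.2f}')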

Audio statistics

SpeechKit can analyze dialogs and the utterances of individual speakers, calculating statistics both for each participant and for the dialog as a whole. Analysis results include discrete audio characteristics and descriptive statistics for the distributions of those values.

For each speaker in the dialog, you can get:

  • Speed and length of speech
  • Duration of pauses
  • Count and sizes of utterances

For the whole dialog, you can get:

  • Duration of parallel speech and pauses
  • Interruption count and timestamps

To enable statistics calculation, set the speech_analysis parameter in the session settings:

Python
recognize_options = stt_pb2.StreamingOptions(
  recognition_model=stt_pb2.RecognitionModelOptions(
    ...
  ),
  # Statistics settings
  speech_analysis=stt_pb2.SpeechAnalysisOptions(
    enable_speaker_analysis=True,          # Statistics for each speaker
    enable_conversation_analysis=True,     # Statistics for the whole dialog
    descriptive_statistics_quantiles=[0.5, 0.9]  # Quantile levels to report
  )
)

You will receive the analysis results in the speaker_analysis and conversation_analysis messages.
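
Here is a minimal sketch of picking these messages out of the response stream, again assuming the responses iterator from the session sketch above; speaker_tag is taken from the API v3 stt.proto, and the full field sets are defined by the SpeakerAnalysis and ConversationAnalysis messages there.

Python
# Reading the statistics messages from the response stream
for response in responses:
    event = response.WhichOneof('Event')
    if event == 'speaker_analysis':
        print(f'Statistics for speaker {response.speaker_analysis.speaker_tag}:')
        print(response.speaker_analysis)    # Protobuf text representation
    elif event == 'conversation_analysis':
        print('Whole-dialog statistics:')
        print(response.conversation_analysis)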
