
Speaker labeling in recognition results

Written by
Yandex Cloud
Updated on October 20, 2025

In recognition results, API v3 can indicate which speaker uttered each recognized phrase.

Speaker labeling is only available in FULL_DATA recognition mode and only for mono recordings. Recognition results can distinguish at most two speakers.
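Because speaker labeling requires mono input, it can be useful to verify the channel count before sending a file for recognition. Below is a minimal sketch using Python's standard wave module; the speech.wav file name is a placeholder:

Python 3
import wave

# Speaker labeling is only available for single-channel (mono) recordings
with wave.open('speech.wav', 'rb') as f:
    if f.getnchannels() != 1:
        raise ValueError('Speaker labeling requires a mono recording')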

To enable speaker labeling, use the following session parameters:

Python 3
from yandex.cloud.ai.stt.v3 import stt_pb2

recognize_options = stt_pb2.StreamingOptions(
    speaker_labeling=stt_pb2.SpeakerLabelingOptions(
        # Enable speaker labeling
        speaker_labeling=stt_pb2.SpeakerLabelingOptions.SPEAKER_LABELING_ENABLED
    ),
    recognition_model=stt_pb2.RecognitionModelOptions(
        # Recognition model version
        model="general:rc",
        audio_format=stt_pb2.AudioFormatOptions(
            container_audio=stt_pb2.ContainerAudio(
                container_audio_type=stt_pb2.ContainerAudio.WAV
            )
        ),
        # Recognition mode
        audio_processing_type=stt_pb2.RecognitionModelOptions.FULL_DATA
    )
)
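
The options above only configure the session. A minimal sketch of opening a streaming session and passing them to the API v3 Recognizer service is shown below; the API key, file name, and chunk size are placeholders, and the code follows the structure of the public SpeechKit gRPC examples:

Python 3
import grpc

from yandex.cloud.ai.stt.v3 import stt_service_pb2_grpc

CHUNK_SIZE = 4000  # audio bytes sent per streaming message

def gen(audio_file_name):
    # The first message in the stream carries the session options
    # (recognize_options is defined in the previous snippet)
    yield stt_pb2.StreamingRequest(session_options=recognize_options)
    # Subsequent messages carry raw audio chunks
    with open(audio_file_name, 'rb') as f:
        data = f.read(CHUNK_SIZE)
        while data:
            yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))
            data = f.read(CHUNK_SIZE)

cred = grpc.ssl_channel_credentials()
channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
stub = stt_service_pb2_grpc.RecognizerStub(channel)

# <api_key> is a placeholder for a service account API key
it = stub.RecognizeStreaming(
    gen('speech.wav'),
    metadata=(('authorization', 'Api-Key <api_key>'),),
)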

Recognition results will contain channel_tag labels set to 0 or 1. Usually 0 corresponds to the left channel and 1 to the right one, but the numbering may differ.

Each value refers to a single speaker. You can process the results as follows:

Python 3
# `it` is the streaming response iterator returned by the recognition call,
# e.g. stub.RecognizeStreaming() in the sketch above
try:
    for r in it:
        event_type, alternatives = r.WhichOneof('Event'), None
        if event_type == 'final':
            alternatives = [a.text for a in r.final.alternatives]
        elif event_type == 'final_refinement':
            alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
        else:
            continue
        print(f'type={event_type}, alternatives={alternatives}, channel_tag={r.channel_tag}')
except grpc._channel._Rendezvous as err:
    # grpc._channel._Rendezvous is a private exception type; catching
    # grpc.RpcError, its public base class, also works across grpcio versions
    print(f'Error code {err._state.code}, message: {err._state.details}')
    raise err
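
To assemble a per-speaker transcript instead of printing each event as above, you can group the top final hypotheses by channel_tag. The transcript structure below is an illustration, not part of the API:

Python 3
from collections import defaultdict

transcript = defaultdict(list)  # channel_tag -> recognized phrases, in order

for r in it:
    if r.WhichOneof('Event') == 'final' and r.final.alternatives:
        # Keep the top hypothesis for each finalized utterance
        transcript[r.channel_tag].append(r.final.alternatives[0].text)

for speaker, phrases in sorted(transcript.items()):
    print(f'Speaker {speaker}:')
    for phrase in phrases:
        print(f'  {phrase}')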
