Speaker labeling in recognition results
Written by
Updated at October 17, 2024
In recognition results, the API v3 can specify which speaker uttered each recognized phrase.
Speaker labeling is only available for recognition in the FULL_DATA
mode for mono records. Recognition results may not feature more than two speakers.
To enable speaker labeling, use the following session parameters:
Python 3
recognize_options = stt_pb2.StreamingOptions(
speaker_labeling=stt_pb2.SpeakerLabelingOptions(
# Enable speaker labeling
speaker_labeling=stt_pb2.SpeakerLabelingOptions.SPEAKER_LABELING_ENABLED
),
recognition_model=stt_pb2.RecognitionModelOptions(
# Recognition model version
model="general:rc",
audio_format=stt_pb2.AudioFormatOptions(
container_audio=stt_pb2.ContainerAudio(
container_audio_type=stt_pb2.ContainerAudio.WAV
)
),
# Recognition mode
audio_processing_type=stt_pb2.RecognitionModelOptions.FULL_DATA
)
)
You will see channel_tag
labels in recognition results, with the values of either 0 or 1. Each value refers to a single speaker. You can process the results as follows:
Python 3
try:
for r in it:
event_type, alternatives = r.WhichOneof('Event'), None
if event_type == 'final':
alternatives = [a.text for a in r.final.alternatives]
elif event_type == 'final_refinement':
alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
else:
continue
print(f'type={event_type}, alternatives={alternatives}, channel_tag = {r.channel_tag}')
except grpc._channel._Rendezvous as err:
print(f'Error code {err._state.code}, message: {err._state.details}')
raise err