Talk Analytics API, gRPC: TalkService
Updated at September 13, 2024
- Calls TalkService
- UploadAsStream
- Upload
- UploadText
- Search
- Get
- GetTalkRequest
- GetTalkResponse
- Talk
- Field
- Transcription
- Phrase
- PhraseText
- Word
- PhraseStatistics
- UtteranceStatistics
- AudioSegmentBoundaries
- DescriptiveStatistics
- Quantile
- RecognitionClassifierResult
- PhraseHighlight
- RecognitionClassifierLabel
- AlgorithmMetadata
- Error
- SpeechStatistics
- SilenceStatistics
- InterruptsStatistics
- InterruptsEvaluation
- ConversationStatistics
- SpeakerStatistics
- Points
- Quiz
- TextClassifiers
- ClassificationResult
- ClassifierStatistics
- Histogram
- Summarization
- SummarizationStatement
- SummarizationField
Call | Description |
---|---|
UploadAsStream | rpc for uploading a talk document as a stream of messages. |
Upload | rpc for uploading a talk document as a single message. |
UploadText | rpc for uploading a text talk document. |
Search | rpc for searching talks. |
Get | rpc for getting several talks in one request (bulk get). |
Calls TalkService
UploadAsStream
rpc for streaming talk documents. The first message should contain the talk metadata, the second the audio metadata, and all remaining messages the audio bytes in chunks.
rpc UploadAsStream (stream StreamTalkRequest) returns (UploadTalkResponse)
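The message order for UploadAsStream can be sketched as a Python generator. This is a minimal sketch, not SDK code: it builds plain dicts whose keys mirror the StreamTalkRequest, AudioStreamingRequest and AudioChunk fields instead of generated protobuf classes. The helper name `stream_talk_requests`, the 64 KiB default chunk size and the container type string are hypothetical placeholders (the ContainerAudioType values are not listed on this page).

```python
from typing import Dict, Iterator


def stream_talk_requests(
    connection_id: str,
    fields: Dict[str, str],
    audio: bytes,
    container_audio_type: str,
    chunk_size: int = 64 * 1024,
) -> Iterator[dict]:
    """Yield StreamTalkRequest-shaped dicts in the documented order.

    Hypothetical helper: dict keys mirror the proto field names, but these are
    plain dicts, not generated protobuf messages.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # 1st message: talk metadata (StreamTalkRequest.metadata -> TalkMetadata).
    yield {"metadata": {"connection_id": connection_id, "fields": dict(fields)}}
    # 2nd message: audio metadata (AudioStreamingRequest.audio_metadata).
    # container_audio_type is a placeholder; real values come from ContainerAudioType.
    yield {
        "audio": {
            "audio_metadata": {
                "container_audio": {"container_audio_type": container_audio_type}
            }
        }
    }
    # Remaining messages: audio bytes split into chunks (AudioStreamingRequest.chunk).
    for start in range(0, len(audio), chunk_size):
        yield {"audio": {"chunk": {"data": audio[start:start + chunk_size]}}}
```

With generated gRPC stubs, each dict would instead become a StreamTalkRequest message, and the generator would be passed to the client-streaming UploadAsStream call.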
StreamTalkRequest
Field | Description |
---|---|
Event | oneof: metadata or audio |
metadata | TalkMetadata Talk document metadata containing the channel ID and channel field values |
audio | AudioStreamingRequest Audio metadata or an audio chunk |
TalkMetadata
Field | Description |
---|---|
connection_id | string ID of the connection this talk belongs to |
fields | map<string,string> Channel-defined fields |
users[] | UserMetadata Per-user metadata |
UserMetadata
Field | Description |
---|---|
id | string |
role | enum UserRole |
fields | map<string,string> |
AudioStreamingRequest
Field | Description |
---|---|
AudioEvent | oneof: audio_metadata or chunk |
audio_metadata | AudioMetadata Session options. Should be the first message from user. |
chunk | AudioChunk Chunk with audio data. |
AudioMetadata
Field | Description |
---|---|
AudioFormat | oneof: raw_audio or container_audio |
raw_audio | RawAudio Audio without container. |
container_audio | ContainerAudio Audio is wrapped in container. |
RawAudio
Field | Description |
---|---|
audio_encoding | enum AudioEncoding Type of audio encoding. |
sample_rate_hertz | int64 PCM sample rate |
audio_channel_count | int64 PCM channel count. |
ContainerAudio
Field | Description |
---|---|
container_audio_type | enum ContainerAudioType Type of audio container. |
AudioChunk
Field | Description |
---|---|
data | bytes Bytes with audio data. |
UploadTalkResponse
Field | Description |
---|---|
talk_id | string ID of the created talk document |
Upload
rpc for uploading a talk document as a single message.
rpc Upload (UploadTalkRequest) returns (UploadTalkResponse)
UploadTalkRequest
Field | Description |
---|---|
metadata | TalkMetadata |
audio | AudioRequest audio payload |
TalkMetadata
Field | Description |
---|---|
connection_id | string ID of the connection this talk belongs to |
fields | map<string,string> Channel-defined fields |
users[] | UserMetadata Per-user metadata |
UserMetadata
Field | Description |
---|---|
id | string |
role | enum UserRole |
fields | map<string,string> |
AudioRequest
Field | Description |
---|---|
audio_metadata | AudioMetadata audio metadata |
audio_data | AudioChunk Bytes with audio data. |
AudioMetadata
Field | Description |
---|---|
AudioFormat | oneof: raw_audio or container_audio |
raw_audio | RawAudio Audio without container. |
container_audio | ContainerAudio Audio is wrapped in container. |
RawAudio
Field | Description |
---|---|
audio_encoding | enum AudioEncoding Type of audio encoding. |
sample_rate_hertz | int64 PCM sample rate |
audio_channel_count | int64 PCM channel count. |
ContainerAudio
Field | Description |
---|---|
container_audio_type | enum ContainerAudioType Type of audio container. |
AudioChunk
Field | Description |
---|---|
data | bytes Bytes with audio data. |
UploadTalkResponse
Field | Description |
---|---|
talk_id | string ID of the created talk document |
UploadText
rpc for uploading a text talk document.
rpc UploadText (UploadTextRequest) returns (UploadTextResponse)
UploadTextRequest
Field | Description |
---|---|
metadata | TalkMetadata |
text_content | TextContent |
TalkMetadata
Field | Description |
---|---|
connection_id | string ID of the connection this talk belongs to |
fields | map<string,string> Channel-defined fields |
users[] | UserMetadata Per-user metadata |
UserMetadata
Field | Description |
---|---|
id | string |
role | enum UserRole |
fields | map<string,string> |
TextContent
Field | Description |
---|---|
messages[] | Message |
Message
Field | Description |
---|---|
user_id | string |
timestamp | google.protobuf.Timestamp |
payload | oneof: text |
text | TextPayload |
TextPayload
Field | Description |
---|---|
text | string |
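An UploadTextRequest can be assembled from a list of chat messages. This is a minimal sketch under stated assumptions: `build_upload_text_request` is a hypothetical helper that returns plain dicts mirroring the TalkMetadata, TextContent, Message and TextPayload fields (not generated protobuf classes), and timestamps are rendered as RFC 3339 UTC strings, which is how google.protobuf.Timestamp appears in the proto3 JSON mapping.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def build_upload_text_request(
    connection_id: str,
    fields: Dict[str, str],
    messages: List[Tuple[str, datetime, str]],
) -> dict:
    """Build an UploadTextRequest-shaped dict from (user_id, timestamp, text) tuples.

    Hypothetical helper: plain dicts, not generated protobuf messages.
    """
    text_messages = []
    for user_id, ts, text in messages:
        if ts.tzinfo is None:
            # Naive datetimes would be interpreted in local time; reject them.
            raise ValueError("timestamps must be timezone-aware")
        text_messages.append(
            {
                "user_id": user_id,
                # RFC 3339 UTC string, e.g. "2024-09-13T10:00:00Z".
                "timestamp": ts.astimezone(timezone.utc).isoformat().replace("+00:00", "Z"),
                # Message.payload is a oneof; this fills its "text" variant (TextPayload).
                "text": {"text": text},
            }
        )
    return {
        "metadata": {"connection_id": connection_id, "fields": dict(fields)},
        "text_content": {"messages": text_messages},
    }
```

Messages are kept in the order given; the sketch does not reorder them by timestamp.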
UploadTextResponse
Field | Description |
---|---|
talk_id | string ID of the created talk document |
Search
rpc for searching talks. Returns talk IDs only.
rpc Search (SearchTalkRequest) returns (SearchTalkResponse)
SearchTalkRequest
Field | Description |
---|---|
organization_id | string ID of the organization |
space_id | string ID of the space |
connection_id | string ID of the connection |
project_id | string ID of the project |
filters[] | Filter Filters on metadata keys (user and system) |
query | Query Full-text search query |
page_size | int64 Page size, from 1 to 1000; the default is 100 |
page_token | string Token of the page to return, taken from next_page_token of the previous response; omit it for the first page |
sort_data | SortData Sorting options for talks |
Filter
Field | Description |
---|---|
key | string metadata key (user.some_key / system.created_at / analysis.speechkit.duration) |
filter | oneof: any_match, int_range, double_range, date_range, duration_range or boolean_match |
any_match | AnyMatchFilter Find talks whose value matches any of the given text values |
int_range | IntRangeFilter Find talks whose value is within an integer range |
double_range | DoubleRangeFilter Find talks whose value is within a double range |
date_range | DateRangeFilter Find talks whose value is within a date range |
duration_range | DurationRangeFilter Find talks whose value is within a duration range |
boolean_match | BooleanFilter Find talks whose value equals the given boolean |
inverse | bool |
channel_number | google.protobuf.Int64Value Channel number to apply the filter to, starting from 0. If not specified, the filter applies to all channels |
AnyMatchFilter
Field | Description |
---|---|
values[] | string Values to match, combined with the "OR" operator |
IntRangeFilter
Field | Description |
---|---|
from_value | google.protobuf.Int64Value |
to_value | google.protobuf.Int64Value |
bounds_inclusive | BoundsInclusive |
BoundsInclusive
Field | Description |
---|---|
from_inclusive | bool Whether the lower (from) bound is included |
to_inclusive | bool Whether the upper (to) bound is included |
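The way from_value, to_value and BoundsInclusive combine in the range filters can be modeled locally, which helps when checking what a filter will match. This sketch assumes that an unset from_value or to_value (an empty wrapper value) leaves that side of the range open; the page above does not state this explicitly. `value_in_range` is a hypothetical helper for reasoning about filters, not part of the API.

```python
from typing import Optional


def value_in_range(
    value: float,
    from_value: Optional[float],
    to_value: Optional[float],
    from_inclusive: bool,
    to_inclusive: bool,
) -> bool:
    """Local model of a range filter with BoundsInclusive (assumed semantics)."""
    # Assumption: None (an unset wrapper value) means that side of the range is open.
    if from_value is not None:
        if value < from_value or (value == from_value and not from_inclusive):
            return False
    if to_value is not None:
        if value > to_value or (value == to_value and not to_inclusive):
            return False
    return True
```

For example, a half-open range [5, 10) corresponds to from_inclusive = true and to_inclusive = false.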
DoubleRangeFilter
Field | Description |
---|---|
from_value | google.protobuf.DoubleValue |
to_value | google.protobuf.DoubleValue |
bounds_inclusive | BoundsInclusive |
DateRangeFilter
Field | Description |
---|---|
from_value | google.protobuf.Timestamp |
to_value | google.protobuf.Timestamp |
bounds_inclusive | BoundsInclusive |
DurationRangeFilter
Field | Description |
---|---|
from_value | google.protobuf.Duration |
to_value | google.protobuf.Duration |
bounds_inclusive | BoundsInclusive |
BooleanFilter
Field | Description |
---|---|
value | bool |
Query
Field | Description |
---|---|
text | string |
inverse | bool Whether talks should or should NOT match the query |
channel_number | google.protobuf.Int64Value Number of the channel to search ("1", "2", ...); any channel is searched if not set |
SortData
Field | Description |
---|---|
fields[] | SortField |
SortField
Field | Description |
---|---|
field | string Sorting key |
order | enum SortOrder Sort order for this field |
position | int64 Priority of this field in the sort: talks are sorted by the field with position = 0, then by the field with position = 1, then by the next one, and so on |
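The role of SortField.position can be illustrated with a small local sketch. `build_sort_data` turns an ordered list of keys into a SortData-shaped dict, and `sort_locally` applies the same ordering to in-memory rows. Both helpers are hypothetical, the field names in the usage are examples only, and the "ASC"/"DESC" strings are placeholders because the SortOrder enum values are not listed on this page.

```python
from typing import List, Tuple


def build_sort_data(keys: List[Tuple[str, str]]) -> dict:
    """Build a SortData-shaped dict; list order becomes SortField.position."""
    return {
        "fields": [
            {"field": field, "order": order, "position": position}
            for position, (field, order) in enumerate(keys)
        ]
    }


def sort_locally(rows: List[dict], sort_data: dict) -> List[dict]:
    """Apply SortData to in-memory rows, treating position 0 as the primary key."""
    result = list(rows)
    # Python's sort is stable, so sorting by the lowest-priority field first and by
    # position 0 last yields "sort by key1, then key2, then key3...".
    for sort_field in sorted(sort_data["fields"], key=lambda f: f["position"], reverse=True):
        key_name = sort_field["field"]
        # "DESC" is a placeholder for the descending SortOrder value.
        result.sort(key=lambda row: row[key_name], reverse=sort_field["order"] == "DESC")
    return result
```

For instance, `build_sort_data([("duration", "DESC"), ("created_at", "ASC")])` asks for the longest talks first, with ties broken by creation time.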
SearchTalkResponse
Field | Description |
---|---|
talk_ids[] | string Talk IDs on the current page |
talks_count | int64 Total number of matching talks |
next_page_token | string Token for the next page; pass it as page_token in the next request |
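Collecting all matching talk IDs means following next_page_token until it runs out. In this sketch, `search` is a hypothetical stand-in for the Search call that takes and returns dicts shaped like SearchTalkRequest and SearchTalkResponse, and an empty next_page_token is assumed to mark the last page.

```python
from typing import Callable, Iterator


def iter_talk_ids(search: Callable[[dict], dict], request: dict) -> Iterator[str]:
    """Yield all talk IDs by following next_page_token (search is a stand-in callable)."""
    page_token = ""  # Empty token: request the first page.
    while True:
        response = search(dict(request, page_token=page_token))
        yield from response.get("talk_ids", [])
        page_token = response.get("next_page_token", "")
        # Assumption: an empty next_page_token means there are no more pages.
        if not page_token:
            break
```

The IDs collected this way can then be passed to Get to retrieve the talks themselves.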
Get
rpc for getting several talks by ID in a single request (bulk get).
rpc Get (GetTalkRequest) returns (GetTalkResponse)
GetTalkRequest
Field | Description |
---|---|
organization_id | string ID of the organization |
space_id | string ID of the space |
connection_id | string ID of the connection to search data in |
project_id | string ID of the project to search data in |
talk_ids[] | string IDs of the talks to return. Requesting too many talks may result in a "message exceeds maximum size" error; up to 100 talks per request is recommended. |
results_mask | google.protobuf.FieldMask Analysis results to return. If not set, all types of analysis are returned. |
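Following the recommendation of at most 100 talks per request, a long ID list can be split into several GetTalkRequest messages. `batched_get_requests` is a hypothetical helper that returns dicts mirroring the GetTalkRequest fields; any results_mask is simply copied from the base request, and which mask paths are accepted is not covered by this sketch.

```python
from typing import Iterator, List


def batched_get_requests(
    base_request: dict, talk_ids: List[str], batch_size: int = 100
) -> Iterator[dict]:
    """Split talk_ids into GetTalkRequest-shaped dicts of at most batch_size IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(talk_ids), batch_size):
        # Copy the shared fields (organization_id, space_id, ...) into each batch.
        yield dict(base_request, talk_ids=talk_ids[start:start + batch_size])
```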
GetTalkResponse
Field | Description |
---|---|
talk[] | Talk |
Talk
Field | Description |
---|---|
id | string talk id |
organization_id | string |
space_id | string |
connection_id | string |
project_ids[] | string |
created_by | string Audit information |
created_at | google.protobuf.Timestamp |
modified_by | string |
modified_at | google.protobuf.Timestamp |
talk_fields[] | Field key-value representation of talk fields with values |
transcription | yandex.cloud.speechsense.v1.analysis.Transcription Talk transcription. This and the following fields contain various ML analysis results |
speech_statistics | yandex.cloud.speechsense.v1.analysis.SpeechStatistics |
silence_statistics | yandex.cloud.speechsense.v1.analysis.SilenceStatistics |
interrupts_statistics | yandex.cloud.speechsense.v1.analysis.InterruptsStatistics |
conversation_statistics | yandex.cloud.speechsense.v1.analysis.ConversationStatistics |
points | yandex.cloud.speechsense.v1.analysis.Points |
text_classifiers | yandex.cloud.speechsense.v1.analysis.TextClassifiers |
summarization | yandex.cloud.speechsense.v1.analysis.Summarization |
Field
Field | Description |
---|---|
name | string name of the field |
value | string field value |
type | enum FieldType field type |
Transcription
Field | Description |
---|---|
phrases[] | Phrase |
algorithms_metadata[] | AlgorithmMetadata Several algorithms may work on the talk transcription, for example, SpeechKit and a translator, so this list may contain several entries for tracing |
Phrase
Field | Description |
---|---|
channel_number | int64 |
start_time_ms | int64 |
end_time_ms | int64 |
phrase | PhraseText |
statistics | PhraseStatistics |
classifiers[] | RecognitionClassifierResult |
PhraseText
Field | Description |
---|---|
text | string |
language | string |
normalized_text | string |
words[] | Word |
Word
Field | Description |
---|---|
word | string |
start_time_ms | int64 |
end_time_ms | int64 |
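A Transcription's phrases can be rendered as a readable, time-ordered dialogue. In this sketch the phrases are dicts mirroring the Phrase and PhraseText fields, `transcript_lines` is a hypothetical helper, and ties on start time are broken by channel number as an arbitrary choice.

```python
from typing import List


def transcript_lines(phrases: List[dict]) -> List[str]:
    """Render Transcription.phrases as time-ordered lines, one per phrase."""
    ordered = sorted(phrases, key=lambda p: (p["start_time_ms"], p["channel_number"]))
    lines = []
    for p in ordered:
        start_s = p["start_time_ms"] / 1000  # Milliseconds to seconds.
        lines.append(f"[{start_s:.2f}s] channel {p['channel_number']}: {p['phrase']['text']}")
    return lines
```

For word-level timing, the same approach applies to PhraseText.words, using each Word's start_time_ms and end_time_ms.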
PhraseStatistics
Field | Description |
---|---|
statistics | UtteranceStatistics |
UtteranceStatistics
Field | Description |
---|---|
speaker_tag | string |
speech_boundaries | AudioSegmentBoundaries Audio segment boundaries |
total_speech_ms | int64 Total speech duration |
speech_ratio | double Speech ratio within audio segment |
total_silence_ms | int64 Total silence duration |
silence_ratio | double Silence ratio within audio segment |
words_count | int64 Number of words in recognized speech |
letters_count | int64 Number of letters in recognized speech |
words_per_second | DescriptiveStatistics Descriptive statistics for words per second distribution |
letters_per_second | DescriptiveStatistics Descriptive statistics for letters per second distribution |
AudioSegmentBoundaries
Field | Description |
---|---|
start_time_ms | int64 Audio segment start time |
end_time_ms | int64 Audio segment end time |
duration_seconds | int64 Duration in seconds |
DescriptiveStatistics
Field | Description |
---|---|
min | double Minimum observed value |
max | double Maximum observed value |
mean | double Estimated mean of distribution |
std | double Estimated standard deviation of distribution |
quantiles[] | Quantile List of evaluated quantiles |
Quantile
Field | Description |
---|---|
level | double Quantile level in range (0, 1) |
value | double Quantile value |
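To show what each DescriptiveStatistics field holds, the sketch below computes the same fields from raw samples. The service's estimation method is not documented on this page, so this is an illustration only: `describe` is a hypothetical helper that uses the population standard deviation and linear interpolation between closest ranks for quantiles, which may differ from what the service reports.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(samples: Sequence[float], levels: Sequence[float] = (0.25, 0.5, 0.75)) -> Dict[str, object]:
    """Compute DescriptiveStatistics-shaped values from raw samples (illustrative)."""
    if not samples:
        raise ValueError("samples must not be empty")
    data = sorted(samples)
    quantiles = []
    for level in levels:
        if not 0 < level < 1:
            raise ValueError("quantile level must be in (0, 1)")
        # Linear interpolation between the two closest ranks.
        pos = level * (len(data) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(data) - 1)
        quantiles.append({"level": level, "value": data[lo] + (data[hi] - data[lo]) * (pos - lo)})
    return {
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "std": statistics.pstdev(data),  # Population standard deviation.
        "quantiles": quantiles,
    }
```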
RecognitionClassifierResult
Field | Description |
---|---|
start_time_ms | int64 Start time of the audio segment used for classification |
end_time_ms | int64 End time of the audio segment used for classification |
classifier | string Name of the triggered classifier |
highlights[] | PhraseHighlight List of highlights, i.e. parts of the phrase that determine the classification result |
labels[] | RecognitionClassifierLabel Classifier predictions |
PhraseHighlight
Field | Description |
---|---|
text | string Text transcription of the highlighted audio segment |
offset | int64 Offset, in symbols, from the beginning of the whole phrase to the start of the highlight |
count | int64 Number of symbols in the highlighted text |
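A highlight's offset and count locate it inside the phrase text, so the highlighted fragment can be cut out directly. This sketch assumes that "symbols" means Unicode code points, which is what Python string indexing uses; if the service counts in another unit (for example UTF-16 code units), results would differ for text containing characters outside the Basic Multilingual Plane. `highlight_span` is a hypothetical helper; its result can be cross-checked against PhraseHighlight.text.

```python
def highlight_span(phrase_text: str, offset: int, count: int) -> str:
    """Return the highlighted substring of a phrase using PhraseHighlight offset/count."""
    if offset < 0 or count < 0 or offset + count > len(phrase_text):
        raise ValueError("highlight lies outside the phrase text")
    # Assumption: offset and count are measured in Unicode code points.
    return phrase_text[offset:offset + count]
```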
RecognitionClassifierLabel
Field | Description |
---|---|
label | string The label of the class predicted by the classifier |
confidence | double The prediction confidence |
AlgorithmMetadata
Field | Description |
---|---|
created_task_date | google.protobuf.Timestamp |
completed_task_date | google.protobuf.Timestamp |
error | Error |
trace_id | string |
name | string |
Error
Field | Description |
---|---|
code | string |
message | string |
SpeechStatistics
Field | Description |
---|---|
total_simultaneous_speech_duration_seconds | int64 Total simultaneous speech duration in seconds |
total_simultaneous_speech_duration_ms | int64 Total simultaneous speech duration in ms |
total_simultaneous_speech_ratio | double Simultaneous speech ratio within audio segment |
simultaneous_speech_duration_estimation | DescriptiveStatistics Descriptive statistics for simultaneous speech duration distribution |
SilenceStatistics
Field | Description |
---|---|
total_simultaneous_silence_duration_ms | int64 |
total_simultaneous_silence_ratio | double Simultaneous silence ratio within audio segment |
simultaneous_silence_duration_estimation | DescriptiveStatistics Descriptive statistics for simultaneous silence duration distribution |
total_simultaneous_silence_duration_seconds | int64 |
InterruptsStatistics
Field | Description |
---|---|
speaker_interrupts[] | InterruptsEvaluation Interrupts description for every speaker |
InterruptsEvaluation
Field | Description |
---|---|
speaker_tag | string Speaker tag |
interrupts_count | int64 Number of interrupts made by the speaker |
interrupts_duration_ms | int64 Total duration of all interrupts in ms |
interrupts[] | AudioSegmentBoundaries Boundaries for every interrupt |
interrupts_duration_seconds | int64 Total duration of all interrupts in seconds |
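The totals in InterruptsEvaluation can be related to its interrupts[] list with a short sketch. It assumes that interrupts_count equals the number of boundary entries, that interrupts_duration_ms is the sum of their durations, and that whole seconds are obtained by truncation; none of this is confirmed on this page. `summarize_interrupts` is a hypothetical helper working on dicts that mirror AudioSegmentBoundaries.

```python
from typing import List


def summarize_interrupts(speaker_tag: str, interrupts: List[dict]) -> dict:
    """Derive InterruptsEvaluation-style totals from interrupt boundaries (assumed relations)."""
    total_ms = sum(seg["end_time_ms"] - seg["start_time_ms"] for seg in interrupts)
    return {
        "speaker_tag": speaker_tag,
        "interrupts_count": len(interrupts),
        "interrupts_duration_ms": total_ms,
        # Assumption: whole seconds are obtained by truncating the millisecond total.
        "interrupts_duration_seconds": total_ms // 1000,
    }
```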
ConversationStatistics
Field | Description |
---|---|
conversation_boundaries | AudioSegmentBoundaries Audio segment boundaries |
speaker_statistics[] | SpeakerStatistics Average statistics for each speaker |
SpeakerStatistics
Field | Description |
---|---|
speaker_tag | string Speaker tag |
complete_statistics | UtteranceStatistics Analysis of all of the speaker's phrases, treated as a single utterance |
words_per_utterance | DescriptiveStatistics Descriptive statistics for words per utterance distribution |
letters_per_utterance | DescriptiveStatistics Descriptive statistics for letters per utterance distribution |
utterance_count | int64 Number of utterances |
utterance_duration_estimation | DescriptiveStatistics Descriptive statistics for utterance duration distribution |
Points
Field | Description |
---|---|
quiz[] | Quiz |
Quiz
Field | Description |
---|---|
request | string |
response | google.protobuf.StringValue |
id | string |
TextClassifiers
Field | Description |
---|---|
classification_result[] | ClassificationResult |
ClassificationResult
Field | Description |
---|---|
classifier | string Classifier name |
classifier_statistics[] | ClassifierStatistics Classifier statistics |
ClassifierStatistics
Field | Description |
---|---|
channel_number | google.protobuf.Int64Value Channel number, null for whole talk |
total_count | int64 classifier total count |
histograms[] | Histogram Various histograms built on top of classifiers |
Histogram
Field | Description |
---|---|
count_values[] | int64 Histogram count values. Each value covers an equal share: if len(count_values) = 2, the histogram is split 50/50; if len(count_values) = 3, [0] represents the first third, [1] the second third, and [2] the last third, and so on. |
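Each count value can be mapped back to the part of the talk it covers. This sketch assumes the equal shares are shares of the talk duration in milliseconds (the page only says "first third", "second third", and so on), and `histogram_bins` is a hypothetical helper that uses integer division for bin edges.

```python
from typing import List, Tuple


def histogram_bins(count_values: List[int], talk_duration_ms: int) -> List[Tuple[int, int, int]]:
    """Map each count value to the (start_ms, end_ms, count) slice of the talk it covers."""
    n = len(count_values)
    bins = []
    for i, count in enumerate(count_values):
        # Bin i covers the i-th of n equal shares of the talk duration (assumed).
        start = talk_duration_ms * i // n
        end = talk_duration_ms * (i + 1) // n
        bins.append((start, end, count))
    return bins
```

For example, `histogram_bins([3, 0, 5], 90000)` assigns 3 hits to the first 30 seconds and 5 hits to the last 30 seconds of a 90-second talk.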
Summarization
Field | Description |
---|---|
statements[] | SummarizationStatement |
SummarizationStatement
Field | Description |
---|---|
field | SummarizationField |
response[] | string |
SummarizationField
Field | Description |
---|---|
id | string |
name | string |
type | enum SummarizationFieldType |