© 2025 Direct Cursus Technology L.L.C.
Talk Analytics API, REST: Talk.Get

Written by
Yandex Cloud
Updated at January 14, 2025
  • HTTP request
  • Body parameters
  • Response
  • Talk
  • Field
  • Transcription
  • Phrase
  • PhraseText
  • Word
  • PhraseStatistics
  • UtteranceStatistics
  • AudioSegmentBoundaries
  • DescriptiveStatistics
  • Quantile
  • RecognitionClassifierResult
  • PhraseHighlight
  • RecognitionClassifierLabel
  • AlgorithmMetadata
  • Error
  • SpeechStatistics
  • SilenceStatistics
  • InterruptsStatistics
  • InterruptsEvaluation
  • ConversationStatistics
  • SpeakerStatistics
  • Points
  • Quiz
  • TextClassifiers
  • ClassificationResult
  • ClassifierStatistics
  • Histogram
  • Summarization
  • SummarizationStatement
  • SummarizationField
  • TalkState
  • AlgorithmProcessingInfo

RPC for retrieving multiple talks in a single request (bulk get).

HTTP request

POST https://rest-api.speechsense.yandexcloud.net/speechsense/v1/talks/get

Body parameters

{
  "organizationId": "string",
  "spaceId": "string",
  "connectionId": "string",
  "projectId": "string",
  "talkIds": [
    "string"
  ],
  "resultsMask": "string"
}
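As a minimal sketch, the request body above could be assembled and attached to a POST request as follows. The helper name `build_talks_get_request` and its `auth_header` parameter are hypothetical; this page does not describe authentication, so the value of the `Authorization` header is an assumption to be taken from the authentication guide. The request is only built, not sent.

```python
import json
import urllib.request

TALKS_GET_URL = "https://rest-api.speechsense.yandexcloud.net/speechsense/v1/talks/get"


def build_talks_get_request(organization_id, space_id, connection_id,
                            project_id, talk_ids, auth_header,
                            results_mask=None):
    """Build (but do not send) a POST request for Talk.Get."""
    body = {
        "organizationId": organization_id,
        "spaceId": space_id,
        "connectionId": connection_id,
        "projectId": project_id,
        "talkIds": list(talk_ids),
    }
    # resultsMask is only added when the caller provides one.
    if results_mask is not None:
        body["resultsMask"] = results_mask
    return urllib.request.Request(
        TALKS_GET_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the header value comes from the authentication guide.
            "Authorization": auth_header,
        },
        method="POST",
    )


request = build_talks_get_request(
    "org-id", "space-id", "connection-id", "project-id",
    ["talk-1", "talk-2"], auth_header="<authorization header value>",
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```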

Field

Description

organizationId

string

ID of the organization.

spaceId

string

ID of the space.

connectionId

string

ID of the connection to search for data in.

projectId

string

ID of the project to search for data in.

talkIds[]

string

IDs of the talks to return. Requesting too many talks may result in a "message exceeds maximum size" error.
We recommend requesting no more than 100 talks per request.

resultsMask

string (field-mask)

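A comma-separated list of field names.
The name suggests that the mask selects which analysis results are included in each returned talk, but this page does not confirm that.
The description given for this field is the generic field-mask text for update requests. It refers to updateMask, which is not a parameter of this read-only request:
Only the specified fields will be changed. The others will be left untouched.
If a field is specified in updateMask and no value for that field was sent in the request,
the field's value will be reset to the default. The default value for most fields is null or 0.

If updateMask is not sent in the request, all fields' values will be updated.
Fields specified in the request will be updated to the provided values.
The rest of the fields will be reset to the default.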
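The recommendation of up to 100 talks per request for talkIds[] can be followed by splitting a long ID list into batches and sending one request per batch. The sketch below shows only the splitting step; the helper name `chunk_talk_ids` is hypothetical.

```python
def chunk_talk_ids(talk_ids, max_per_request=100):
    """Yield successive lists of at most max_per_request talk IDs."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    ids = list(talk_ids)
    for start in range(0, len(ids), max_per_request):
        yield ids[start:start + max_per_request]


batches = list(chunk_talk_ids([f"talk-{i}" for i in range(250)]))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch then becomes the talkIds value of a separate request.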

Response

HTTP Code: 200 - OK

{
  "talk": [
    {
      "id": "string",
      "organizationId": "string",
      "spaceId": "string",
      "connectionId": "string",
      "projectIds": [
        "string"
      ],
      "createdBy": "string",
      "createdAt": "string",
      "modifiedBy": "string",
      "modifiedAt": "string",
      "talkFields": [
        {
          "name": "string",
          "value": "string",
          "type": "string"
        }
      ],
      "transcription": {
        "phrases": [
          {
            "channelNumber": "string",
            "startTimeMs": "string",
            "endTimeMs": "string",
            "phrase": {
              "text": "string",
              "language": "string",
              "normalizedText": "string",
              "words": [
                {
                  "word": "string",
                  "startTimeMs": "string",
                  "endTimeMs": "string"
                }
              ]
            },
            "statistics": {
              "statistics": {
                "speakerTag": "string",
                "speechBoundaries": {
                  "startTimeMs": "string",
                  "endTimeMs": "string",
                  "durationSeconds": "string"
                },
                "totalSpeechMs": "string",
                "speechRatio": "string",
                "totalSilenceMs": "string",
                "silenceRatio": "string",
                "wordsCount": "string",
                "lettersCount": "string",
                "wordsPerSecond": {
                  "min": "string",
                  "max": "string",
                  "mean": "string",
                  "std": "string",
                  "quantiles": [
                    {
                      "level": "string",
                      "value": "string"
                    }
                  ]
                },
                "lettersPerSecond": {
                  "min": "string",
                  "max": "string",
                  "mean": "string",
                  "std": "string",
                  "quantiles": [
                    {
                      "level": "string",
                      "value": "string"
                    }
                  ]
                }
              }
            },
            "classifiers": [
              {
                "startTimeMs": "string",
                "endTimeMs": "string",
                "classifier": "string",
                "highlights": [
                  {
                    "text": "string",
                    "offset": "string",
                    "count": "string"
                  }
                ],
                "labels": [
                  {
                    "label": "string",
                    "confidence": "string"
                  }
                ]
              }
            ]
          }
        ],
        "algorithmsMetadata": [
          {
            "createdTaskDate": "string",
            "completedTaskDate": "string",
            "error": {
              "code": "string",
              "message": "string"
            },
            "traceId": "string",
            "name": "string"
          }
        ]
      },
      "speechStatistics": {
        "totalSimultaneousSpeechDurationSeconds": "string",
        "totalSimultaneousSpeechDurationMs": "string",
        "totalSimultaneousSpeechRatio": "string",
        "simultaneousSpeechDurationEstimation": {
          "min": "string",
          "max": "string",
          "mean": "string",
          "std": "string",
          "quantiles": [
            {
              "level": "string",
              "value": "string"
            }
          ]
        }
      },
      "silenceStatistics": {
        "totalSimultaneousSilenceDurationMs": "string",
        "totalSimultaneousSilenceRatio": "string",
        "simultaneousSilenceDurationEstimation": {
          "min": "string",
          "max": "string",
          "mean": "string",
          "std": "string",
          "quantiles": [
            {
              "level": "string",
              "value": "string"
            }
          ]
        },
        "totalSimultaneousSilenceDurationSeconds": "string"
      },
      "interruptsStatistics": {
        "speakerInterrupts": [
          {
            "speakerTag": "string",
            "interruptsCount": "string",
            "interruptsDurationMs": "string",
            "interrupts": [
              {
                "startTimeMs": "string",
                "endTimeMs": "string",
                "durationSeconds": "string"
              }
            ],
            "interruptsDurationSeconds": "string"
          }
        ]
      },
      "conversationStatistics": {
        "conversationBoundaries": {
          "startTimeMs": "string",
          "endTimeMs": "string",
          "durationSeconds": "string"
        },
        "speakerStatistics": [
          {
            "speakerTag": "string",
            "completeStatistics": {
              "speakerTag": "string",
              "speechBoundaries": {
                "startTimeMs": "string",
                "endTimeMs": "string",
                "durationSeconds": "string"
              },
              "totalSpeechMs": "string",
              "speechRatio": "string",
              "totalSilenceMs": "string",
              "silenceRatio": "string",
              "wordsCount": "string",
              "lettersCount": "string",
              "wordsPerSecond": {
                "min": "string",
                "max": "string",
                "mean": "string",
                "std": "string",
                "quantiles": [
                  {
                    "level": "string",
                    "value": "string"
                  }
                ]
              },
              "lettersPerSecond": {
                "min": "string",
                "max": "string",
                "mean": "string",
                "std": "string",
                "quantiles": [
                  {
                    "level": "string",
                    "value": "string"
                  }
                ]
              }
            },
            "wordsPerUtterance": {
              "min": "string",
              "max": "string",
              "mean": "string",
              "std": "string",
              "quantiles": [
                {
                  "level": "string",
                  "value": "string"
                }
              ]
            },
            "lettersPerUtterance": {
              "min": "string",
              "max": "string",
              "mean": "string",
              "std": "string",
              "quantiles": [
                {
                  "level": "string",
                  "value": "string"
                }
              ]
            },
            "utteranceCount": "string",
            "utteranceDurationEstimation": {
              "min": "string",
              "max": "string",
              "mean": "string",
              "std": "string",
              "quantiles": [
                {
                  "level": "string",
                  "value": "string"
                }
              ]
            }
          }
        ]
      },
      "points": {
        "quiz": [
          {
            "request": "string",
            "response": "string",
            "id": "string"
          }
        ]
      },
      "textClassifiers": {
        "classificationResult": [
          {
            "classifier": "string",
            "classifierStatistics": [
              {
                "channelNumber": "string",
                "totalCount": "string",
                "histograms": [
                  {
                    "countValues": [
                      "string"
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      "summarization": {
        "statements": [
          {
            "field": {
              "id": "string",
              "name": "string",
              "type": "string"
            },
            "response": [
              "string"
            ]
          }
        ]
      },
      "talkState": {
        "processingState": "string",
        "algorithmProcessingInfos": [
          {
            "algorithm": "string",
            "processingState": "string"
          }
        ]
      }
    }
  ]
}

Field

Description

talk[]

Talk
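To illustrate how the response structure above can be consumed, the following sketch walks the talk[] array and assembles a plain-text transcript from transcription.phrases[]. The sample payload is invented for illustration, and the helper name `transcript_lines` is hypothetical; the code assumes that phrases should be ordered by startTimeMs.

```python
import json

# Invented sample that follows the documented response shape.
sample_response = json.loads("""
{
  "talk": [
    {
      "id": "talk-1",
      "transcription": {
        "phrases": [
          {"channelNumber": "1", "startTimeMs": "1500", "endTimeMs": "2600",
           "phrase": {"text": "How can I help?"}},
          {"channelNumber": "0", "startTimeMs": "0", "endTimeMs": "1200",
           "phrase": {"text": "Hello."}}
        ]
      }
    }
  ]
}
""")


def transcript_lines(talk):
    """Return 'channel N: text' lines ordered by phrase start time."""
    phrases = talk.get("transcription", {}).get("phrases", [])
    ordered = sorted(phrases, key=lambda p: int(p.get("startTimeMs", 0)))
    lines = []
    for p in ordered:
        channel = p.get("channelNumber", "?")
        text = p.get("phrase", {}).get("text", "")
        lines.append(f"channel {channel}: {text}")
    return lines


for talk in sample_response.get("talk", []):
    print(talk["id"], transcript_lines(talk))
```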

Talk

Field

Description

id

string

Talk ID.

organizationId

string

spaceId

string

connectionId

string

projectIds[]

string

createdBy

string

Audit info.

createdAt

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

modifiedBy

string

modifiedAt

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

talkFields[]

Field

Key-value representation of the talk fields and their values.

transcription

Transcription

Various ML analysis results.

speechStatistics

SpeechStatistics

silenceStatistics

SilenceStatistics

interruptsStatistics

InterruptsStatistics

conversationStatistics

ConversationStatistics

points

Points

textClassifiers

TextClassifiers

summarization

Summarization

talkState

TalkState
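The createdAt and modifiedAt descriptions above note that some built-in datetime utilities do not support nanosecond precision. Python's `datetime` is one such case, since it stores microseconds. One possible workaround, sketched here under the assumption that losing the last three fractional digits is acceptable, is to truncate the fraction before parsing. The helper name `parse_rfc3339` is hypothetical, and the `%z` handling of offsets with a colon requires Python 3.7 or later.

```python
import re
from datetime import datetime, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?"
    r"([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp, truncating fractions to microseconds."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    date, time, fraction, offset = match.groups()
    # Pad or cut the fraction to exactly 6 digits (microseconds).
    micros = (fraction or "").ljust(6, "0")[:6]
    if offset in ("Z", "z"):
        offset = "+00:00"
    return datetime.strptime(
        f"{date}T{time}.{micros}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z"
    )


created_at = parse_rfc3339("2025-01-14T10:20:30.123456789Z")
print(created_at.isoformat())
```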

Field

connection field value

Field

Description

name

string

name of the field

value

string

field value

type

enum (FieldType)

field type

  • FIELD_TYPE_UNSPECIFIED
  • FIELD_TYPE_STRING
  • FIELD_TYPE_NUMBER
  • FIELD_TYPE_DECIMAL
  • FIELD_TYPE_BOOLEAN
  • FIELD_TYPE_DATE
  • FIELD_TYPE_JSON

Transcription

Field

Description

phrases[]

Phrase

algorithmsMetadata[]

AlgorithmMetadata

There might be several algorithms that work on the talk transcription, for example, speechkit and translator.
So there might be other fields here for tracing.

Phrase

Field

Description

channelNumber

string (int64)

startTimeMs

string (int64)

endTimeMs

string (int64)

phrase

PhraseText

statistics

PhraseStatistics

classifiers[]

RecognitionClassifierResult

PhraseText

Field

Description

text

string

language

string

normalizedText

string

words[]

Word

Word

Field

Description

word

string

startTimeMs

string (int64)

endTimeMs

string (int64)
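Fields typed as string (int64), such as startTimeMs and endTimeMs, arrive as decimal strings, which matches the Protocol Buffers JSON mapping for 64-bit integers. They need to be converted before any arithmetic. A small sketch that computes per-word durations from a PhraseText object; the helper name `word_durations_ms` and the sample data are hypothetical.

```python
def word_durations_ms(phrase_text):
    """Map each word of a PhraseText object to its duration in milliseconds."""
    durations = []
    for word in phrase_text.get("words", []):
        start = int(word["startTimeMs"])  # "string (int64)" -> int
        end = int(word["endTimeMs"])
        durations.append((word["word"], end - start))
    return durations


phrase_text = {
    "text": "hello there",
    "words": [
        {"word": "hello", "startTimeMs": "0", "endTimeMs": "420"},
        {"word": "there", "startTimeMs": "480", "endTimeMs": "900"},
    ],
}
print(word_durations_ms(phrase_text))  # [('hello', 420), ('there', 420)]
```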

PhraseStatistics

Field

Description

statistics

UtteranceStatistics

UtteranceStatistics

Field

Description

speakerTag

string

speechBoundaries

AudioSegmentBoundaries

Audio segment boundaries

totalSpeechMs

string (int64)

Total speech duration

speechRatio

string

Speech ratio within audio segment

totalSilenceMs

string (int64)

Total silence duration

silenceRatio

string

Silence ratio within audio segment

wordsCount

string (int64)

Number of words in recognized speech

lettersCount

string (int64)

Number of letters in recognized speech

wordsPerSecond

DescriptiveStatistics

Descriptive statistics for words per second distribution

lettersPerSecond

DescriptiveStatistics

Descriptive statistics for letters per second distribution

AudioSegmentBoundaries

Field

Description

startTimeMs

string (int64)

Audio segment start time

endTimeMs

string (int64)

Audio segment end time

durationSeconds

string (int64)

Duration in seconds

DescriptiveStatistics

Field

Description

min

string

Minimum observed value

max

string

Maximum observed value

mean

string

Estimated mean of distribution

std

string

Estimated standard deviation of distribution

quantiles[]

Quantile

List of evaluated quantiles

Quantile

Field

Description

level

string

Quantile level in range (0, 1)

value

string

Quantile value
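A DescriptiveStatistics object can be read by converting its values to floats and looking up a quantile by level. The sketch below assumes that numeric values may arrive either as JSON numbers or as strings, which `float()` handles in both cases. The helper name `quantile_value` and the sample values are hypothetical.

```python
def quantile_value(stats, level, tolerance=1e-9):
    """Return the value of the quantile at the given level, or None."""
    for quantile in stats.get("quantiles", []):
        if abs(float(quantile["level"]) - level) <= tolerance:
            return float(quantile["value"])
    return None


# Invented sample in the shape of a DescriptiveStatistics object.
words_per_second = {
    "min": "0.5", "max": "4.0", "mean": "2.1", "std": "0.7",
    "quantiles": [
        {"level": "0.5", "value": "2.0"},
        {"level": "0.9", "value": "3.2"},
    ],
}
median = quantile_value(words_per_second, 0.5)
print(median)  # 2.0
```

A missing level yields None, so callers can tell "not evaluated" apart from a zero value.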

RecognitionClassifierResult

Field

Description

startTimeMs

string (int64)

Start time of the audio segment used for classification

endTimeMs

string (int64)

End time of the audio segment used for classification

classifier

string

Name of the triggered classifier

highlights[]

PhraseHighlight

List of highlights, i.e. parts of phrase that determine the result of the classification

labels[]

RecognitionClassifierLabel

Classifier predictions

PhraseHighlight

Field

Description

text

string

Text transcription of the highlighted audio segment

offset

string (int64)

Offset, in characters, from the beginning of the whole phrase to the point where the highlight begins

count

string (int64)

Number of characters in the highlighted text
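The offset and count fields of a PhraseHighlight describe a slice of the phrase. The sketch below assumes that both are counted in Unicode characters against the phrase's text field; the page does not say whether text or normalizedText is the reference, so treat that choice as an assumption. The helper name `highlight_span` and the sample data are hypothetical.

```python
def highlight_span(phrase_text, highlight):
    """Cut the highlighted part out of the phrase text using offset and count."""
    offset = int(highlight["offset"])
    count = int(highlight["count"])
    return phrase_text[offset:offset + count]


text = "I would like to cancel my subscription"
highlight = {"text": "cancel my subscription", "offset": "16", "count": "22"}
print(highlight_span(text, highlight))  # cancel my subscription
```

Comparing the slice with the highlight's own text field is a cheap way to check that the assumed reference string is the right one.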

RecognitionClassifierLabel

Field

Description

label

string

The label of the class predicted by the classifier

confidence

string

The prediction confidence

AlgorithmMetadata

Field

Description

createdTaskDate

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

completedTaskDate

string (date-time)

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

error

Error

traceId

string

name

string

Error

Field

Description

code

string

message

string

SpeechStatistics

Field

Description

totalSimultaneousSpeechDurationSeconds

string (int64)

Total simultaneous speech duration in seconds

totalSimultaneousSpeechDurationMs

string (int64)

Total simultaneous speech duration in ms

totalSimultaneousSpeechRatio

string

Simultaneous speech ratio within audio segment

simultaneousSpeechDurationEstimation

DescriptiveStatistics

Descriptive statistics for simultaneous speech duration distribution

SilenceStatistics

Field

Description

totalSimultaneousSilenceDurationMs

string (int64)

totalSimultaneousSilenceRatio

string

Simultaneous silence ratio within audio segment

simultaneousSilenceDurationEstimation

DescriptiveStatistics

Descriptive statistics for simultaneous silence duration distribution

totalSimultaneousSilenceDurationSeconds

string (int64)

InterruptsStatistics

Field

Description

speakerInterrupts[]

InterruptsEvaluation

Interrupts description for every speaker

InterruptsEvaluation

Field

Description

speakerTag

string

Speaker tag

interruptsCount

string (int64)

Number of interrupts made by the speaker

interruptsDurationMs

string (int64)

Total duration of all interrupts

interrupts[]

AudioSegmentBoundaries

Boundaries for every interrupt

interruptsDurationSeconds

string (int64)

Total duration of all interrupts in seconds
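As an illustration of how InterruptsStatistics can be aggregated, the sketch below sums the interrupt boundaries for every speaker. It assumes that the duration of an interrupt is endTimeMs minus startTimeMs; whether this sum always equals the reported interruptsDurationMs is not stated on this page. The helper name `interrupts_summary` and the sample data are hypothetical.

```python
def interrupts_summary(interrupts_statistics):
    """Return per-speaker interrupt counts and durations derived from boundaries."""
    summary = {}
    for evaluation in interrupts_statistics.get("speakerInterrupts", []):
        total_ms = sum(
            int(b["endTimeMs"]) - int(b["startTimeMs"])
            for b in evaluation.get("interrupts", [])
        )
        summary[evaluation["speakerTag"]] = {
            "count": int(evaluation.get("interruptsCount", 0)),
            "total_ms_from_boundaries": total_ms,
        }
    return summary


# Invented sample in the shape of an InterruptsStatistics object.
sample = {
    "speakerInterrupts": [
        {
            "speakerTag": "operator",
            "interruptsCount": "2",
            "interruptsDurationMs": "1500",
            "interrupts": [
                {"startTimeMs": "1000", "endTimeMs": "1800"},
                {"startTimeMs": "5000", "endTimeMs": "5700"},
            ],
        }
    ]
}
print(interrupts_summary(sample))
```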

ConversationStatistics

Field

Description

conversationBoundaries

AudioSegmentBoundaries

Audio segment boundaries

speakerStatistics[]

SpeakerStatistics

Average statistics for each speaker

SpeakerStatistics

Field

Description

speakerTag

string

Speaker tag

completeStatistics

UtteranceStatistics

Analysis of all the speaker's phrases, treated as a single utterance

wordsPerUtterance

DescriptiveStatistics

Descriptive statistics for words per utterance distribution

lettersPerUtterance

DescriptiveStatistics

Descriptive statistics for letters per utterance distribution

utteranceCount

string (int64)

Number of utterances

utteranceDurationEstimation

DescriptiveStatistics

Descriptive statistics for utterance duration distribution

Points

Field

Description

quiz[]

Quiz

Quiz

Field

Description

request

string

response

string

id

string

TextClassifiers

Field

Description

classificationResult[]

ClassificationResult

ClassificationResult

Field

Description

classifier

string

Classifier name

classifierStatistics[]

ClassifierStatistics

Classifier statistics

ClassifierStatistics

Field

Description

channelNumber

string (int64)

Channel number, null for whole talk

totalCount

string (int64)

classifier total count

histograms[]

Histogram

Represents various histograms built on top of classifiers

Histogram

Field

Description

countValues[]

string (int64)

Histogram count values. For example:
if len(count_values) = 2, the histogram is split 50/50, that is, into two halves;
if len(count_values) = 3, value [0] represents the first third, [1] the second third, [2] the last third, and so on.
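The countValues[] description can be made concrete by pairing every count with the part of the talk it covers. The sketch below assumes that the buckets split the talk duration into equal parts, which is one reading of "first third, second third, last third"; the helper name `histogram_buckets` and the 90-second duration are hypothetical.

```python
def histogram_buckets(count_values, duration_ms):
    """Pair each count with the [start, end) time range of its bucket."""
    counts = [int(value) for value in count_values]  # "string (int64)" -> int
    n = len(counts)
    buckets = []
    for i, count in enumerate(counts):
        # Integer arithmetic keeps bucket edges contiguous and exact.
        start = duration_ms * i // n
        end = duration_ms * (i + 1) // n
        buckets.append((start, end, count))
    return buckets


print(histogram_buckets(["2", "0", "1"], 90000))
# [(0, 30000, 2), (30000, 60000, 0), (60000, 90000, 1)]
```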

Summarization

Field

Description

statements[]

SummarizationStatement

SummarizationStatement

Field

Description

field

SummarizationField

response[]

string

SummarizationField

Field

Description

id

string

name

string

type

enum (SummarizationFieldType)

  • SUMMARIZATION_FIELD_TYPE_UNSPECIFIED
  • TEXT
  • TEXT_ARRAY
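Summarization statements can be flattened into a dictionary keyed by field name. The sketch below assumes that a TEXT field carries its single value as the first element of response[] and that any other type is kept as a list; this page does not spell out that convention. The helper name `summarization_by_field` and the sample data are hypothetical.

```python
def summarization_by_field(summarization):
    """Map each summarization field name to its response value(s)."""
    result = {}
    for statement in summarization.get("statements", []):
        field = statement.get("field", {})
        responses = statement.get("response", [])
        if field.get("type") == "TEXT":
            # Assumption: a TEXT field has at most one response element.
            result[field.get("name")] = responses[0] if responses else ""
        else:
            result[field.get("name")] = list(responses)
    return result


# Invented sample in the shape of a Summarization object.
summarization = {
    "statements": [
        {"field": {"id": "f1", "name": "topic", "type": "TEXT"},
         "response": ["Billing question"]},
        {"field": {"id": "f2", "name": "action_items", "type": "TEXT_ARRAY"},
         "response": ["Send invoice", "Call back"]},
    ]
}
print(summarization_by_field(summarization))
```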

TalkState

Field

Description

processingState

enum (ProcessingState)

  • PROCESSING_STATE_UNSPECIFIED
  • PROCESSING_STATE_NOT_STARTED
  • PROCESSING_STATE_PROCESSING
  • PROCESSING_STATE_SUCCESS
  • PROCESSING_STATE_FAILED

algorithmProcessingInfos[]

AlgorithmProcessingInfo

AlgorithmProcessingInfo

Field

Description

algorithm

enum (Algorithm)

  • ALGORITHM_UNSPECIFIED
  • ALGORITHM_SPEECHKIT
  • ALGORITHM_YGPT
  • ALGORITHM_CLASSIFIER
  • ALGORITHM_SUMMARIZATION
  • ALGORITHM_EMBEDDING
  • ALGORITHM_STATISTICS

processingState

enum (ProcessingState)

  • PROCESSING_STATE_UNSPECIFIED
  • PROCESSING_STATE_NOT_STARTED
  • PROCESSING_STATE_PROCESSING
  • PROCESSING_STATE_SUCCESS
  • PROCESSING_STATE_FAILED
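Before reading analysis results, a client may want to check talkState. The sketch below treats PROCESSING_STATE_SUCCESS as "ready" and lists the algorithms reported as failed; both helper names and the sample data are hypothetical, and the interpretation of the overall state is an assumption.

```python
def is_ready(talk):
    """True if the talk's overall processing state is SUCCESS."""
    state = talk.get("talkState", {})
    return state.get("processingState") == "PROCESSING_STATE_SUCCESS"


def failed_algorithms(talk):
    """Return the names of algorithms whose processing state is FAILED."""
    state = talk.get("talkState", {})
    return [
        info.get("algorithm", "ALGORITHM_UNSPECIFIED")
        for info in state.get("algorithmProcessingInfos", [])
        if info.get("processingState") == "PROCESSING_STATE_FAILED"
    ]


# Invented sample in the shape of a Talk object.
talk = {
    "id": "talk-1",
    "talkState": {
        "processingState": "PROCESSING_STATE_FAILED",
        "algorithmProcessingInfos": [
            {"algorithm": "ALGORITHM_SPEECHKIT",
             "processingState": "PROCESSING_STATE_SUCCESS"},
            {"algorithm": "ALGORITHM_SUMMARIZATION",
             "processingState": "PROCESSING_STATE_FAILED"},
        ],
    },
}
print(is_ready(talk), failed_algorithms(talk))
```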
