A Yandex project
© 2025 Yandex.Cloud LLC

In this article:

  • HTTP request
  • Body parameters
  • RecognitionModelOptions
  • AudioFormatOptions
  • RawAudio
  • ContainerAudio
  • TextNormalizationOptions
  • LanguageRestrictionOptions
  • RecognitionClassifierOptions
  • RecognitionClassifier
  • SpeechAnalysisOptions
  • SpeakerLabelingOptions
  • Response
  • Status

SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile

Written by Yandex Cloud
Updated on February 24, 2025

HTTP request

POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
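As a minimal sketch, the request can be assembled with Python's standard library. The `Authorization` header value is an assumption: this page does not describe authentication, so the placeholder must be replaced with whatever scheme the authentication guide specifies. `build_request` is a hypothetical helper, the sample `uri` is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_request(body: dict, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the credential format is not stated on this page.
            "Authorization": auth_header,
        },
    )


# Placeholder values for illustration only.
request = build_request({"uri": "https://example.com/audio.ogg"}, "<credentials>")
```

Sending the request (for example with `urllib.request.urlopen(request)`) would return the Operation resource described in the Response section.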

Body parameters

{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  }
}

Field

Description

content

string (bytes)

Audio data as bytes. In JSON, a bytes field is passed as a base64-encoded string.

Includes only one of the fields content, uri.

uri

string

URL of the audio data in S3 storage.

Includes only one of the fields content, uri.

recognitionModel

RecognitionModelOptions

Configuration for speech recognition model.

recognitionClassifier

RecognitionClassifierOptions

Configuration for classifiers over speech recognition.

speechAnalysis

SpeechAnalysisOptions

Configuration for speech analysis over speech recognition.

speakerLabeling

SpeakerLabelingOptions

Configuration for speaker labeling.
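The `content` and `uri` fields are mutually exclusive. A minimal sketch of enforcing that on the client side follows. `build_source` is a hypothetical helper, and the base64 encoding assumes the standard protobuf JSON mapping for `string (bytes)` fields.

```python
import base64
from typing import Optional


def build_source(content: Optional[bytes] = None, uri: Optional[str] = None) -> dict:
    """Return the audio source part of the body, with exactly one field set."""
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of `content` or `uri`")
    if content is not None:
        # Assumption: bytes fields are base64 strings in the JSON body.
        return {"content": base64.b64encode(content).decode("ascii")}
    return {"uri": uri}


inline_source = build_source(content=b"abc")
remote_source = build_source(uri="https://example.com/audio.ogg")  # placeholder URL
```

The returned dictionary can be merged with the `recognitionModel` and other top-level options before serialization.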

RecognitionModelOptions

Field

Description

model

string

Sets the recognition model for the cloud version of SpeechKit. Possible values: 'general', 'general:rc', 'general:deprecated'.
The model is ignored for SpeechKit Hybrid.

audioFormat

AudioFormatOptions

Specifies the format of the input audio.

textNormalization

TextNormalizationOptions

Text normalization options.

languageRestriction

LanguageRestrictionOptions

Possible languages in audio.

audioProcessingType

enum (AudioProcessingType)

How audio data is processed (in real time, after all data is received, and so on). The default is REAL_TIME.

  • AUDIO_PROCESSING_TYPE_UNSPECIFIED
  • REAL_TIME: Process audio in a mode optimized for real-time recognition, i.e., send partial and final responses as soon as possible.
  • FULL_DATA: Process audio after all data has been received.
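A minimal sketch of assembling a `recognitionModel` object from the fields above. `recognition_model` is a hypothetical helper. It checks only the model names and processing types listed in this section and leaves the optional `textNormalization` and `languageRestriction` fields out.

```python
MODELS = {"general", "general:rc", "general:deprecated"}
PROCESSING_TYPES = {"REAL_TIME", "FULL_DATA"}


def recognition_model(audio_format: dict,
                      model: str = "general",
                      processing: str = "REAL_TIME") -> dict:
    """Build a RecognitionModelOptions dictionary with basic validation."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if processing not in PROCESSING_TYPES:
        raise ValueError(f"unknown audioProcessingType: {processing!r}")
    return {
        "model": model,
        "audioFormat": audio_format,
        "audioProcessingType": processing,
    }


model_options = recognition_model(
    {"containerAudio": {"containerAudioType": "OGG_OPUS"}},
    processing="FULL_DATA",
)
```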

AudioFormatOptions

Audio format options.

Field

Description

rawAudio

RawAudio

Audio without container.

Includes only one of the fields rawAudio, containerAudio.

containerAudio

ContainerAudio

Audio is wrapped in container.

Includes only one of the fields rawAudio, containerAudio.

RawAudio

Raw audio format specification (there is no container from which to infer the type). Used in AudioFormatOptions.

Field

Description

audioEncoding

enum (AudioEncoding)

Type of audio encoding

  • AUDIO_ENCODING_UNSPECIFIED
  • LINEAR16_PCM: Audio bit depth 16-bit signed little-endian (Linear PCM).

sampleRateHertz

string (int64)

PCM sample rate

audioChannelCount

string (int64)

PCM channel count. Currently, only single-channel audio is supported in real-time recognition.
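Because `sampleRateHertz` and `audioChannelCount` are typed `string (int64)`, the numbers travel as strings in the JSON body. A minimal sketch, with `raw_audio_options` as a hypothetical helper and the sample values chosen only for illustration:

```python
def raw_audio_options(sample_rate_hz: int, channels: int = 1) -> dict:
    """Build an AudioFormatOptions dictionary for headerless 16-bit PCM."""
    if sample_rate_hz <= 0 or channels <= 0:
        raise ValueError("sample rate and channel count must be positive")
    return {
        "rawAudio": {
            "audioEncoding": "LINEAR16_PCM",
            # int64 fields are written as decimal strings.
            "sampleRateHertz": str(sample_rate_hz),
            "audioChannelCount": str(channels),
        }
    }


raw_format = raw_audio_options(16000)
```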

ContainerAudio

Audio in a container of a fixed type. Used in AudioFormatOptions.

Field

Description

containerAudioType

enum (ContainerAudioType)

Type of audio container.

  • CONTAINER_AUDIO_TYPE_UNSPECIFIED
  • WAV: Audio bit depth 16-bit signed little-endian (Linear PCM).
  • OGG_OPUS: Data is encoded using the OPUS audio codec and packaged in the OGG container format.
  • MP3: Data is encoded using MPEG-1/2 Layer III and packaged in the MP3 container format.
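A minimal sketch of picking a `containerAudioType` from a file name. Guessing by extension is a heuristic and an assumption of this example, not something the API does. The suffix table and the `container_audio_options` helper are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes are a convenience mapping, not part of the API.
CONTAINER_BY_SUFFIX = {
    ".wav": "WAV",
    ".ogg": "OGG_OPUS",
    ".opus": "OGG_OPUS",
    ".mp3": "MP3",
}


def container_audio_options(filename: str) -> dict:
    """Build an AudioFormatOptions dictionary based on the file extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in CONTAINER_BY_SUFFIX:
        raise ValueError(f"cannot infer container type from {filename!r}")
    return {"containerAudio": {"containerAudioType": CONTAINER_BY_SUFFIX[suffix]}}


container_format = container_audio_options("call-recording.MP3")
```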

TextNormalizationOptions

Text normalization options.

Field

Description

textNormalization

enum (TextNormalization)

  • TEXT_NORMALIZATION_UNSPECIFIED
  • TEXT_NORMALIZATION_ENABLED: Enable normalization
  • TEXT_NORMALIZATION_DISABLED: Disable normalization

profanityFilter

boolean

Profanity filter (default: false).

literatureText

boolean

Rewrite text in a literary style (default: false).

phoneFormattingMode

enum (PhoneFormattingMode)

Defines the phone formatting mode.

  • PHONE_FORMATTING_MODE_UNSPECIFIED
  • PHONE_FORMATTING_MODE_DISABLED: Disable phone formatting

LanguageRestrictionOptions

Type of restriction for the list of languages expected in the incoming speech stream.

Field

Description

restrictionType

enum (LanguageRestrictionType)

Language restriction type

  • LANGUAGE_RESTRICTION_TYPE_UNSPECIFIED
  • WHITELIST: The allowing list. The incoming audio can contain only the listed languages.
  • BLACKLIST: The forbidding list. The incoming audio cannot contain the listed languages.

languageCode[]

string

The list of language codes used to restrict recognition in the case of an auto model.
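A minimal sketch of building a `languageRestriction` object. `language_restriction` is a hypothetical helper, and the language codes in the example are placeholders. This page does not list the codes the service accepts.

```python
from typing import Iterable


def language_restriction(codes: Iterable[str], allow: bool = True) -> dict:
    """Build LanguageRestrictionOptions as an allow list or a deny list."""
    code_list = list(codes)
    if not code_list:
        raise ValueError("at least one language code is required")
    return {
        "restrictionType": "WHITELIST" if allow else "BLACKLIST",
        "languageCode": code_list,
    }


# Placeholder codes; check the service documentation for supported values.
restriction = language_restriction(["ru-RU", "en-US"])
```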

RecognitionClassifierOptions

Field

Description

classifiers[]

RecognitionClassifier

List of classifiers to use

RecognitionClassifier

Field

Description

classifier

string

Classifier name

triggers[]

enum (TriggerType)

The types of responses for which classification results are returned.

  • TRIGGER_TYPE_UNSPECIFIED
  • ON_UTTERANCE: Apply classifier to utterance responses
  • ON_FINAL: Apply classifier to final responses
  • ON_PARTIAL: Apply classifier to partial responses

SpeechAnalysisOptions

Field

Description

enableSpeakerAnalysis

boolean

Analyse speech for every speaker

enableConversationAnalysis

boolean

Analyse conversation of two speakers

descriptiveStatisticsQuantiles[]

string

Quantile levels in range (0, 1) for descriptive statistics
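A minimal sketch of building `speechAnalysis` options with the quantile range check described above. `speech_analysis` is a hypothetical helper. Writing the quantiles as strings follows the `string` type shown in the body schema, which is an assumption about how the service expects them.

```python
from typing import Sequence


def speech_analysis(quantiles: Sequence[float],
                    speakers: bool = True,
                    conversation: bool = False) -> dict:
    """Build SpeechAnalysisOptions, rejecting quantiles outside (0, 1)."""
    for q in quantiles:
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile {q} is outside the open interval (0, 1)")
    return {
        "enableSpeakerAnalysis": speakers,
        "enableConversationAnalysis": conversation,
        # Assumption: quantile levels are serialized as strings, per the schema.
        "descriptiveStatisticsQuantiles": [str(q) for q in quantiles],
    }


analysis = speech_analysis([0.5, 0.9])
```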

SpeakerLabelingOptions

Field

Description

speakerLabeling

enum (SpeakerLabeling)

Specifies the execution of speaker labeling. Default is SPEAKER_LABELING_DISABLED.

  • SPEAKER_LABELING_UNSPECIFIED
  • SPEAKER_LABELING_ENABLED: Enable speaker labeling
  • SPEAKER_LABELING_DISABLED: Disable speaker labeling

Response

HTTP Code: 200 - OK

{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}

An Operation resource. For more information, see Operation.

Field

Description

id

string

ID of the operation.

description

string

Description of the operation. 0-256 characters long.

createdAt

string (date-time)

Creation timestamp.

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

createdBy

string

ID of the user or service account who initiated the operation.

modifiedAt

string (date-time)

The time when the Operation resource was last modified.

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

done

boolean

If the value is false, it means the operation is still in progress.
If true, the operation is completed, and either error or response is available.

metadata

object

Service-specific metadata associated with the operation.
It typically contains the ID of the target resource that the operation is performed on.
Any method that returns a long-running operation should document the metadata type, if any.

error

Status

The error result of the operation in case of failure or cancellation.

Includes only one of the fields error.

The operation result.
If done == false and there was no failure detected, neither error nor response is set.
If done == false and there was a failure detected, error is set.
If done == true, exactly one of error or response is set.
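A minimal sketch of reading the returned Operation on the client side. Both helpers are hypothetical. `parse_timestamp` handles only the `Z`-suffixed form shown in the range above and truncates nanoseconds to microseconds, because Python's `datetime` does not store finer precision.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC3339 UTC timestamp with 0 to 9 fractional digits."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    nanos = (match.group(2) or "").ljust(9, "0")
    # Keep the first six fractional digits; the rest is dropped.
    return base.replace(microsecond=int(nanos[:6]), tzinfo=timezone.utc)


def operation_state(operation: dict) -> str:
    """Classify an Operation as 'failed', 'done', or 'running'."""
    if "error" in operation:
        return "failed"
    return "done" if operation.get("done") else "running"


created = parse_timestamp("2025-02-24T10:15:30.123456789Z")
state = operation_state({"id": "example-id", "done": False})
```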

Status

The error result of the operation in case of failure or cancellation.

Field

Description

code

integer (int32)

Error code. An enum value of google.rpc.Code.

message

string

An error message.

details[]

object

A list of messages that carry the error details.
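A minimal sketch of turning a Status object into a readable message. The names are the standard google.rpc.Code values, indexed by their numeric code. `describe_status` is a hypothetical helper, and the sample status is made up for illustration.

```python
# google.rpc.Code names, where the tuple index equals the numeric code.
RPC_CODE_NAMES = (
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
)


def describe_status(status: dict) -> str:
    """Format a Status dictionary as '<CODE_NAME>: <message>'."""
    code = status.get("code", 0)
    if 0 <= code < len(RPC_CODE_NAMES):
        name = RPC_CODE_NAMES[code]
    else:
        name = f"CODE_{code}"  # unrecognized code, keep the number visible
    return f"{name}: {status.get('message', '')}"


summary = describe_status({"code": 3, "message": "bad audio", "details": []})
```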
