© 2025 Direct Cursus Technology L.L.C.

In this article:

  • HTTP request
  • Body parameters
  • RecognitionModelOptions
  • AudioFormatOptions
  • RawAudio
  • ContainerAudio
  • TextNormalizationOptions
  • LanguageRestrictionOptions
  • RecognitionClassifierOptions
  • RecognitionClassifier
  • SpeechAnalysisOptions
  • SpeakerLabelingOptions
  • Response
  • Status

SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile

Written by
Yandex Cloud
Updated on February 24, 2025

HTTP request

POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
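
As a minimal sketch, the request above could be prepared in Python with only the standard library. The `Api-Key` authorization scheme, the `build_request` helper, and the example URI are assumptions that this article does not state; check the API authentication page for the scheme that applies to your account. The request is built but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for RecognizeFile.

    Assumption: the service accepts an `Authorization: Api-Key <key>` header.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical URI and key placeholders, for illustration only.
req = build_request({"uri": "https://storage.example.com/my-bucket/audio.wav"}, "<api-key>")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body is the Operation resource described in the Response section.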

Body parameters

{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  }
}

Field

Description

content

string (bytes)

Audio data as bytes.

Includes only one of the fields content, uri.

uri

string

S3 data URL.

Includes only one of the fields content, uri.

recognitionModel

RecognitionModelOptions

Configuration for speech recognition model.

recognitionClassifier

RecognitionClassifierOptions

Configuration for classifiers over speech recognition.

speechAnalysis

SpeechAnalysisOptions

Configuration for speech analysis over speech recognition.

speakerLabeling

SpeakerLabelingOptions

Configuration for speaker labeling.
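
The `content`/`uri` choice above can be enforced before a request is sent. The following is a sketch under two assumptions: that `string (bytes)` follows the proto3 JSON mapping (base64 text), and that a useful request sets exactly one of the two fields. The `build_body` helper is hypothetical.

```python
import base64
from typing import Optional


def build_body(recognition_model: dict,
               content: Optional[bytes] = None,
               uri: Optional[str] = None) -> dict:
    """Assemble a request body with exactly one of `content` or `uri`."""
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of `content` or `uri`")
    body = {"recognitionModel": recognition_model}
    if content is not None:
        # Assumption: bytes are sent as base64 text (proto3 JSON mapping).
        body["content"] = base64.b64encode(content).decode("ascii")
    else:
        body["uri"] = uri
    return body


body = build_body({"model": "general"}, content=b"abc")
print(body["content"])
```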

RecognitionModelOptions

Field

Description

model

string

Sets the recognition model for the cloud version of SpeechKit. Possible values: 'general', 'general:rc', 'general:deprecated'.
The model is ignored for SpeechKit Hybrid.

audioFormat

AudioFormatOptions

Specifies the input audio format.

textNormalization

TextNormalizationOptions

Text normalization options.

languageRestriction

LanguageRestrictionOptions

Possible languages in audio.

audioProcessingType

enum (AudioProcessingType)

How to process audio data (in real time, after all data is received, etc.). Default is REAL_TIME.

  • AUDIO_PROCESSING_TYPE_UNSPECIFIED
  • REAL_TIME: Process audio in a mode optimized for real-time recognition, i.e., send partial and final responses as soon as possible.
  • FULL_DATA: Process audio after all data has been received.
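
A small sketch of assembling a `recognitionModel` object from the fields listed above, validating the documented `model` and `audioProcessingType` values on the client side. The `recognition_model` helper is hypothetical and the client-side validation is a design choice of this sketch, not a requirement of the API.

```python
MODELS = {"general", "general:rc", "general:deprecated"}
PROCESSING_TYPES = {"REAL_TIME", "FULL_DATA"}


def recognition_model(audio_format: dict,
                      model: str = "general",
                      processing_type: str = "REAL_TIME") -> dict:
    """Return a recognitionModel dict using the documented enum values."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if processing_type not in PROCESSING_TYPES:
        raise ValueError(f"unknown processing type: {processing_type!r}")
    return {
        "model": model,
        "audioFormat": audio_format,
        "audioProcessingType": processing_type,
    }


options = recognition_model(
    {"containerAudio": {"containerAudioType": "WAV"}},
    processing_type="FULL_DATA",
)
print(options["audioProcessingType"])
```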

AudioFormatOptions

Audio format options.

Field

Description

rawAudio

RawAudio

Audio without container.

Includes only one of the fields rawAudio, containerAudio.

containerAudio

ContainerAudio

Audio wrapped in a container.

Includes only one of the fields rawAudio, containerAudio.

RawAudio

Raw audio format specification (there is no container from which to infer the type). Used in AudioFormatOptions.

Field

Description

audioEncoding

enum (AudioEncoding)

Type of audio encoding.

  • AUDIO_ENCODING_UNSPECIFIED
  • LINEAR16_PCM: Audio bit depth 16-bit signed little-endian (Linear PCM).

sampleRateHertz

string (int64)

PCM sample rate, in hertz.

audioChannelCount

string (int64)

PCM channel count. Currently only single channel audio is supported in real-time recognition.

ContainerAudio

Audio with a fixed type in a container. Used in AudioFormatOptions.

Field

Description

containerAudioType

enum (ContainerAudioType)

Type of audio container.

  • CONTAINER_AUDIO_TYPE_UNSPECIFIED
  • WAV: Audio bit depth 16-bit signed little-endian (Linear PCM).
  • OGG_OPUS: Data is encoded using the OPUS audio codec and compressed using the OGG container format.
  • MP3: Data is encoded using MPEG-1/2 Layer III and compressed using the MP3 container format.
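
The two `audioFormat` variants (RawAudio and ContainerAudio) could be produced by separate helpers so that only one of them ever appears in a request. This is a sketch with hypothetical helper names; the int64 fields are written as strings because that is how the body schema above shows them.

```python
CONTAINER_TYPES = {"WAV", "OGG_OPUS", "MP3"}


def raw_audio(sample_rate_hz: int, channels: int = 1) -> dict:
    """audioFormat for headerless 16-bit signed little-endian PCM."""
    return {
        "rawAudio": {
            "audioEncoding": "LINEAR16_PCM",
            # int64 values appear as strings in the documented schema.
            "sampleRateHertz": str(sample_rate_hz),
            "audioChannelCount": str(channels),
        }
    }


def container_audio(container_type: str) -> dict:
    """audioFormat for audio stored in one of the documented containers."""
    if container_type not in CONTAINER_TYPES:
        raise ValueError(f"unsupported container: {container_type!r}")
    return {"containerAudio": {"containerAudioType": container_type}}


print(raw_audio(16000))
print(container_audio("OGG_OPUS"))
```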

TextNormalizationOptions

Text normalization options.

Field

Description

textNormalization

enum (TextNormalization)

  • TEXT_NORMALIZATION_UNSPECIFIED
  • TEXT_NORMALIZATION_ENABLED: Enable normalization
  • TEXT_NORMALIZATION_DISABLED: Disable normalization

profanityFilter

boolean

Profanity filter (default: false).

literatureText

boolean

Rewrite text in a literary style (default: false).

phoneFormattingMode

enum (PhoneFormattingMode)

Defines the phone formatting mode.

  • PHONE_FORMATTING_MODE_UNSPECIFIED
  • PHONE_FORMATTING_MODE_DISABLED: Disable phone formatting

LanguageRestrictionOptions

Type of restriction for the list of languages expected in the incoming speech stream.

Field

Description

restrictionType

enum (LanguageRestrictionType)

Language restriction type

  • LANGUAGE_RESTRICTION_TYPE_UNSPECIFIED
  • WHITELIST: The allowing list. The incoming audio can contain only the listed languages.
  • BLACKLIST: The forbidding list. The incoming audio cannot contain the listed languages.

languageCode[]

string

The list of language codes that restricts recognition when an auto model is used.
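
A sketch of building a `languageRestriction` object from the fields above. The `language_restriction` helper is hypothetical, and the `ru-RU`/`en-US` code format is an assumption; this article does not list the accepted language codes.

```python
from typing import Iterable


def language_restriction(codes: Iterable[str], allow: bool = True) -> dict:
    """Return a WHITELIST (allow=True) or BLACKLIST (allow=False) restriction."""
    code_list = list(codes)
    if not code_list:
        raise ValueError("at least one language code is required")
    return {
        "restrictionType": "WHITELIST" if allow else "BLACKLIST",
        "languageCode": code_list,
    }


# Assumed code format, for illustration only.
print(language_restriction(["ru-RU", "en-US"]))
```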

RecognitionClassifierOptions

Field

Description

classifiers[]

RecognitionClassifier

List of classifiers to use

RecognitionClassifier

Field

Description

classifier

string

Classifier name

triggers[]

enum (TriggerType)

The types of responses for which classification results are returned.

  • TRIGGER_TYPE_UNSPECIFIED
  • ON_UTTERANCE: Apply classifier to utterance responses
  • ON_FINAL: Apply classifier to final responses
  • ON_PARTIAL: Apply classifier to partial responses

SpeechAnalysisOptions

Field

Description

enableSpeakerAnalysis

boolean

Analyse speech for every speaker.

enableConversationAnalysis

boolean

Analyse a conversation between two speakers.

descriptiveStatisticsQuantiles[]

string

Quantile levels in the open range (0, 1) for descriptive statistics.
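
The quantile constraint above can be checked before sending. This sketch assumes the quantiles are serialized as decimal strings, following the `"string"` type shown in the body schema; the `speech_analysis` helper is hypothetical.

```python
from typing import Sequence


def speech_analysis(quantiles: Sequence[float],
                    speakers: bool = True,
                    conversation: bool = False) -> dict:
    """Return a speechAnalysis dict, rejecting quantiles outside (0, 1)."""
    for q in quantiles:
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile must be in (0, 1): {q}")
    return {
        "enableSpeakerAnalysis": speakers,
        "enableConversationAnalysis": conversation,
        # Assumption: values are sent as strings, as in the schema above.
        "descriptiveStatisticsQuantiles": [str(q) for q in quantiles],
    }


print(speech_analysis([0.5, 0.9]))
```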

SpeakerLabelingOptions

Field

Description

speakerLabeling

enum (SpeakerLabeling)

Specifies the execution of speaker labeling. Default is SPEAKER_LABELING_DISABLED.

  • SPEAKER_LABELING_UNSPECIFIED
  • SPEAKER_LABELING_ENABLED: Enable speaker labeling
  • SPEAKER_LABELING_DISABLED: Disable speaker labeling

Response

HTTP Code: 200 - OK

{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}

An Operation resource. For more information, see Operation.

Field

Description

id

string

ID of the operation.

description

string

Description of the operation. 0-256 characters long.

createdAt

string (date-time)

Creation timestamp.

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).
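
Python's `datetime` is one such case: it stores microseconds only. A minimal sketch of parsing these timestamps by truncating the fraction to six digits follows. It assumes the value always ends in `Z`, as in the range shown above; the `parse_timestamp` helper is hypothetical.

```python
import re
from datetime import datetime, timezone

# Date-time, optional fraction of 1 to 9 digits, and a literal "Z" suffix.
_RFC3339_UTC = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC3339 UTC timestamp, truncating to microsecond precision."""
    match = _RFC3339_UTC.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base, fraction = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Pad to nanoseconds, then keep the first six digits (microseconds).
    micros = int(fraction.ljust(9, "0")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


print(parse_timestamp("2025-02-24T10:15:30.123456789Z").isoformat())
```

Truncation drops the last three digits, so values that must round-trip exactly should be kept as strings.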

createdBy

string

ID of the user or service account who initiated the operation.

modifiedAt

string (date-time)

The time when the Operation resource was last modified.

String in RFC3339 text format. The range of possible values is from
0001-01-01T00:00:00Z to 9999-12-31T23:59:59.999999999Z, i.e. from 0 to 9 digits for fractions of a second.

To work with values in this field, use the APIs described in the
Protocol Buffers reference.
In some languages, built-in datetime utilities do not support nanosecond precision (9 digits).

done

boolean

If the value is false, it means the operation is still in progress.
If true, the operation is completed, and either error or response is available.

metadata

object

Service-specific metadata associated with the operation.
It typically contains the ID of the target resource that the operation is performed on.
Any method that returns a long-running operation should document the metadata type, if any.

error

Status

The error result of the operation in case of failure or cancellation.

Includes only one of the fields error.

The operation result.
If done == false and there was no failure detected, neither error nor response is set.
If done == false and there was a failure detected, error is set.
If done == true, exactly one of error or response is set.
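
The three rules above translate into a small decision function. This is a sketch that works on the parsed JSON of an Operation resource; the `operation_state` helper and its return labels are hypothetical.

```python
def operation_state(operation: dict) -> str:
    """Classify an Operation as 'failed', 'succeeded', or 'running'."""
    # An error may be set whether or not `done` is true.
    if "error" in operation:
        return "failed"
    # done == true without an error means the response is available.
    if operation.get("done"):
        return "succeeded"
    # done == false and no error: still in progress.
    return "running"


print(operation_state({"id": "op-1", "done": False}))
print(operation_state({"id": "op-1", "done": True, "error": {"code": 3}}))
```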

Status

The error result of the operation in case of failure or cancellation.

Field

Description

code

integer (int32)

Error code. An enum value of google.rpc.Code.

message

string

An error message.

details[]

object

A list of messages that carry the error details.
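
For logging, the numeric `code` can be mapped to its google.rpc.Code name. The sketch below lists the standard codes 0 through 16 in order; the `describe_status` helper and its output format are hypothetical.

```python
# google.rpc.Code names, indexed by their numeric value.
RPC_CODE_NAMES = (
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
)


def describe_status(status: dict) -> str:
    """Format a Status object as '<CODE_NAME>: <message>'."""
    code = status.get("code", 0)
    if 0 <= code < len(RPC_CODE_NAMES):
        name = RPC_CODE_NAMES[code]
    else:
        name = f"CODE_{code}"  # Fall back for values outside the known range.
    return f"{name}: {status.get('message', '')}"


print(describe_status({"code": 3, "message": "bad audio"}))
```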
