© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

API v2 for streaming recognition

Written by
Yandex Cloud
Updated at February 10, 2025

The streaming recognition service is located at: stt.api.cloud.yandex.net:443

Message with recognition settings

Parameter (type)
    Description

config (object)
    Field with the recognition settings and folder ID.

config.specification (object)
    Recognition settings.

config.specification.languageCode (string)
    Recognition language.
    See the model description for acceptable values. The default value is ru-RU (Russian).

config.specification.model (string)
    Language model to use for recognition.
    The more accurately you choose the model, the better the recognition result. You can specify only one model per request.
    The acceptable values depend on the language you select. The default value is general.

config.specification.profanityFilter (boolean)
    Profanity filter.
    Acceptable values:
      • true: Exclude profanities from the recognition results.
      • false (default): Do not exclude profanities from the recognition results.

config.specification.partialResults (boolean)
    Intermediate result filter.
    Acceptable values:
      • true: Return intermediate results (part of the recognized utterance). For intermediate results, final equals false.
      • false (default): Return only the final results (the entire recognized utterance).

config.specification.singleUtterance (boolean)
    Flag that stops recognition after the first utterance.
    Acceptable values:
      • true: Recognize only the first utterance, then stop recognition and wait for the user to disconnect.
      • false (default): Continue recognition until the end of the session.

config.specification.audioEncoding (string)
    Submitted audio format.
    Acceptable values:
      • LINEAR16_PCM: LPCM without a WAV header.
      • OGG_OPUS (default): OggOpus format.

config.specification.sampleRateHertz (integer, int64)
    Submitted audio sampling frequency.
    This parameter is required if audioEncoding is LINEAR16_PCM. Acceptable values:
      • 48000 (default): 48 kHz.
      • 16000: 16 kHz.
      • 8000: 8 kHz.

config.specification.rawResults (boolean)
    Flag that controls how numbers are written: true for words, false (default) for digits.

folderId (string)
    ID of the folder you have access to. It is required for authentication with a user account (see Authentication with the SpeechKit API). Do not use this field if you make a request on behalf of a service account.
    The maximum string length is 50 characters.
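The settings above can be sketched as a plain Python dictionary with a few sanity checks. This is only an illustration: build_config is a hypothetical helper, not part of any SpeechKit SDK, and placing folderId inside config is an assumption based on the description of the config field. A real client would fill in the message classes generated from the service's proto files instead.

```python
from typing import Any, Dict, Optional

# Values taken from the parameter descriptions above.
ALLOWED_ENCODINGS = {"LINEAR16_PCM", "OGG_OPUS"}
ALLOWED_SAMPLE_RATES = {48000, 16000, 8000}
MAX_FOLDER_ID_LENGTH = 50


def build_config(
    folder_id: Optional[str] = None,
    language_code: str = "ru-RU",
    model: str = "general",
    audio_encoding: str = "OGG_OPUS",
    sample_rate_hertz: Optional[int] = None,
    profanity_filter: bool = False,
    partial_results: bool = False,
    single_utterance: bool = False,
    raw_results: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: return a dict shaped like the settings message."""
    if audio_encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported audioEncoding: {audio_encoding}")
    # sampleRateHertz is required for LINEAR16_PCM.
    if audio_encoding == "LINEAR16_PCM" and sample_rate_hertz is None:
        raise ValueError("sampleRateHertz is required for LINEAR16_PCM")
    if sample_rate_hertz is not None and sample_rate_hertz not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sampleRateHertz: {sample_rate_hertz}")
    if folder_id is not None and len(folder_id) > MAX_FOLDER_ID_LENGTH:
        raise ValueError("folderId must be at most 50 characters")

    specification: Dict[str, Any] = {
        "languageCode": language_code,
        "model": model,
        "profanityFilter": profanity_filter,
        "partialResults": partial_results,
        "singleUtterance": single_utterance,
        "audioEncoding": audio_encoding,
        "rawResults": raw_results,
    }
    if sample_rate_hertz is not None:
        specification["sampleRateHertz"] = sample_rate_hertz

    config: Dict[str, Any] = {"specification": specification}
    # Assumption: folderId travels inside config. Omit it for service accounts.
    if folder_id is not None:
        config["folderId"] = folder_id
    return {"config": config}
```

For example, build_config(audio_encoding="LINEAR16_PCM", sample_rate_hertz=8000) describes 8 kHz LPCM audio, while omitting sample_rate_hertz for that encoding raises ValueError.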

Experimental additional recognition settings

For streaming recognition models, new recognition settings are supported. They are passed to a gRPC procedure via metadata.

Parameter (type)
    Description

x-normalize-partials (boolean)
    Flag that lets you get intermediate recognition results (parts of the recognized utterance) in a normalized format: numbers as digits, profanity filter enabled, etc.
    Acceptable values:
      • true: Return a normalized result.
      • false (default): Return a non-normalized result.
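As a rough sketch, gRPC metadata in Python is commonly passed as a sequence of (key, value) string pairs through the metadata argument of a call. The helper below only builds such a sequence. The name build_metadata is hypothetical, and encoding the boolean as the lowercase strings "true" and "false" is an assumption that the source does not confirm.

```python
from typing import List, Optional, Sequence, Tuple

Metadata = List[Tuple[str, str]]


def build_metadata(
    normalize_partials: bool = False,
    extra: Optional[Sequence[Tuple[str, str]]] = None,
) -> Metadata:
    """Hypothetical helper: build gRPC call metadata as (key, value) pairs."""
    # Assumption: the flag is sent as the string "true" or "false".
    metadata: Metadata = [
        ("x-normalize-partials", "true" if normalize_partials else "false"),
    ]
    # Any other headers the caller needs (for example, authentication)
    # are appended unchanged; gRPC metadata keys are lowercase.
    if extra:
        metadata.extend((key.lower(), value) for key, value in extra)
    return metadata
```

The resulting list would then be handed to the streaming call, for instance as metadata=build_metadata(normalize_partials=True).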

Audio message

Parameter
    Description

audio_content
    Audio fragment represented as an array of bytes. The audio must match the format specified in the message with recognition settings.
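A streaming request can be pictured as a sequence of messages: the settings first, then the audio split into fragments. The generator below is a minimal sketch of that idea using plain dictionaries. The name request_stream, the ordering (settings message before any audio), and the 4000-byte default fragment size are assumptions made for illustration, not values taken from the service documentation.

```python
from typing import Any, Dict, Iterator


def request_stream(
    config_message: Dict[str, Any],
    audio: bytes,
    chunk_size: int = 4000,  # arbitrary illustrative size, in bytes
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield the settings message, then audio fragments."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: the settings message opens the stream.
    yield config_message
    # Each following message carries one fragment in audio_content.
    for offset in range(0, len(audio), chunk_size):
        yield {"audio_content": audio[offset:offset + chunk_size]}
```

With a real gRPC client, an iterator like this would be passed to the streaming method, with each dictionary replaced by the corresponding generated request message.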

Message with recognition results

If speech fragment recognition is successful, you will receive a message containing a list of recognition results (chunks[]). Each result contains the following fields:

  • alternatives[]: List of recognized text alternatives. Each alternative contains the following fields:

    • text: Recognized text.
    • confidence: This field is currently not supported. Do not use it.
  • final: Flag indicating that this recognition result is final and will not change. If the value is false, the result is intermediate and may change as subsequent speech fragments are recognized.

  • endOfUtterance: Flag indicating that this result contains the end of the utterance. If the value is true, a new utterance starts with the next result you receive.

    Note

    If you set singleUtterance=true, only one utterance per session will be recognized. After the message where endOfUtterance is true, the server will not recognize the following utterances and will wait for you to terminate the session.
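The fields described above can be combined to assemble complete utterances from a stream of results. The sketch below assumes each response is a dictionary with the same field names (chunks, alternatives, text, final, endOfUtterance). The function name collect_utterances and the choice to take the first alternative are illustrative assumptions.

```python
from typing import Any, Dict, Iterable, List


def collect_utterances(responses: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: join final results into one string per utterance."""
    utterances: List[str] = []
    current: List[str] = []
    for response in responses:
        for chunk in response.get("chunks", []):
            alternatives = chunk.get("alternatives", [])
            # Intermediate results (final is false) may still change, so skip them.
            if chunk.get("final", False) and alternatives:
                # Assumption: the first alternative is the one to keep.
                current.append(alternatives[0]["text"])
            # endOfUtterance closes the utterance collected so far.
            if chunk.get("endOfUtterance", False) and current:
                utterances.append(" ".join(current))
                current = []
    # Keep any final text received before the session ended.
    if current:
        utterances.append(" ".join(current))
    return utterances
```

The confidence field is ignored here because the description above marks it as unsupported.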

Error codes returned by the server

To see how gRPC statuses correspond to HTTP codes, see google.rpc.Code.

List of possible gRPC errors returned by the service:

Code  Status              Description
3     INVALID_ARGUMENT    Incorrect request parameters specified. Detailed information is provided in the details field.
8     RESOURCE_EXHAUSTED  Client exceeded a quota.
13    INTERNAL            Internal server error. The operation cannot be performed due to a server-side technical problem, e.g., insufficient computing resources.
16    UNAUTHENTICATED     The operation requires authentication. Check the IAM token and the folder ID that you provided.
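One way a client might react to these statuses is to retry only those that can succeed later and fail fast on the rest. The sketch below keys on status names rather than numeric codes. The retry policy itself (which statuses are retried, the exponential backoff, the 0.5 s base and 30 s cap) is an assumption for illustration and is not prescribed by the service.

```python
from typing import Optional

# Assumption: quota and server-side errors may clear up on their own,
# while bad parameters or credentials need a fix on the client side.
RETRYABLE_STATUSES = {"RESOURCE_EXHAUSTED", "INTERNAL"}
FATAL_STATUSES = {"INVALID_ARGUMENT", "UNAUTHENTICATED"}


def backoff_delay(
    status: str,
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Hypothetical helper: seconds to wait before retrying, or None to give up.

    attempt is the zero-based number of retries already made.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    # Exponential backoff, bounded by cap.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned delay before reopening the stream, and surface the error to the user when the result is None.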

Use cases

  • Example use of streaming recognition with API v2.
