Yandex Cloud
© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

Supported audio formats

Written by
Yandex Cloud
Updated at May 20, 2025

SpeechKit supports the following audio formats for speech recognition and synthesis:

  • LPCM
  • OggOpus
  • MP3

LPCM

Linear pulse-code modulation without a WAV header.

Audio features in this format:

  • Sampling frequency:

    API version                  Acceptable values
    Speech synthesis API v1      8, 16, or 48 kHz
    Speech synthesis API v3      Any value between 8 and 48 kHz
    Speech recognition API v2    8, 16, or 48 kHz
    Speech recognition API v3    8, 16, or 48 kHz
  • Bit depth: 16 bit.

  • Byte order: Reversed (little-endian).

  • Audio data is stored as signed integers.
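The LPCM properties above can be checked before audio is sent for recognition. The following is a minimal sketch, not part of SpeechKit: it assumes the source audio is a WAV file and uses only the Python standard library to strip the header and verify the bit depth and sampling frequency. The names `wav_to_lpcm`, `make_test_wav`, and `ALLOWED_RATES_HZ` are hypothetical and chosen for illustration.

```python
import io
import struct
import wave
from typing import Sequence, Tuple

# Sampling frequencies listed above for the recognition APIs and synthesis API v1.
ALLOWED_RATES_HZ = {8000, 16000, 48000}


def wav_to_lpcm(wav_bytes: bytes) -> Tuple[bytes, int]:
    """Strip the WAV header and return (raw LPCM bytes, sampling frequency in Hz).

    16-bit WAV samples are already signed little-endian integers,
    so the frames can be passed through unchanged.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("LPCM requires 16-bit samples")
        rate_hz = wav.getframerate()
        if rate_hz not in ALLOWED_RATES_HZ:
            raise ValueError(f"unsupported sampling frequency: {rate_hz} Hz")
        return wav.readframes(wav.getnframes()), rate_hz


def make_test_wav(samples: Sequence[int], rate_hz: int) -> bytes:
    """Build a small mono 16-bit WAV file in memory (helper for the demo)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


if __name__ == "__main__":
    wav_data = make_test_wav([0, 1000, -1000], 16000)
    pcm, rate = wav_to_lpcm(wav_data)
    # "<3h" decodes three signed 16-bit little-endian integers.
    print(rate, struct.unpack("<3h", pcm))
```

The check against `ALLOWED_RATES_HZ` is stricter than necessary for speech synthesis API v3, which accepts any value between 8 and 48 kHz.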

OggOpus

For OggOpus, data is encoded using the Opus audio codec and packaged in an Ogg container.

SpeechKit recognizes and synthesizes OggOpus with no restrictions on audio quality or file headers.
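Because Ogg is a container that can also carry other codecs, it can be useful to confirm that a file actually holds Opus data. The sketch below is an illustration based on the Ogg page layout and the Opus identification header, not a SpeechKit API: it checks for the `OggS` capture pattern and for the `OpusHead` signature at the start of the first packet. The function name `is_ogg_opus` is hypothetical.

```python
import struct

OGG_PAGE_HEADER_SIZE = 27  # fixed part of an Ogg page header, in bytes


def is_ogg_opus(data: bytes) -> bool:
    """Return True if data starts with an Ogg page whose first packet is an Opus header."""
    if len(data) < OGG_PAGE_HEADER_SIZE or data[:4] != b"OggS":
        return False
    # Byte 26 holds the number of entries in the segment table that follows the header.
    segment_count = data[26]
    payload_start = OGG_PAGE_HEADER_SIZE + segment_count
    return data[payload_start:payload_start + 8] == b"OpusHead"


def make_fake_page(first_packet: bytes) -> bytes:
    """Build a minimal single-segment Ogg page for the demo (checksum left as zero)."""
    header = struct.pack(
        "<4sBBqIIIB",
        b"OggS",  # capture pattern
        0,        # stream structure version
        2,        # header type: beginning of stream
        0,        # granule position
        1,        # stream serial number
        0,        # page sequence number
        0,        # CRC checksum (not computed in this sketch)
        1,        # number of segments
    )
    return header + bytes([len(first_packet)]) + first_packet


if __name__ == "__main__":
    opus_page = make_fake_page(b"OpusHead" + bytes(11))
    vorbis_page = make_fake_page(b"\x01vorbis" + bytes(23))
    print(is_ogg_opus(opus_page), is_ogg_opus(vorbis_page))
```

This is a header check only; it does not validate the page checksum or decode any audio.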

MP3

For MP3, data is encoded using the MPEG-1/2/2.5 Layer III audio codec and packaged in an MP3 container.

SpeechKit recognizes MP3 with no restrictions on audio quality or file headers.

Warning

MP3 is not supported by API v1 synchronous recognition or by API v2 streaming recognition.
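A client could guard against this restriction before sending audio. The following is a rough sketch under stated assumptions: the labels `"v1"`, `"v2"`, `"synchronous"`, and `"streaming"` are hypothetical strings used only for this example, and MP3 detection is a simple heuristic (an ID3v2 tag or an MPEG audio frame sync), not a full parser.

```python
# (API version, recognition mode) pairs that the warning above lists as lacking MP3 support.
# The string labels are illustrative and are not SpeechKit identifiers.
MP3_UNSUPPORTED = {("v1", "synchronous"), ("v2", "streaming")}


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check: ID3v2 tag, or an MPEG audio frame sync (11 set bits)."""
    if data[:3] == b"ID3":
        return True
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def check_mp3_allowed(data: bytes, api_version: str, mode: str) -> None:
    """Raise ValueError if MP3 audio is about to be sent to a mode that does not accept it."""
    if looks_like_mp3(data) and (api_version, mode) in MP3_UNSUPPORTED:
        raise ValueError(
            f"MP3 is not supported by API {api_version} {mode} recognition"
        )


if __name__ == "__main__":
    mp3_start = b"ID3\x04\x00\x00\x00\x00\x00\x00"
    check_mp3_allowed(mp3_start, "v3", "streaming")  # passes silently
    try:
        check_mp3_allowed(mp3_start, "v2", "streaming")
    except ValueError as err:
        print(err)
```

The guard only rejects the two combinations named in the warning and makes no claim about any other mode.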
