Yandex project
© 2025 Yandex.Cloud LLC
Yandex SpeechKit

In this article:

  • Hardware requirements for STT and TTS servers
  • Examples of hardware calculations for STT and TTS servers
  • Software requirements

System requirements

Written by
Yandex Cloud
Updated on January 23, 2025

To install SpeechKit Hybrid, you need a Linux server that supports Docker Engine. For a list of supported operating systems, see the official Docker documentation.

Warning

The CPU must support the AVX2 (Advanced Vector Extensions 2) instruction set.

On Linux, to check whether your host supports AVX2, run:

grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2

Hardware requirements for STT and TTS servers

The tables below list the recommended hardware for running SpeechKit Hybrid containers. The performance values are estimates given for reference only and were obtained on hardware using the NVIDIA 535 driver. The actual values may change after containers are updated and new features are added.

The data in the tables is for the Russian language only. RAM, HDD, and core values are per card; core counts are for Intel Gold 6230R processors.

Containers with GPU T4

Operation mode              Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition     50                64        200       8                16
  Audio file recognition    250               64        200       8                16
Speech synthesis
  Speech synthesis          80                64        200       8                16

Containers with GPU V100

Operation mode              Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition     110               64        200       8                16
  Audio file recognition    500               64        200       8                16
Speech synthesis
  Speech synthesis          200               64        200       8                16

Containers with GPU A100

Operation mode              Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition     245               64        200       14               28
  Audio file recognition    1,000             64        200       14               28
Speech synthesis
  Speech synthesis          480               64        200       14               28

¹ Seconds per second (SPS): the number of seconds of audio recognized or synthesized per second of runtime.

Examples of hardware calculations for STT and TTS servers

The number of cards required for speech recognition or speech synthesis depends on the user SPS value. The formula for user SPS depends on the operation mode:

Streaming recognition:

User SPS = X × Y

Where:

  • X is the share of the conversation during which recognition is enabled, expressed as a fraction. If interruptions need to be factored in, X = 1.
  • Y is the number of concurrent calls.

Audio file recognition:

User SPS = X / Y

Where:

  • X is the duration of audio to recognize, in seconds.
  • Y is the time required for audio recognition, in seconds.

Speech synthesis:

User SPS = X × (Y / 10)

Where:

  • X is the expected number of requests per second.
  • Y is the average request length in characters.
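
The three user SPS formulas can be sketched in Python as shown below. This is only an illustration: the function and parameter names are assumptions and are not part of SpeechKit Hybrid, and the sample workloads are made up. The sketch also assumes that X for streaming recognition is given as a fraction between 0 and 1.

```python
def streaming_sps(recognition_share: float, concurrent_calls: int) -> float:
    """User SPS for streaming recognition: X * Y."""
    # Assumption: X is a fraction (0..1), with 1 meaning recognition is always on.
    if not 0 <= recognition_share <= 1:
        raise ValueError("recognition_share must be between 0 and 1")
    return recognition_share * concurrent_calls


def audio_file_sps(audio_seconds: float, allowed_seconds: float) -> float:
    """User SPS for audio file recognition: X / Y."""
    if allowed_seconds <= 0:
        raise ValueError("allowed_seconds must be positive")
    return audio_seconds / allowed_seconds


def synthesis_sps(requests_per_second: float, avg_chars: float) -> float:
    """User SPS for speech synthesis: X * (Y / 10)."""
    return requests_per_second * (avg_chars / 10)


if __name__ == "__main__":
    # Hypothetical workloads, chosen only to exercise the formulas.
    print(streaming_sps(0.5, 120))      # 120 calls, recognition on half the time
    print(audio_file_sps(36000, 3600))  # 10 hours of audio to process in 1 hour
    print(synthesis_sps(4, 200))        # 4 requests/s, 200 characters each
```

With these sample inputs, the three calls give 60.0, 10.0, and 80.0 user SPS respectively.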

The number of cards is calculated as follows:

User SPS / guaranteed card SPS

The resulting value is rounded up to the nearest integer.

To get the required amount of RAM, HDDs, and cores, multiply the table values by the number of cards.
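
As a worked example of this calculation, the sketch below sizes a streaming recognition deployment on T4 cards. The per-card values are the T4 streaming recognition figures from the table; the workload of 120 user SPS and all function and variable names are assumptions made for illustration.

```python
import math

# Per-card values for streaming recognition on GPU T4, taken from the table.
T4_STREAMING = {
    "sps": 50,
    "ram_gb": 64,
    "hdd_gb": 200,
    "physical_cores": 8,
    "logical_cores": 16,
}


def size_deployment(user_sps: float, card: dict) -> dict:
    """Return the number of cards and the total resources for a given user SPS."""
    if user_sps <= 0:
        raise ValueError("user_sps must be positive")
    # User SPS / guaranteed card SPS, rounded up to the nearest integer.
    cards = math.ceil(user_sps / card["sps"])
    totals = {"cards": cards}
    # Multiply every per-card resource by the number of cards.
    for name in ("ram_gb", "hdd_gb", "physical_cores", "logical_cores"):
        totals[name] = cards * card[name]
    return totals


if __name__ == "__main__":
    # Hypothetical workload: 120 user SPS of streaming recognition.
    print(size_deployment(120, T4_STREAMING))
```

Here 120 / 50 = 2.4, which rounds up to 3 cards, so the deployment needs 192 GB of RAM, 600 GB of HDD space, 24 physical cores, and 48 logical cores.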

Software requirements

To install and configure SpeechKit Hybrid services:

  • Install the Yandex Cloud command line interface.
  • Create a registry in Yandex Container Registry.
