© 2025 Direct Cursus Technology L.L.C.
System requirements

Written by
Yandex Cloud
Updated on August 12, 2025
  • Hardware requirements for routing and licensing servers
  • Hardware requirements for STT and TTS servers
    • Examples of hardware calculations for STT and TTS servers
  • Software requirements

To install SpeechKit Hybrid, you need a Linux server that supports Docker Engine. For a list of supported operating systems, see the official Docker documentation.

Warning

The CPU must support the AVX2 (Advanced Vector Extensions 2) instruction set.

On Linux, to check whether your host supports AVX2, run:

grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2

Hardware requirements for routing and licensing servers

The table below lists the recommended hardware for the routing (Envoy) and licensing (License) servers. The number of CPU cores and the amount of RAM these servers need depend on the number and type of GPUs used on the STT and TTS servers.

GPU type                RAM per card, GB   Physical CPU cores per card   Logical CPU cores (vCPUs) per card
NVIDIA® Tesla® T4       2                  2                             4
NVIDIA® Tesla® V100     4                  4                             8
NVIDIA® Tesla® L4       8                  8                             16
NVIDIA® Ampere® A100    10                 10                            20
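
The per-card figures above can be totalled for a mixed fleet of GPUs. The following Python sketch assumes the totals scale linearly with the number of cards, which the table implies but does not state outright. The function name and the example fleet are hypothetical.

```python
# Per-card figures copied from the table above:
# (RAM in GB, physical CPU cores, logical CPU cores / vCPUs)
ROUTING_PER_CARD = {
    "T4": (2, 2, 4),
    "V100": (4, 4, 8),
    "L4": (8, 8, 16),
    "A100": (10, 10, 20),
}


def routing_licensing_resources(gpu_counts):
    """Sum the per-card figures over a fleet such as {"T4": 3, "A100": 1}.

    Assumption: requirements grow linearly with the number of cards.
    """
    ram_gb = physical_cores = vcpus = 0
    for gpu, count in gpu_counts.items():
        ram, cores, logical = ROUTING_PER_CARD[gpu]
        ram_gb += ram * count
        physical_cores += cores * count
        vcpus += logical * count
    return {"ram_gb": ram_gb, "physical_cores": physical_cores, "vcpus": vcpus}


if __name__ == "__main__":
    # Hypothetical fleet: three T4 cards and one A100 card.
    print(routing_licensing_resources({"T4": 3, "A100": 1}))
```

Under this assumption, three T4 cards and one A100 card come to 16 GB of RAM, 16 physical cores, and 32 vCPUs.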

Hardware requirements for STT and TTS servers

The tables below list the recommended hardware for running SpeechKit Hybrid containers. The values are estimates given for reference and were obtained on hardware using the NVIDIA 535 driver. Actual values may change after containers are updated and new features are added.

The data in the tables is for the Russian language only:

In all four tables, the core counts refer to Intel Gold 6230R processors.

Containers with GPU T4

Operation mode             Guaranteed SPS¹   RAM per card, GB   HDD per card, GB   Physical cores per card   Logical cores per card
Speech recognition
  Streaming recognition    50                64                 200                8                         16
  Audio file recognition   250               64                 200                8                         16
Speech synthesis
  Speech synthesis         80                64                 200                8                         16

Containers with GPU V100

Operation mode             Guaranteed SPS¹   RAM per card, GB   HDD per card, GB   Physical cores per card   Logical cores per card
Speech recognition
  Streaming recognition    110               64                 200                8                         16
  Audio file recognition   500               64                 200                8                         16
Speech synthesis
  Speech synthesis         200               64                 200                8                         16

Containers with GPU L4

Operation mode             Guaranteed SPS¹   RAM per card, GB   HDD per card, GB   Physical cores per card   Logical cores per card
Speech recognition
  Streaming recognition    66                64                 200                8                         16
  Audio file recognition   330               64                 200                8                         16
Speech synthesis
  Speech synthesis         383               64                 200                8                         16

Containers with GPU A100

Operation mode             Guaranteed SPS¹   RAM per card, GB   HDD per card, GB   Physical cores per card   Logical cores per card
Speech recognition
  Streaming recognition    245               64                 200                14                        28
  Audio file recognition   1,000             64                 200                14                        28
Speech synthesis
  Speech synthesis         480               64                 200                14                        28

¹ Seconds per second (SPS): the number of seconds of audio recognized or synthesized per second of runtime.

Examples of hardware calculations for STT and TTS servers

The number of cards required for speech recognition or speech synthesis depends on the user SPS value. Use the formula for your operation mode:

Streaming recognition

User SPS = X × Y

Where:

  • X is the share of the conversation during which recognition is enabled, expressed as a fraction from 0 to 1. If interruptions need to be factored in, X = 1.
  • Y is the number of concurrent calls.

Audio file recognition

User SPS = X / Y

Where:

  • X is the duration of audio to recognize, in seconds.
  • Y is the time required for audio recognition, in seconds.

Speech synthesis

User SPS = X × (Y / 10)

Where:

  • X is the expected number of requests per second.
  • Y is the average request length in characters.
The number of cards is calculated as follows:

User SPS / guaranteed card SPS

The resulting value is rounded up to the nearest integer.

To get the required amount of RAM, HDD space, and CPU cores, multiply the per-card table values by the number of cards.
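
The calculation above can be sketched in Python. This is a minimal illustration, not an official sizing tool: the guaranteed SPS and per-card resource values are copied from the tables in this article (Russian language only), while the function names and the example workload of 120 concurrent calls are hypothetical.

```python
import math

# Guaranteed SPS per card, copied from the tables in this article.
GUARANTEED_SPS = {
    "T4": {"streaming": 50, "file": 250, "synthesis": 80},
    "V100": {"streaming": 110, "file": 500, "synthesis": 200},
    "L4": {"streaming": 66, "file": 330, "synthesis": 383},
    "A100": {"streaming": 245, "file": 1000, "synthesis": 480},
}

# Per-card resources: (RAM in GB, HDD in GB, physical cores, logical cores).
PER_CARD = {
    "T4": (64, 200, 8, 16),
    "V100": (64, 200, 8, 16),
    "L4": (64, 200, 8, 16),
    "A100": (64, 200, 14, 28),
}


def streaming_sps(share, concurrent_calls):
    """User SPS = X * Y for streaming recognition."""
    return share * concurrent_calls


def file_sps(audio_seconds, processing_seconds):
    """User SPS = X / Y for audio file recognition."""
    return audio_seconds / processing_seconds


def synthesis_sps(requests_per_second, avg_chars):
    """User SPS = X * (Y / 10) for speech synthesis."""
    return requests_per_second * (avg_chars / 10)


def cards_needed(user_sps, gpu, mode):
    """User SPS divided by the guaranteed card SPS, rounded up."""
    return math.ceil(user_sps / GUARANTEED_SPS[gpu][mode])


def total_resources(gpu, cards):
    """Multiply the per-card table values by the number of cards."""
    ram, hdd, physical, logical = PER_CARD[gpu]
    return {
        "ram_gb": ram * cards,
        "hdd_gb": hdd * cards,
        "physical_cores": physical * cards,
        "logical_cores": logical * cards,
    }


if __name__ == "__main__":
    # Hypothetical workload: 120 concurrent calls, recognition always on (X = 1).
    cards = cards_needed(streaming_sps(1, 120), "T4", "streaming")
    print(cards, total_resources("T4", cards))
```

In this hypothetical case, the user SPS is 120 and a T4 card guarantees 50 SPS for streaming recognition, so 120 / 50 = 2.4 rounds up to 3 cards, which corresponds to 192 GB of RAM, 600 GB of HDD space, 24 physical cores, and 48 logical cores.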

Software requirements

Your dedicated SpeechKit Hybrid server must have NVIDIA LTS 535 drivers and NVIDIA Container Toolkit 1.15 or higher. For more information about the drivers, see the official NVIDIA documentation. You do not need to install the CUDA Toolkit as it comes as part of SpeechKit Hybrid images.
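
As a rough way to check these two version requirements, the Python sketch below compares version strings against them. It assumes that "LTS 535" means the driver version must start with 535 and that the version strings are supplied by the caller; the helper names are hypothetical and are not part of SpeechKit Hybrid or any NVIDIA tool.

```python
import re

REQUIRED_DRIVER_BRANCH = 535   # assumption: the driver version must start with 535
MIN_TOOLKIT_VERSION = (1, 15)  # NVIDIA Container Toolkit 1.15 or higher


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def driver_ok(driver_version_text):
    """True if the driver belongs to the 535 branch."""
    return parse_version(driver_version_text)[0] == REQUIRED_DRIVER_BRANCH


def toolkit_ok(toolkit_version_text):
    """True if the Container Toolkit version is 1.15 or higher."""
    return parse_version(toolkit_version_text)[:2] >= MIN_TOOLKIT_VERSION


if __name__ == "__main__":
    # Example strings; real values would come from the host being checked.
    print(driver_ok("535.183.01"))
    print(toolkit_ok("NVIDIA Container Toolkit CLI version 1.15.0"))
```

The input strings could be taken, for example, from the output of `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and `nvidia-ctk --version`. The exact output format of those commands on a given host is an assumption to verify.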

To install and configure SpeechKit Hybrid services, you will need the Yandex Cloud CLI and a registry in Yandex Container Registry.

  1. If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

  2. Create a registry in Yandex Container Registry.

    By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

    yc container registry create --name speechkit-hybrid
    

    Result:

    id: <registry_ID>
    folder_id: <folder_ID>
    name: speechkit-hybrid
    status: ACTIVE
    created_at: "<creation_date_and_time>"
    
  3. Create a service account with the editor role for the selected folder.

  4. Create an API key for the service account.

  5. Notify the SpeechKit team of the created registry ID. All required containers will appear in your registry, and you will get the docker-compose.yaml file with the deployment settings.
