© 2026 Direct Cursus Technology L.L.C.
Yandex SpeechKit Hybrid

System requirements

Written by Yandex Cloud. Updated on May 28, 2026.

To install SpeechKit Hybrid, you need a Linux server that supports Docker Engine. For a list of supported operating systems, see the official Docker documentation.

Warning

The CPU must support the AVX2 (Advanced Vector Extensions 2) instruction set.

On Linux, to check whether your host supports AVX2, run:

grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2

Hardware requirements for routing and licensing servers

The table below lists the recommended hardware for the routing (Envoy) and licensing (License) servers. The number of CPUs and the amount of RAM these servers need depend on the number and type of GPUs on the STT and TTS servers, so all values are given per GPU card.

GPU type               RAM per card, GB   Physical CPUs per card   Logical CPUs (vCPUs) per card
NVIDIA® Tesla® L4      8                  8                        16
NVIDIA® Ampere® A100   10                 10                       20
NVIDIA® Tesla® H100    20                 20                       40
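
As a rough sketch, the routing and licensing server size can be computed from the table by multiplying the per-card values by the total number of GPU cards on the STT and TTS servers. Linear scaling with the card count is an assumption based on the "per card" wording; the function name `routing_license_resources` and the short GPU labels are hypothetical.

```shell
# Hypothetical helper: size of the routing (Envoy) and licensing (License) servers.
# Per-card values are taken from the table above; linear scaling with the
# number of GPU cards is an assumption.
routing_license_resources() {
  local gpu="$1" cards="$2" ram cpus vcpus
  case "$gpu" in
    L4)   ram=8;  cpus=8;  vcpus=16 ;;
    A100) ram=10; cpus=10; vcpus=20 ;;
    H100) ram=20; cpus=20; vcpus=40 ;;
    *) echo "unknown GPU type: $gpu" >&2; return 1 ;;
  esac
  echo "RAM_GB=$((ram * cards)) CPUS=$((cpus * cards)) VCPUS=$((vcpus * cards))"
}

# Example: STT and TTS servers with four A100 cards in total.
routing_license_resources A100 4   # prints RAM_GB=40 CPUS=40 VCPUS=80
```

If your STT and TTS servers use different GPU types, you could call the function once per type and add up the results.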

Hardware requirements for STT and TTS servers

Below are the recommended hardware requirements for running SpeechKit Hybrid containers. The values are estimates given for reference and apply to hardware with the NVIDIA 535 driver. Actual values may change as containers are updated and new features are added.

Unless otherwise specified, the data in the tables applies to Russian only. RAM, HDD, and core values are per GPU card; physical and logical cores refer to Intel Xeon Gold 6230R processors.

Containers with the L4 GPU

Operation mode              Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition     66                64        200       8                16
  Audio file recognition    330               64        200       8                16
Speech synthesis
  Speech synthesis          266               64        200       8                16

Containers with the A100 GPU

Operation mode              Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition     245               64        200       14               28
  Audio file recognition    1,000             64        200       14               28
Speech synthesis
  Speech synthesis          581               64        200       14               28

Containers with the H100 PCIe GPU

Operation mode                                            Guaranteed SPS¹   RAM, GB   HDD, GB   Physical cores   Logical cores
Speech recognition
  Streaming recognition, Russian                          385               64        200       8                16
  Streaming recognition, multilingual model               245               64        200       8                16
  Audio file recognition, Russian                         3,500             64        200       8                16
  Audio file recognition with speaker labeling, Russian   2,590             64        200       8                16
Speech synthesis
  Speech synthesis                                        1,260             64        200       8                16

¹ Seconds per second (SPS): the number of seconds of audio recognized or synthesized per second of processing time.

Examples of hardware calculations for STT and TTS servers

The number of cards required for speech recognition or speech synthesis depends on the SPS value. First, calculate the user SPS, i.e., the load your workload generates, using the formula for your operation mode:

Streaming recognition:

User SPS = X × Y

Where:

  • X: Fraction of the conversation during which recognition is enabled, from 0 to 1. If you need to handle interruptions, recognition must run throughout the conversation, so X = 1.
  • Y: Number of concurrent calls.

Audio file recognition:

User SPS = X / Y

Where:

  • X: Duration of the audio to recognize, in seconds.
  • Y: Time within which the audio must be recognized, in seconds.

Speech synthesis:

User SPS = X × (Y / 10)

Where:

  • X: Expected number of requests per second.
  • Y: Average request length, in characters.
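
The three user SPS formulas can be expressed as small shell functions. This is a sketch: the function names are hypothetical, and awk is used only because Bash arithmetic is integer-only and X can be fractional.

```shell
# Hypothetical helpers implementing the user SPS formulas.
# awk is used so that fractional values such as X = 0.6 work.

# Streaming recognition: X = fraction of the conversation with recognition on (0..1),
# Y = number of concurrent calls.
user_sps_streaming() { awk -v x="$1" -v y="$2" 'BEGIN { print x * y }'; }

# Audio file recognition: X = audio duration, s; Y = time allowed for recognition, s.
user_sps_files() { awk -v x="$1" -v y="$2" 'BEGIN { print x / y }'; }

# Speech synthesis: X = requests per second; Y = average request length, characters.
user_sps_synthesis() { awk -v x="$1" -v y="$2" 'BEGIN { print x * (y / 10) }'; }

# Example: 100 concurrent calls with interruption handling (X = 1).
user_sps_streaming 1 100   # prints 100
```

Similarly, 10 hours of recordings (36,000 s) that must be processed within one hour give `user_sps_files 36000 3600`, i.e., a user SPS of 10.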

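Then calculate the number of cards by dividing the user SPS by the guaranteed SPS of one card for your GPU type and operation mode, taken from the tables above:

Number of cards = User SPS / Guaranteed SPS per card

Round the result up to the nearest integer.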

To get the required amount of RAM, disk space, and cores, multiply the per-card values from the table by the number of cards.
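
The card count and the resulting server resources can be sketched as follows. The function names are hypothetical; the per-card values in the example are the L4 streaming recognition row, and the user SPS of 100 corresponds to 100 concurrent calls with X = 1.

```shell
# Hypothetical helpers for the card count and total STT/TTS server resources.

# cards = User SPS / guaranteed SPS per card, rounded up to an integer.
cards_needed() {
  awk -v u="$1" -v g="$2" 'BEGIN {
    n = u / g
    c = int(n)
    if (c < n) c++    # round up
    print c
  }'
}

# Multiply per-card table values by the number of cards.
# Args: cards, then per-card RAM (GB), HDD (GB), physical cores, logical cores.
stt_tts_resources() {
  local cards="$1"
  echo "RAM_GB=$(($2 * cards)) HDD_GB=$(($3 * cards)) CORES=$(($4 * cards)) THREADS=$(($5 * cards))"
}

# Example: streaming recognition on L4 (guaranteed SPS 66) with a user SPS of 100.
cards=$(cards_needed 100 66)              # 100 / 66 ≈ 1.52, rounded up to 2
stt_tts_resources "$cards" 64 200 8 16    # prints RAM_GB=128 HDD_GB=400 CORES=16 THREADS=32
```

Enter table values such as 1,000 without the thousands separator (1000), since awk and Bash arithmetic do not accept commas in numbers.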

Software requirements

Your dedicated SpeechKit Hybrid server must have the NVIDIA 535 LTS drivers and NVIDIA Container Toolkit 1.15 or higher installed. For more information about the drivers, see the official NVIDIA documentation. You do not need to install the CUDA Toolkit, as it is included in the SpeechKit Hybrid images.
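
A minimal preflight sketch that checks the AVX2 flag, Docker, the driver branch, and the NVIDIA Container Toolkit version. It treats "535 drivers" as the 535 driver branch and assumes that `nvidia-smi` and `nvidia-ctk` are on `PATH`, that the first x.y.z string printed by `nvidia-ctk --version` is the toolkit version, and that GNU `sort -V` is available. The function names are hypothetical.

```shell
# version_ge A B: succeeds if version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# driver_is_535 VERSION: succeeds if the driver version belongs to the 535 branch.
driver_is_535() {
  [ "${1%%.*}" = "535" ]
}

# preflight: checks the host; prints the reason and returns 1 on the first failure.
preflight() {
  local drv tk
  grep -q avx2 /proc/cpuinfo || { echo "CPU does not support AVX2" >&2; return 1; }
  command -v docker >/dev/null || { echo "Docker Engine not found" >&2; return 1; }
  drv="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  driver_is_535 "$drv" || { echo "Driver '$drv' is not from the 535 branch" >&2; return 1; }
  # Assumption: the first x.y.z string in the output is the toolkit version.
  tk="$(nvidia-ctk --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
  version_ge "$tk" 1.15 || { echo "NVIDIA Container Toolkit '$tk' is older than 1.15" >&2; return 1; }
  echo "OK: driver $drv, NVIDIA Container Toolkit $tk"
}
```

After sourcing the functions on the server, run `preflight`. `sort -V` is used instead of a plain string comparison so that, for example, 1.9.0 is correctly treated as older than 1.15.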

To install and configure SpeechKit Hybrid services, you will need the Yandex Cloud CLI and a registry in Yandex Container Registry.

  1. If you do not have the Yandex Cloud CLI yet, install and initialize it.

  2. Create a registry in Yandex Container Registry.

    By default, commands use the folder specified when the CLI profile was created. To change the default folder, run the yc config set folder-id <folder_ID> command. You can also specify a different folder for an individual command using the --folder-name or --folder-id parameter. If you access a resource by its name, the CLI searches only the default folder; if you access it by its ID, the CLI searches all folders you have access to.

    yc container registry create --name speechkit-hybrid
    

    Result:

    id: <registry_ID>
    folder_id: <folder_ID>
    name: speechkit-hybrid
    status: ACTIVE
    created_at: "<creation_date_and_time>"
    
  3. Create a service account with the editor role for the selected folder.

  4. Create an API key for the service account.

  5. Send the ID of the created registry to the SpeechKit team. All the required containers will then appear in your registry, and you will receive the docker-compose.yaml file with the deployment settings.
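
Steps 3 and 4 can also be done with the Yandex Cloud CLI. The sketch below assumes that `yc` is installed and initialized and that `jq` is available; the function names, the `DRY_RUN` switch, and the service account name are illustrative. By default, it only prints the commands so you can review them; set `DRY_RUN=0` to execute them.

```shell
# run: print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# setup_service_account FOLDER_ID SA_NAME (hypothetical helper):
# creates a service account, grants it the editor role for the folder,
# and creates an API key for it.
setup_service_account() {
  local folder_id="$1" sa_name="$2" sa_id
  run yc iam service-account create --name "$sa_name" --folder-id "$folder_id" || return 1

  # In dry-run mode there is no real account yet, so use a placeholder ID.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    sa_id="<service_account_ID>"
  else
    sa_id="$(yc iam service-account get "$sa_name" --folder-id "$folder_id" --format json | jq -r .id)"
  fi

  run yc resource-manager folder add-access-binding "$folder_id" \
    --role editor --subject "serviceAccount:$sa_id" || return 1
  run yc iam api-key create --service-account-id "$sa_id"
}
```

For example, run `setup_service_account <folder_ID> speechkit-hybrid-sa`, check the printed commands, and then rerun it with `DRY_RUN=0`.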
