System requirements

Updated on March 5, 2026

Hardware requirements for routing and licensing servers
Hardware requirements for STT and TTS servers
- Examples of hardware calculations for STT and TTS servers
Software requirements

To install SpeechKit Hybrid, you need a Linux server with Docker Engine support. For a list of supported operating systems, see the official Docker documentation.

Warning

The CPU must support the AVX2 (Advanced Vector Extensions 2) instruction set.

On Linux, to check whether your host supports AVX2, run:

grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2

Hardware requirements for routing and licensing servers

The table below lists the recommended hardware for the routing (Envoy) and licensing (License) servers. The number of CPU cores and the amount of RAM these servers need depend on the number and type of GPUs used on the STT and TTS servers.

GPU type	RAM per card, GB	Physical CPU cores per card, pcs	Logical CPU cores (vCPUs) per card, pcs
NVIDIA® Tesla® T4	2	2	4
NVIDIA® Tesla® V100	4	4	8
NVIDIA® Tesla® L4	8	8	16
NVIDIA® Ampere® A100	10	10	20
NVIDIA® Tesla® H100	20	20	40
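
As a rough illustration of how the table can be applied, the sketch below multiplies the per-card values by the number of cards. It assumes the requirements scale linearly with the card count, which the "per card" columns suggest but the text does not state outright. `routing_requirements` is a hypothetical helper name, not part of SpeechKit Hybrid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: routing_requirements GPU_TYPE CARD_COUNT
# Per-card values (RAM in GB, physical cores, vCPUs) are copied from the table above.
# Assumption: totals scale linearly with the number of cards.
routing_requirements() {
  local gpu="$1" cards="$2" ram cores vcpus
  case "$gpu" in
    T4)   ram=2;  cores=2;  vcpus=4  ;;
    V100) ram=4;  cores=4;  vcpus=8  ;;
    L4)   ram=8;  cores=8;  vcpus=16 ;;
    A100) ram=10; cores=10; vcpus=20 ;;
    H100) ram=20; cores=20; vcpus=40 ;;
    *) echo "unknown GPU type: $gpu" >&2; return 1 ;;
  esac
  echo "RAM_GB=$((ram * cards)) CORES=$((cores * cards)) VCPUS=$((vcpus * cards))"
}

# Example: STT and TTS servers with four T4 cards in total.
routing_requirements T4 4
```

For a mixed set of GPU types, one option is to call the helper once per type and add up the results.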

Hardware requirements for STT and TTS servers

The tables below list the recommended hardware for running SpeechKit Hybrid containers. The values are estimates given for reference; they were measured on hardware using the NVIDIA 535 driver. The actual values may change after containers are updated and new features are added.

The data in the tables is for the Russian language only, unless otherwise specified:

Containers with GPU T4

Operation mode	Guaranteed SPS¹	RAM per card, GB	HDD per card, GB	Physical cores (Intel Gold 6230R) per card, pcs	Logical cores (Intel Gold 6230R) per card, pcs
Speech recognition
Streaming recognition	50	64	200	8	16
Audio file recognition	250	64	200	8	16
Speech synthesis
Speech synthesis	147	64	200	8	16

Containers with GPU V100

Operation mode	Guaranteed SPS¹	RAM per card, GB	HDD per card, GB	Physical cores (Intel Gold 6230R) per card, pcs	Logical cores (Intel Gold 6230R) per card, pcs
Speech recognition
Streaming recognition	110	64	200	8	16
Audio file recognition	500	64	200	8	16
Speech synthesis
Speech synthesis	105	64	200	8	16

Containers with GPU L4

Operation mode	Guaranteed SPS¹	RAM per card, GB	HDD per card, GB	Physical cores (Intel Gold 6230R) per card, pcs	Logical cores (Intel Gold 6230R) per card, pcs
Speech recognition
Streaming recognition	66	64	200	8	16
Audio file recognition	330	64	200	8	16
Speech synthesis
Speech synthesis	266	64	200	8	16

Containers with GPU A100

Operation mode	Guaranteed SPS¹	RAM per card, GB	HDD per card, GB	Physical cores (Intel Gold 6230R) per card, pcs	Logical cores (Intel Gold 6230R) per card, pcs
Speech recognition
Streaming recognition	245	64	200	14	28
Audio file recognition	1,000	64	200	14	28
Speech synthesis
Speech synthesis	581	64	200	14	28

Containers with GPU H100 PCI

Operation mode	Guaranteed SPS¹	RAM per card, GB	HDD per card, GB	Physical cores (Intel Gold 6230R) per card, pcs	Logical cores (Intel Gold 6230R) per card, pcs
Speech recognition
Streaming recognition, Russian	385	64	200	8	16
Streaming recognition, multi-lingual model	245	64	200	8	16
Audio file recognition, Russian	3,500	64	200	8	16
Audio file recognition with speaker labeling, Russian	2,590	64	200	8	16
Speech synthesis
Speech synthesis	1,260	64	200	8	16

¹ Seconds per second (SPS): Number of seconds of speech recognized or synthesized per second of runtime.

Examples of hardware calculations for STT and TTS servers

The number of cards required for speech recognition or speech synthesis depends on the user SPS value. Use the formula for your operation mode to calculate it:

Streaming recognition

User SPS = X × Y

Where:

X is the share of the conversation during which recognition is enabled, expressed as a fraction from 0 to 1. If interruptions need to be factored in, X = 1.
Y is the number of concurrent calls.

Audio file recognition

User SPS = X / Y

Where:

X is the duration of audio to recognize, in seconds.
Y is the time required for audio recognition, in seconds.

Speech synthesis

User SPS = X × (Y / 10)

Where:

X is the expected number of requests per second.
Y is the average request length in characters.
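
The three user SPS formulas above can be sketched as small shell functions. This is only an illustration: the function names are hypothetical, the input numbers are made up for the example, and `awk` is used because plain shell arithmetic does not handle fractions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers implementing the user SPS formulas from the text.

# Streaming recognition: X (share of the call with recognition on, 0..1) x Y (concurrent calls).
streaming_sps() { awk -v x="$1" -v y="$2" 'BEGIN { printf "%g\n", x * y }'; }

# Audio file recognition: X (seconds of audio) / Y (seconds allowed for recognition).
file_sps() { awk -v x="$1" -v y="$2" 'BEGIN { printf "%g\n", x / y }'; }

# Speech synthesis: X (requests per second) x (Y (average request length, characters) / 10).
synthesis_sps() { awk -v x="$1" -v y="$2" 'BEGIN { printf "%g\n", x * (y / 10) }'; }

streaming_sps 0.5 200   # recognition on for half of each call, 200 concurrent calls
file_sps 36000 600      # 10 hours of audio to be processed within 10 minutes
synthesis_sps 5 120     # 5 requests per second, 120 characters each
```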

The number of cards is calculated as follows:

User SPS / guaranteed card SPS

The resulting value is rounded up to the nearest integer.

To get the required amount of RAM, HDD space, and number of cores, multiply the per-card values from the tables by the number of cards.
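
A minimal sketch of this calculation, assuming a user SPS of 120 and a card whose guaranteed SPS is 50 with per-card values of 64 GB RAM, 200 GB HDD, 8 physical cores, and 16 logical cores. The input numbers are illustrative; `cards_needed` and `total_resources` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# cards_needed USER_SPS GUARANTEED_SPS
# Divides user SPS by the guaranteed card SPS and rounds up to the nearest integer.
cards_needed() {
  awk -v u="$1" -v g="$2" 'BEGIN {
    n = int(u / g)          # whole cards fully used
    if (n * g < u) n++      # one more card for the remainder
    print n
  }'
}

# total_resources CARDS RAM_GB HDD_GB PHYSICAL_CORES LOGICAL_CORES
# Multiplies the per-card table values by the number of cards.
total_resources() {
  local cards="$1"
  echo "RAM_GB=$(($2 * cards)) HDD_GB=$(($3 * cards)) CORES=$(($4 * cards)) LOGICAL_CORES=$(($5 * cards))"
}

cards="$(cards_needed 120 50)"
echo "cards: $cards"
total_resources "$cards" 64 200 8 16
```

With these inputs, 120 / 50 = 2.4 is rounded up to 3 cards, and the per-card values are tripled.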

Software requirements

Your dedicated SpeechKit Hybrid server must have NVIDIA LTS 535 drivers and NVIDIA Container Toolkit 1.15 or higher. For more information about the drivers, see the official NVIDIA documentation. You do not need to install the CUDA Toolkit as it comes as part of SpeechKit Hybrid images.
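
One way to check what is installed on the host is sketched below. It assumes `nvidia-smi` and `nvidia-ctk` are on the `PATH` and that GNU `sort -V` is available; `version_at_least` is a hypothetical helper, and the toolkit version passed to it in the comment is a sample value, not real output.

```shell
#!/usr/bin/env bash
# version_at_least ACTUAL MINIMUM
# Succeeds if ACTUAL is not lower than MINIMUM (relies on GNU sort -V ordering).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed driver version; the text above expects the 535 branch.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  echo "NVIDIA driver: $driver"
fi

# Print the NVIDIA Container Toolkit CLI version string.
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --version
fi

# Compare a version number read from the output above with the required minimum, e.g.:
#   version_at_least 1.16.2 1.15 && echo "toolkit OK"
```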

To install and configure SpeechKit Hybrid services, you will need the Yandex Cloud CLI and a registry in Yandex Container Registry.

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
Create a registry in Yandex Container Registry.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id options.
```
yc container registry create --name speechkit-hybrid
```
Result:
```
id: <registry_ID>
folder_id: <folder_ID>
name: speechkit-hybrid
status: ACTIVE
created_at: "<creation_date_and_time>"
```
Create a service account with the editor role for the selected folder.
Create an API key for the service account.
Notify the SpeechKit team of the created registry ID. All required containers will appear in your registry, and you will get the docker-compose.yaml file with the deployment settings.
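
The service account and API key steps above might look as follows with the Yandex Cloud CLI. This is a sketch, not the documented procedure: `setup_hybrid_account` is a hypothetical wrapper, the folder ID and account name are supplied by you, and `jq` is assumed to be installed for reading the account ID from the JSON output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: setup_hybrid_account FOLDER_ID SERVICE_ACCOUNT_NAME
setup_hybrid_account() {
  local folder_id="$1" sa_name="$2" sa_id

  # Create the service account in the selected folder.
  yc iam service-account create --name "$sa_name" --folder-id "$folder_id"

  # Look up the ID of the new account (assumes jq is available).
  sa_id="$(yc iam service-account get --name "$sa_name" --folder-id "$folder_id" --format json | jq -r .id)"

  # Grant the editor role for the folder.
  yc resource-manager folder add-access-binding "$folder_id" \
    --role editor --subject "serviceAccount:$sa_id"

  # Create an API key for the service account; store the returned secret safely.
  yc iam api-key create --service-account-name "$sa_name" --folder-id "$folder_id"
}

# Usage (not run here): setup_hybrid_account "$FOLDER_ID" speechkit-hybrid-sa
```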
