System requirements
To install SpeechKit Hybrid, you need a Linux server with Docker Engine support. For a list of supported operating systems, see the official Docker documentation.
Warning
The CPU must support the AVX2 instruction set.
On Linux, to check whether your host supports AVX2, run:
grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2
Hardware requirements
The recommended hardware requirements for running SpeechKit Hybrid containers are listed below. The values are estimates given for reference and were obtained on hardware using the NVIDIA 535 driver. The actual values may change after containers are updated and new features are added.
The data in the tables is for the Russian language only:
Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 50 | 64 | 200 | 8 | 16 |
Audio file recognition | 250 | 64 | 200 | 8 | 16 |
Speech synthesis | | | | | |
Speech synthesis | 80 | 64 | 200 | 8 | 16 |
Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 110 | 64 | 200 | 8 | 16 |
Audio file recognition | 500 | 64 | 200 | 8 | 16 |
Speech synthesis | | | | | |
Speech synthesis | 200 | 64 | 200 | 8 | 16 |
Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 245 | 64 | 200 | 14 | 28 |
Audio file recognition | 1,000 | 64 | 200 | 14 | 28 |
Speech synthesis | | | | | |
Speech synthesis | 480 | 64 | 200 | 14 | 28 |
¹ Seconds per second (SPS): the number of seconds of speech recognized or synthesized per second of runtime.
Sample calculations of required hardware
The number of cards required for speech recognition or speech synthesis depends on the SPS value. Use the following formulas to calculate it:
For streaming recognition:
User SPS = X × Y
Where:
- X is the share of the conversation during which recognition is enabled, as a fraction from 0 to 1. If interruptions need to be factored in, X = 1.
- Y is the number of concurrent calls.
For audio file recognition:
User SPS = X / Y
Where:
- X is the duration of audio to recognize, in seconds.
- Y is the time within which the audio must be recognized, in seconds.
For speech synthesis:
User SPS = X × (Y / 10)
Where:
- X is the expected number of requests per second.
- Y is the average request length in characters.
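As a rough illustration, the three formulas above can be sketched in Python. This assumes that they correspond, in order, to streaming recognition, audio file recognition, and speech synthesis. The function names and the sample workloads are hypothetical and serve only as an example.

```python
def streaming_sps(recognition_share: float, concurrent_calls: int) -> float:
    """User SPS = X * Y, where X is a fraction from 0 to 1."""
    if not 0 <= recognition_share <= 1:
        raise ValueError("recognition_share must be between 0 and 1")
    return recognition_share * concurrent_calls


def file_recognition_sps(audio_seconds: float, allowed_seconds: float) -> float:
    """User SPS = X / Y, both values in seconds."""
    return audio_seconds / allowed_seconds


def synthesis_sps(requests_per_second: float, avg_chars: float) -> float:
    """User SPS = X * (Y / 10), with Y in characters."""
    return requests_per_second * (avg_chars / 10)


# Hypothetical workloads, for illustration only.
print(streaming_sps(0.5, 100))            # 100 calls, recognition on half the time
print(file_recognition_sps(36000, 3600))  # 10 hours of audio within 1 hour
print(synthesis_sps(5, 200))              # 5 requests per second, 200 characters each
```

With these sample inputs, the user SPS values are 50, 10, and 100, respectively.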
The number of cards is calculated as follows:
User SPS / guaranteed card SPS
The resulting value is rounded up to the nearest integer.
To get the total RAM, HDD capacity, and number of cores required, multiply the per-card values from the table by the number of cards.
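The card count and resource totals can be estimated with a short script. The sketch below is only an example: the per-card figures are copied from the streaming recognition row of the first table, and the user SPS of 120 and the `size_cluster` helper are hypothetical. Replace the constants with the row that matches your card and operation mode.

```python
import math

# Per-card figures from the streaming recognition row of the first table.
GUARANTEED_SPS = 50
RAM_GB = 64
HDD_GB = 200
PHYSICAL_CORES = 8
LOGICAL_CORES = 16


def size_cluster(user_sps: float) -> dict:
    """Round the card count up, then scale the per-card resources."""
    cards = math.ceil(user_sps / GUARANTEED_SPS)
    return {
        "cards": cards,
        "ram_gb": cards * RAM_GB,
        "hdd_gb": cards * HDD_GB,
        "physical_cores": cards * PHYSICAL_CORES,
        "logical_cores": cards * LOGICAL_CORES,
    }


# Hypothetical workload: a user SPS of 120.
print(size_cluster(120))
```

Here 120 / 50 = 2.4, which rounds up to 3 cards, giving 192 GB of RAM, 600 GB of HDD space, 24 physical cores, and 48 logical cores.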
Software requirements
To install and configure SpeechKit Hybrid services: