System requirements
To install SpeechKit Hybrid, you need a Linux server with Docker Engine support. For a list of supported operating systems, see the official Docker documentation.
Warning
The CPU must support the AVX2 instruction set.
On Linux, to check whether your host supports AVX2, run:
grep -q avx2 /proc/cpuinfo && echo AVX2 || echo No AVX2
Hardware requirements
The recommended hardware requirements for running SpeechKit Hybrid containers are listed below. The estimated SpeechKit Hybrid performance values are given for reference and are based on hardware using the NVIDIA 535 driver. The actual values may change after containers are updated and new features are added.
The data in the tables is for the Russian language only:
Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 50 | 64 | 200 | 8 | 16 |
Audio file recognition | 250 | 64 | 200 | 8 | 16 |
Speech synthesis | | | | | |
Speech synthesis | 80 | 64 | 200 | 8 | 16 |

Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 110 | 64 | 200 | 8 | 16 |
Audio file recognition | 500 | 64 | 200 | 8 | 16 |
Speech synthesis | | | | | |
Speech synthesis | 200 | 64 | 200 | 8 | 16 |

Operation mode | Guaranteed SPS¹ | RAM per card, GB | HDD per card, GB | Intel Gold 6230R physical cores per card | Intel Gold 6230R logical cores per card |
---|---|---|---|---|---|
Speech recognition | | | | | |
Streaming recognition | 245 | 64 | 200 | 14 | 28 |
Audio file recognition | 1,000 | 64 | 200 | 14 | 28 |
Speech synthesis | | | | | |
Speech synthesis | 480 | 64 | 200 | 14 | 28 |
¹ Seconds per second (SPS): the number of seconds of audio recognized or synthesized per second of runtime.
Sample calculations of required hardware
The number of cards required for speech recognition or speech synthesis depends on the user SPS value. Use the formula that matches the operation mode.
Streaming recognition:
User SPS = X × Y
Where:
- X is the share of the conversation during which recognition is enabled, as a fraction from 0 to 1. If interruptions need to be factored in, X = 1.
- Y is the number of concurrent calls.
Audio file recognition:
User SPS = X / Y
Where:
- X is the duration of the audio to recognize, in seconds.
- Y is the time within which the audio must be recognized, in seconds.
Speech synthesis:
User SPS = X × (Y / 10)
Where:
- X is the expected number of requests per second.
- Y is the average request length in characters.
The number of cards is calculated as follows:
User SPS / guaranteed card SPS
The resulting value is rounded up to the nearest integer.
To get the required amount of RAM, HDD space, and CPU cores, multiply the table values by the number of cards.
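The calculation above can be sketched as a short shell script. This is only an illustration under stated assumptions: the input values (recognition enabled for half of each call, 240 concurrent calls) are hypothetical, the X × Y formula is assumed to be the one for streaming recognition, and the per-card figures (50 SPS, 64 GB RAM, 200 GB HDD, 8 physical and 16 logical cores) are taken from the streaming recognition row of the first table. The helper name `cards_needed` is made up for this example.

```shell
#!/usr/bin/env bash
# cards_needed USER_SPS CARD_SPS
# Prints USER_SPS / CARD_SPS rounded up to the nearest integer.
cards_needed() {
  awk -v u="$1" -v g="$2" 'BEGIN { n = int(u / g); if (n * g < u) n++; print n }'
}

# Hypothetical workload: recognition is enabled for half of each call (X = 0.5),
# with 240 concurrent calls (Y = 240). User SPS = X * Y.
user_sps=$(awk 'BEGIN { print 0.5 * 240 }')

# Guaranteed streaming recognition SPS per card, from the first table.
card_sps=50

cards=$(cards_needed "$user_sps" "$card_sps")

echo "User SPS: $user_sps"
echo "Cards: $cards"
# Multiply the per-card table values by the number of cards.
echo "RAM: $((cards * 64)) GB, HDD: $((cards * 200)) GB"
echo "Physical cores: $((cards * 8)), logical cores: $((cards * 16))"
```

With these inputs, the user SPS is 120, so 120 / 50 = 2.4 is rounded up to 3 cards, which gives 192 GB of RAM, 600 GB of HDD space, 24 physical cores, and 48 logical cores.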
Software requirements
A dedicated SpeechKit Hybrid server must support running containers with CUDA® 11.4 or higher.
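One possible way to check the CUDA requirement on the host is sketched below. It assumes the NVIDIA driver is installed and that the `nvidia-smi` banner includes a `CUDA Version:` field, which reports the highest CUDA version the driver supports. The helper `version_at_least` is a hypothetical name and relies on GNU `sort -V`. Treat the result as a quick sanity check, not as proof that GPU containers will start.

```shell
#!/usr/bin/env bash
# version_at_least A B
# Succeeds if version A is greater than or equal to version B (GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED_CUDA="11.4"

if command -v nvidia-smi >/dev/null 2>&1; then
  # Extract the version from the "CUDA Version: N.M" field of the banner.
  detected="$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1)"
  if [ -n "$detected" ] && version_at_least "$detected" "$REQUIRED_CUDA"; then
    echo "CUDA $detected: OK"
  else
    echo "CUDA ${detected:-unknown}: below $REQUIRED_CUDA or not detected"
  fi
else
  echo "nvidia-smi not found: install the NVIDIA driver first"
fi
```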
To install and configure SpeechKit Hybrid services, you will need the Yandex Cloud CLI and a registry in Yandex Container Registry.
- If you do not have the Yandex Cloud command line interface yet, install and initialize it.
- Create a registry in Yandex Container Registry.
  The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
      yc container registry create --name speechkit-hybrid
  Result:
      id: <registry_ID>
      folder_id: <folder_ID>
      name: speechkit-hybrid
      status: ACTIVE
      created_at: "<creation_date_and_time>"
- Create a service account with the editor role for the selected folder.
- Create an API key for the service account.
- Notify the SpeechKit team of the created registry ID. All the required containers will appear in your registry, and you will be provided with the docker-compose.yaml file that contains the deployment settings.
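The service account and API key steps above might look like this with the Yandex Cloud CLI. This is a sketch, not an official procedure: the service account name `speechkit-hybrid-sa` is hypothetical, the `<folder_ID>` and `<service_account_ID>` placeholders must be replaced with your own values, and the `run` wrapper with its `DRY_RUN` switch is a helper invented for this example so that the commands are only printed by default. Check each command with `yc <command> --help` before running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

SA_NAME="speechkit-hybrid-sa"   # hypothetical service account name
FOLDER_ID="<folder_ID>"         # placeholder: your folder ID
SA_ID="<service_account_ID>"    # placeholder: ID printed when the account is created

# With DRY_RUN=1 (the default here), commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Create the service account.
run yc iam service-account create --name "$SA_NAME"

# 2. Grant it the editor role for the folder.
run yc resource-manager folder add-access-binding "$FOLDER_ID" \
  --role editor \
  --subject "serviceAccount:$SA_ID"

# 3. Create an API key for the service account.
run yc iam api-key create --service-account-name "$SA_NAME"
```

Because the service account ID is only known after the first command completes, it is simpler to run the steps one at a time and fill in `SA_ID` before granting the role.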