SpeechKit pricing policy
To calculate the cost of using the service, use the calculator on the Yandex Cloud website or see the pricing in this section.
What goes into the cost of using SpeechKit
Using speech synthesis
The cost of using SpeechKit for speech synthesis depends on the version of the API you use.
API v1
For the API v1, the cost is calculated based on the total number of characters sent to generate speech from text in a calendar month (Reporting period).
API v3
The cost of using the API v3 depends on the number of synthesis requests sent. The cost is calculated for a calendar month (Reporting period).
By default, a speech synthesis request is limited to 250 characters and 24 seconds. To synthesize longer phrases, you can use unsafe_mode. In this case, you are charged per 250 characters, for example:
- A request that is shorter than 250 characters is billed as a single billing unit.
- A request that is from 250 to 500 characters long is billed as two billing units.
- A request that is from 500 to 750 characters long is billed as three billing units.
Empty request
The character count of a request includes spaces and special characters. The cost of an empty request depends on the API version:
- An empty request to the API v1 is billed as a single character.
- An empty request to the API v3 is billed as a single billing unit.
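The synthesis rules above can be sketched in a few lines of Python. This is an illustration, not the service's billing code: the function names are hypothetical, Python's `len` is assumed to match how the service counts characters, and requests are assumed to round up to whole 250-character units (the text does not spell out what happens at exactly 250 or 500 characters).

```python
import math

CHARS_PER_UNIT = 250  # billing step for API v3 stated in the text


def v1_billable_chars(text: str) -> int:
    """Characters billed by API v1; an empty request counts as one character."""
    # len() counts spaces and special characters too.
    return max(1, len(text))


def v3_billing_units(text: str) -> int:
    """Billing units for one API v3 request (assumed: round up to whole units)."""
    # An empty request is still billed as one unit.
    return max(1, math.ceil(len(text) / CHARS_PER_UNIT))
```

For example, requests of 150, 300, and 600 characters come out as 1, 2, and 3 billing units under these assumptions.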
Internal server errors
You are not charged for a request that fails due to an internal server error.
Using speech recognition
The cost of using SpeechKit for speech recognition depends on the recognition type and duration of a recognized audio fragment. The cost is calculated for a calendar month (Reporting period).
Streaming speech recognition
The cost of using SpeechKit streaming recognition is calculated based on the pricing rules for synchronous recognition.
Synchronous recognition
These rules apply to synchronous recognition and streaming mode recognition when using the API v2 and API v3.
The billing unit is a 15-second segment of a single-channel audio file. Shorter segments are rounded up (1 second becomes 15 seconds).
Warning
In streaming mode, billing begins as soon as you send a message with recognition settings. If you do not send any audio after this message, it will be treated as one consumed billing unit.
Examples:
One audio fragment that is 37 seconds long is billed as 45 seconds.
Explanation: The audio is divided into two 15-second segments and one 7-second segment. The length of the last segment is rounded up to 15 seconds. Thus, we have three segments, 15 seconds each.
Two audio fragments that are 5 and 8 seconds long are billed as 30 seconds.
Explanation: The length of each audio is rounded up to 15 seconds. Thus, we have two segments, 15 seconds each.
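The rounding in these examples can be reproduced with a short Python sketch. The function names are hypothetical, and the minimum of one unit reflects the streaming warning above (a session with no audio still consumes one billing unit).

```python
import math

SEGMENT_SECONDS = 15  # length of one billing unit


def sync_billing_units(duration_seconds: float) -> int:
    """Billing units for one single-channel fragment, rounded up to 15-second segments."""
    return max(1, math.ceil(duration_seconds / SEGMENT_SECONDS))


def billed_seconds(durations) -> int:
    """Total billed seconds; each fragment is rounded up separately."""
    return sum(sync_billing_units(d) for d in durations) * SEGMENT_SECONDS
```

With these definitions, `billed_seconds([37])` gives 45 and `billed_seconds([5, 8])` gives 30, matching the two examples.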
Asynchronous recognition
These rules apply when using asynchronous recognition.
The billing unit is a one-second segment of two-channel audio. Shorter segments are rounded up. The number of channels is rounded up to an even number.
The minimum billable amount is 15 seconds for every pair of channels. Shorter audio fragments are billed as 15 seconds.
Examples of rounding audio length:
Length | Number of channels | Seconds charged |
---|---|---|
1 second | 1 | 15 |
1 second | 2 | 15 |
1 second | 3 | 30 |
15.5 seconds | 2 | 16 |
15.5 seconds | 4 | 32 |
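One way to read the table is as a product of two rounded values: the duration rounded up to whole seconds (with a 15-second minimum) and the number of channel pairs. The sketch below uses a hypothetical function name and is only an interpretation of the rules stated here.

```python
import math

MIN_SECONDS = 15  # minimum billable length per pair of channels


def async_billed_seconds(duration_seconds: float, channels: int) -> int:
    """Seconds charged for one fragment in asynchronous recognition."""
    # Round the length up to whole seconds, then apply the 15-second minimum.
    seconds = max(MIN_SECONDS, math.ceil(duration_seconds))
    # Channels are rounded up to an even number, i.e. counted in pairs.
    channel_pairs = max(1, math.ceil(channels / 2))
    return seconds * channel_pairs
```

Each row of the table can be checked this way, for example `async_billed_seconds(15.5, 4)` evaluates to 32.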
Empty request
The cost of an empty request to any type of speech recognition is equal to that of a single billing unit.
Internal server errors
You are not charged for a request that fails due to an internal server error.
Prices for the Russia region
Warning
Prices for Yandex Cloud resources vary from region to region. For more information about the available regions, see Regions.
The currency that can be used to pay for resources depends on which legal entity the user has entered into agreement with. For more information about account registration, see Registering an account in Yandex Cloud.
Speech synthesis
Service | Rate for the billable unit, without VAT |
---|---|
Speech synthesis using API v1, for 1 million characters | $10.560000 |
Speech synthesis using API v3, per request | $0.001280 |
SpeechKit Brand Voice
Service | Price per unit, without VAT |
---|---|
SpeechKit Brand Voice Self Service model hosting, per month | Contact us |
SpeechKit Brand Voice Premium model hosting, per month | Contact us |
Request to SpeechKit Brand Voice Call Center model | $0.001280 |
Request to SpeechKit Brand Voice Self Service model | $0.001280 |
Request to SpeechKit Brand Voice Premium model | $0.001280 |
Speech recognition
Service | Rate for the billable unit, without VAT |
---|---|
Streaming recognition | $0.001280 |
Synchronous file recognition | $0.001280 |
Asynchronous file recognition | $0.000080 |
Asynchronous file recognition, deferred mode model | $0.000020 |
Examples of cost calculation
Speech synthesis using API v1
The cost of using SpeechKit for speech synthesis using the API v1 with the following parameters:
- Number of characters sent per month: 2,023.
The cost is calculated as follows:
2,023 × ($10.560000 / 1,000,000) ≈ $0.021363
Total: $0.021363
Where:
- $10.560000: Cost per one million characters.
- $10.560000 / 1,000,000: Cost per one character.
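The same calculation can be written as a small Python sketch. `Decimal` is used to avoid floating-point noise in currency values; the function name is hypothetical, and how the service rounds the final invoice amount is not stated in this section, so the unrounded product is returned.

```python
from decimal import Decimal

PRICE_PER_MILLION_CHARS = Decimal("10.56")  # API v1 rate from the price table


def v1_cost(chars: int) -> Decimal:
    """Unrounded API v1 synthesis cost for a monthly character count."""
    # An empty request is billed as a single character.
    billable = max(1, chars)
    return Decimal(billable) * PRICE_PER_MILLION_CHARS / Decimal(1_000_000)
```

For 2,023 characters, `v1_cost(2023)` evaluates to `Decimal("0.02136288")`.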
Speech synthesis using API v3
The cost of using SpeechKit for speech synthesis using the API v3 with the following parameters:
- Number of requests sent: 3.
- Number of characters in requests: 150, 300, 600.
The cost is calculated as follows:
(1 + 2 + 3) × $0.001280 = $0.00768
Total: $0.00768
Where:
- 1 is the number of billing units charged for the first request of 150 characters.
- 2 is the number of billing units charged for the second request of 300 characters made using unsafe_mode.
- 3 is the number of billing units charged for the third request of 600 characters made using unsafe_mode.
- $0.001280: Cost per billing unit.
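As a rough check, this example can be recomputed in Python. The function name is hypothetical, and request lengths are assumed to round up to whole 250-character billing units.

```python
import math
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.001280")  # API v3 rate from the price table
CHARS_PER_UNIT = 250


def v3_cost(request_lengths) -> Decimal:
    """Cost of a batch of API v3 synthesis requests, given their lengths in characters."""
    units = sum(max(1, math.ceil(n / CHARS_PER_UNIT)) for n in request_lengths)
    return units * PRICE_PER_UNIT
```

Calling `v3_cost([150, 300, 600])` yields 6 billing units in total and a cost equal to `Decimal("0.00768")`.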
Streaming speech recognition
The cost of using SpeechKit for streaming speech recognition with the following parameters:
- Number of audio fragments: 2.
- Duration of audio fragments: 5 seconds, 37 seconds.
The cost is calculated as follows:
(1 + 3) × $0.001280 = $0.00512
Total: $0.00512
Where:
- 1 is the number of billing units charged for the first 5-second audio fragment rounded up to 15 seconds.
- 3 is the number of billing units charged for the second 37-second audio fragment rounded up to 45 seconds.
- $0.001280: Cost per billing unit.
Synchronous speech recognition
The cost of using SpeechKit for synchronous speech recognition with the following parameters:
- Number of audio fragments: 2.
- Duration of audio fragments: 5 seconds, 37 seconds.
The cost is calculated as follows:
(1 + 3) × $0.001280 = $0.00512
Total: $0.00512
Where:
- 1 is the number of billing units charged for the first 5-second audio fragment rounded up to 15 seconds.
- 3 is the number of billing units charged for the second 37-second audio fragment rounded up to 45 seconds.
- $0.001280: Cost per billing unit.
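Because streaming and synchronous recognition share the same rules and the same rate, a single hypothetical helper covers both examples. This is a sketch of the arithmetic only.

```python
import math
from decimal import Decimal

PRICE_PER_UNIT = Decimal("0.001280")  # rate per 15-second billing unit
SEGMENT_SECONDS = 15


def recognition_cost(durations) -> Decimal:
    """Cost of streaming or synchronous recognition for single-channel fragments."""
    # Each fragment is rounded up to whole 15-second units on its own.
    units = sum(max(1, math.ceil(d / SEGMENT_SECONDS)) for d in durations)
    return units * PRICE_PER_UNIT
```

For fragments of 5 and 37 seconds, `recognition_cost([5, 37])` equals `Decimal("0.00512")`.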
Asynchronous speech recognition
The cost of using SpeechKit for asynchronous speech recognition with the following parameters:
- Number of audio fragments: 4.
- Duration of audio fragments: 5 seconds, 5 seconds, 15.5 seconds, 15.5 seconds.
- Number of channels in audio fragments: 1, 3, 2, 4.
The cost is calculated as follows:
(15 + 30 + 16 + 32) × $0.000080 = $0.00744
Total: $0.00744
Where:
- 15 is the number of billing units charged for the first single-channel 5-second audio fragment rounded up to 2 channels and 15 seconds.
- 30 is the number of billing units charged for the second 3-channel 5-second audio fragment rounded up to 4 channels and 15 seconds.
- 16 is the number of billing units charged for the third 2-channel 15.5-second audio fragment rounded up to 16 seconds.
- 32 is the number of billing units charged for the fourth 4-channel 15.5-second audio fragment rounded up to 16 seconds for each of its two channel pairs.
- $0.000080: Cost per billing unit, as listed in the speech recognition price table.
Asynchronous speech recognition in deferred mode
The cost of using SpeechKit for asynchronous speech recognition in deferred mode with the following parameters:
- Number of audio fragments: 3.
- Duration of audio fragments: 2 seconds, 14 seconds, 19.5 seconds.
- Number of channels in audio fragments: 2, 3, 4.
The cost is calculated as follows:
(15 + 30 + 40) × $0.000020 = $0.0017
Total: $0.0017
Where:
- 15 is the number of billing units charged for the first 2-channel 2-second audio fragment rounded up to 15 seconds.
- 30 is the number of billing units charged for the second 3-channel 14-second audio fragment rounded up to 4 channels and 15 seconds.
- 40 is the number of billing units charged for the third 4-channel 19.5-second audio fragment rounded up to 20 seconds for each of its two channel pairs.
- $0.000020: Cost per billing unit, as listed in the speech recognition price table.
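Both asynchronous examples follow the same pattern, so they can be sketched with one hypothetical helper that takes the rate as a parameter. The rates below are the ones listed in the speech recognition price table; substitute another rate if your agreement differs.

```python
import math
from decimal import Decimal

ASYNC_RATE = Decimal("0.000080")     # asynchronous file recognition, per billed second
DEFERRED_RATE = Decimal("0.000020")  # deferred mode model, per billed second


def billed_seconds(duration: float, channels: int) -> int:
    """Seconds charged for one fragment: 15-second minimum, channels counted in pairs."""
    return max(15, math.ceil(duration)) * max(1, math.ceil(channels / 2))


def async_cost(fragments, rate: Decimal) -> Decimal:
    """Cost for a list of (duration_seconds, channels) fragments at the given rate."""
    return sum(billed_seconds(d, c) for d, c in fragments) * rate
```

For instance, `async_cost([(2, 2), (14, 3), (19.5, 4)], DEFERRED_RATE)` bills 85 seconds in total, and `async_cost([(5, 1), (5, 3), (15.5, 2), (15.5, 4)], ASYNC_RATE)` bills 93 seconds.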