Yandex Foundation Models pricing policy
To calculate the cost of using the service, see the prices in this section.
In the management console, the following free-of-charge limits apply:
- YandexGPT API: 10 free requests per hour.
- YandexART: 10 free requests per day.
What goes into the cost of using Yandex Foundation Models
Billing unit
Foundation Models usage is detailed in billing units. The cost of a billing unit differs for text generation and text vectorization.
Text generation
The cost of text generation is based on the total number of prompt and response tokens and depends on the following YandexGPT API request parameters:
- Model that receives the request.
- Model working mode (synchronous or asynchronous).
The same text may produce a different number of prompt and response tokens depending on the model.
The total number of billing units equals the total number of prompt and response tokens multiplied by the number of units per token for the selected model and mode, rounded up to a whole number.
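A minimal sketch of the unit calculation above, assuming the units-per-token coefficient is taken from the price table for the chosen model and mode. The function name `billing_units` is hypothetical and not part of any Yandex Cloud SDK; `Decimal` is used so that fractional coefficients such as 0.5 are handled exactly.

```python
import math
from decimal import Decimal


def billing_units(prompt_tokens: int, response_tokens: int, units_per_token: Decimal) -> int:
    """Hypothetical helper: billed units for one text generation request.

    units = (prompt tokens + response tokens) x units per token, rounded up.
    """
    total_tokens = prompt_tokens + response_tokens
    return math.ceil(total_tokens * units_per_token)


# YandexGPT Lite, synchronous mode: 1 unit per token
print(billing_units(225, 525, Decimal("1")))    # 750
# YandexGPT Pro, asynchronous mode: 3 units per token
print(billing_units(115, 1500, Decimal("3")))   # 4845
```

With a coefficient of 0.5, an odd token count yields a fractional product, which is where the rounding up matters.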
Tokenization
Using the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.
Fine-tuned models
The use of models fine-tuned in Yandex DataSphere is charged according to the YandexGPT Pro policy.
Text classification
At the Preview stage, the use of classifiers based on YandexGPT is free of charge.
Warning
The rules described below will take effect on December 9, 2024.
The cost of text classification depends on the classification model you use and the number of tokens you provide.
- When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
- When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.
A request shorter than one billing unit is rounded up to one unit. Longer texts are billed as multiple requests, with the number of requests rounded up.
For example, classifying a text of 770 tokens with YandexGPT Lite is billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier is billed as four requests (770 / 250 = 3.08, rounded up to 4).
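The request count for classification can be sketched as a ceiling division. This is an illustration under the rules above; the classifier keys (`"yandexgpt-lite"`, `"yandexgpt-pro"`, `"fine-tuned"`) and the function name are hypothetical labels, not API identifiers.

```python
# Tokens covered by one billing unit (one "request") for each classifier type.
# The dictionary keys are hypothetical labels used only in this sketch.
TOKENS_PER_UNIT = {
    "yandexgpt-lite": 1000,
    "yandexgpt-pro": 250,
    "fine-tuned": 250,
}


def classification_units(tokens: int, classifier: str) -> int:
    """Billing units for classifying a text of `tokens` tokens, rounded up."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    per_unit = TOKENS_PER_UNIT[classifier]
    # Integer ceiling division avoids floating-point rounding issues.
    return (tokens + per_unit - 1) // per_unit


print(classification_units(770, "yandexgpt-lite"))  # 1
print(classification_units(770, "yandexgpt-pro"))   # 4
```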
Text vectorization
The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization, measured in tokens.
Using assistants
At the Preview stage, you can use the AI Assistant API and store files free of charge; however, the models used by assistants are charged according to the text generation rules.
Image generation
You are charged for each YandexART generation request. Requests are not idempotent: two requests with identical settings and generation prompts are billed as two separate requests.
Internal server errors
You are not charged for a request that fails due to an internal server error.
Prices for the Russia region
Warning
Prices for Yandex Cloud resources vary from region to region. For more information about the available regions, see Regions.
The currency you can use to pay for resources depends on the legal entity you have entered into an agreement with. For more information about account registration, see Registering an account in Yandex Cloud.
Text generation
Prices in rubles:
Amount | Price, including VAT |
---|---|
1,000 units | ₽0.20 |
Prices in tenge:
Amount | Price, including VAT |
---|---|
1,000 units | ₸1.00 |
Prices in rubles:
Model parameters | Number of units per token | Cost per 1,000 tokens, including VAT |
---|---|---|
YandexGPT Lite, synchronous mode | 1 | ₽0.20 |
YandexGPT Lite, asynchronous mode | 0.5 | ₽0.10 |
YandexGPT Pro, synchronous mode | 6 | ₽1.20 |
YandexGPT Pro, asynchronous mode | 3 | ₽0.60 |
Model fine-tuned in DataSphere, synchronous mode | 6 | ₽1.20 |
Model fine-tuned in DataSphere, asynchronous mode | 3 | ₽0.60 |
Llama 8b¹, synchronous mode | 1 | ₽0.20 |
Llama 8b, asynchronous mode | 0.5 | ₽0.10 |
Llama 70b¹, synchronous mode | 6 | ₽1.20 |
Llama 70b, asynchronous mode | 3 | ₽0.60 |
Prices in tenge:
Model parameters | Number of units per token | Cost per 1,000 tokens, including VAT |
---|---|---|
YandexGPT Lite, synchronous mode | 1 | ₸1.00 |
YandexGPT Lite, asynchronous mode | 0.5 | ₸0.50 |
YandexGPT Pro, synchronous mode | 6 | ₸6.00 |
YandexGPT Pro, asynchronous mode | 3 | ₸3.00 |
Model fine-tuned in DataSphere, synchronous mode | 6 | ₸6.00 |
Model fine-tuned in DataSphere, asynchronous mode | 3 | ₸3.00 |
Llama 8b¹, synchronous mode | 1 | ₸1.00 |
Llama 8b, asynchronous mode | 0.5 | ₸0.50 |
Llama 70b¹, synchronous mode | 6 | ₸6.00 |
Llama 70b, asynchronous mode | 3 | ₸3.00 |
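The prices above can be combined into a small cost calculator. This is a minimal sketch, assuming the unit rates and prices exactly as listed in the tables and half-up rounding of the final amount to two decimal places (inferred from the calculation examples on this page, not a documented rule). The model and mode keys such as `"yandexgpt-pro"` and `"async"` are hypothetical labels, not YandexGPT API model URIs or request parameters.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

# Price of 1,000 billing units, including VAT.
PRICE_PER_1000_UNITS = {"RUB": Decimal("0.20"), "KZT": Decimal("1.00")}

# Units per token for each (model, mode); keys are hypothetical labels.
UNITS_PER_TOKEN = {
    ("yandexgpt-lite", "sync"): Decimal("1"),
    ("yandexgpt-lite", "async"): Decimal("0.5"),
    ("yandexgpt-pro", "sync"): Decimal("6"),
    ("yandexgpt-pro", "async"): Decimal("3"),
    ("fine-tuned", "sync"): Decimal("6"),
    ("fine-tuned", "async"): Decimal("3"),
    ("llama-8b", "sync"): Decimal("1"),
    ("llama-8b", "async"): Decimal("0.5"),
    ("llama-70b", "sync"): Decimal("6"),
    ("llama-70b", "async"): Decimal("3"),
}


def generation_cost(prompt_tokens, response_tokens, model, mode, currency="RUB"):
    """Return (billing units, cost) for one text generation request."""
    tokens = prompt_tokens + response_tokens
    # Units are rounded up to a whole number before pricing.
    units = math.ceil(tokens * UNITS_PER_TOKEN[(model, mode)])
    cost = PRICE_PER_1000_UNITS[currency] * units / 1000
    # Assumed: half-up rounding to two decimal places.
    return units, cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(generation_cost(225, 525, "yandexgpt-lite", "sync"))           # (750, Decimal('0.15'))
print(generation_cost(115, 1500, "yandexgpt-pro", "async", "KZT"))   # (4845, Decimal('4.85'))
```

Pricing everything through billing units keeps a single price per currency; the per-1,000-token prices in the table are that price multiplied by the units-per-token coefficient.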
Text classification
Prices in rubles:
Service | Cost, including VAT |
---|---|
1 request (up to 1,000 tokens) to a classifier based on YandexGPT Lite | ₽0.15 |
1 request (up to 250 tokens) to a classifier based on YandexGPT Pro | ₽0.15 |
1 request (up to 250 tokens) to a fine-tuned classifier | ₽0.15 |
Prices in tenge:
Service | Cost, including VAT |
---|---|
1 request (up to 1,000 tokens) to a classifier based on YandexGPT Lite | ₸0.75 |
1 request (up to 250 tokens) to a classifier based on YandexGPT Pro | ₸0.75 |
1 request (up to 250 tokens) to a fine-tuned classifier | ₸0.75 |
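A sketch of the classification cost under the rules that take effect on December 9, 2024: the number of requests is rounded up, then multiplied by the per-request price. The classifier keys and the function name are hypothetical labels for illustration only.

```python
from decimal import Decimal

# Price of one classification request, including VAT.
PRICE_PER_REQUEST = {"RUB": Decimal("0.15"), "KZT": Decimal("0.75")}
# Maximum tokens in one billed request; keys are hypothetical labels.
TOKENS_PER_REQUEST = {"yandexgpt-lite": 1000, "yandexgpt-pro": 250, "fine-tuned": 250}


def classification_cost(tokens: int, classifier: str, currency: str = "RUB") -> Decimal:
    """Cost of classifying a text of `tokens` tokens."""
    per_request = TOKENS_PER_REQUEST[classifier]
    requests = (tokens + per_request - 1) // per_request  # rounded up
    return requests * PRICE_PER_REQUEST[currency]


print(classification_cost(770, "yandexgpt-pro"))  # 0.60
```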
Text vectorization
Prices in rubles:
Amount | Price, including VAT |
---|---|
1,000 units | ₽0.01 |
Prices in tenge:
Amount | Price, including VAT |
---|---|
1,000 units | ₸0.05 |
Prices in rubles:
Model parameters | Number of units per token | Cost of processing 1,000 tokens, including VAT |
---|---|---|
Getting text embeddings | 1 | ₽0.01 |
Prices in tenge:
Model parameters | Number of units per token | Cost of processing 1,000 tokens, including VAT |
---|---|---|
Getting text embeddings | 1 | ₸0.05 |
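Since embeddings use one unit per token, the vectorization cost is simply proportional to the token count. A minimal sketch assuming the prices in the tables above; `embedding_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Price of 1,000 billing units for text embeddings, including VAT.
PRICE_PER_1000_UNITS = {"RUB": Decimal("0.01"), "KZT": Decimal("0.05")}
UNITS_PER_TOKEN = 1  # from the table above


def embedding_cost(tokens: int, currency: str = "RUB") -> Decimal:
    """Cost of getting embeddings for a text of `tokens` tokens."""
    units = tokens * UNITS_PER_TOKEN
    return PRICE_PER_1000_UNITS[currency] * units / 1000


rub = embedding_cost(2000)
kzt = embedding_cost(2000, "KZT")
```

For a 2,000-token text this reproduces the totals in the vectorization example on this page: ₽0.02 and ₸0.10.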
Image generation
Prices in rubles:
Service | Cost, including VAT |
---|---|
1 request for YandexART image generation | ₽2.20 |
Prices in tenge:
Service | Cost, including VAT |
---|---|
1 request for YandexART image generation | ₸11.00 |
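Because YandexART requests are not idempotent, repeating the same prompt is billed again, while requests that fail with an internal server error are not charged. The sketch below models that with a hypothetical `ArtRequest` record and `internal_server_error` flag; neither is part of the YandexART API.

```python
from dataclasses import dataclass
from decimal import Decimal

# Price of one YandexART generation request, including VAT.
PRICE_PER_REQUEST = {"RUB": Decimal("2.20"), "KZT": Decimal("11.00")}


@dataclass
class ArtRequest:
    prompt: str
    internal_server_error: bool = False  # hypothetical flag for illustration


def yandexart_cost(requests: list[ArtRequest], currency: str = "RUB") -> Decimal:
    # Identical prompts are not deduplicated: every request counts,
    # except those that failed due to an internal server error.
    billable = sum(1 for r in requests if not r.internal_server_error)
    return billable * PRICE_PER_REQUEST[currency]


batch = [
    ArtRequest("a red fox"),
    ArtRequest("a red fox"),
    ArtRequest("a red fox", internal_server_error=True),
]
print(yandexart_cost(batch))  # 4.40
```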
Examples of calculating YandexGPT API usage cost
Calculating text generation cost
Example 1
Cost of using YandexGPT API for text generation with the following parameters:
- Number of prompt tokens: 225
- Number of response tokens: 525
- Model: YandexGPT Lite
- Model working mode: Synchronous
Prices in rubles:
- Number of prompt and response tokens: 225 + 525 = 750
- Number of units per token for the YandexGPT Lite model, synchronous mode: 1
- Total number of units in usage details: 750 × 1 = 750
Total: (₽0.20 / 1,000 units) × 750 units = ₽0.15
Prices in tenge:
- Number of prompt and response tokens: 225 + 525 = 750
- Number of units per token for the YandexGPT Lite model, synchronous mode: 1
- Total number of units in usage details: 750 × 1 = 750
Total: (₸1.00 / 1,000 units) × 750 units = ₸0.75
Example 2
Cost of using YandexGPT API for text generation with the following parameters:
- Number of prompt tokens: 115
- Number of response tokens: 1,500
- Model: YandexGPT Pro
- Model working mode: Asynchronous
Prices in rubles:
- Number of prompt and response tokens: 115 + 1,500 = 1,615
- Price per 1,000 tokens for the YandexGPT Pro model, asynchronous mode: ₽0.60
- Number of units per token for the YandexGPT Pro model, asynchronous mode: 3
- Total number of units in usage details: 1,615 × 3 = 4,845
Total: (₽0.60 / 1,000 tokens) × 1,615 tokens = ₽0.969 rounded to ₽0.97
Prices in tenge:
- Number of prompt and response tokens: 115 + 1,500 = 1,615
- Price per 1,000 tokens for the YandexGPT Pro model, asynchronous mode: ₸3.00
- Number of units per token for the YandexGPT Pro model, asynchronous mode: 3
- Total number of units in usage details: 1,615 × 3 = 4,845
Total: (₸3.00 / 1,000 tokens) × 1,615 tokens = ₸4.845 rounded to ₸4.85
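The rounding of ₸4.845 to ₸4.85 is consistent with half-up rounding; banker's (half-even) rounding would give ₸4.84 instead. Binary floats cannot represent 4.845 exactly, so `round(4.845, 2)` is not a reliable way to reproduce it. A sketch using Python's `decimal` module, assuming half-up is the intended rule (inferred from this example, not stated as a billing rule):

```python
from decimal import Decimal, ROUND_HALF_UP


def to_two_places(amount: Decimal) -> Decimal:
    # Assumed rule: round half up to two decimal places.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


rub = Decimal("0.60") * 1615 / 1000   # exact: 0.969
kzt = Decimal("3.00") * 1615 / 1000   # exact: 4.845

print(to_two_places(rub))             # 0.97
print(to_two_places(kzt))             # 4.85
print(kzt.quantize(Decimal("0.01")))  # 4.84 (default context rounds half-even)
```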
Example 3
Cost of using YandexGPT API for text generation with the following parameters:
- Number of prompt tokens: 1,020
- Number of response tokens: 30
- Model: YandexGPT Pro fine-tuned in DataSphere
- Model working mode: Synchronous
Prices in rubles:
- Number of prompt and response tokens: 1,020 + 30 = 1,050
- Price per 1,000 tokens for the model fine-tuned in DataSphere, synchronous mode: ₽1.20
- Number of units per token for the model fine-tuned in DataSphere, synchronous mode: 6
- Total number of units in usage details: 1,050 × 6 = 6,300
Total: (₽0.20 / 1,000 units) × 6,300 units = ₽1.26 or (₽1.20 / 1,000 tokens) × 1,050 tokens = ₽1.26
Prices in tenge:
- Number of prompt and response tokens: 1,020 + 30 = 1,050
- Price per 1,000 tokens for the model fine-tuned in DataSphere, synchronous mode: ₸6.00
- Number of units per token for the model fine-tuned in DataSphere, synchronous mode: 6
- Total number of units in usage details: 1,050 × 6 = 6,300
Total: (₸1.00 / 1,000 units) × 6,300 units = ₸6.30 or (₸6.00 / 1,000 tokens) × 1,050 tokens = ₸6.30
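The two totals in Example 3 agree because the price per 1,000 tokens is the price per 1,000 units multiplied by the units-per-token coefficient. A short sketch that computes both paths; the function names are hypothetical, and the units path rounds units up as described in the billing rules.

```python
import math
from decimal import Decimal

PRICE_PER_1000_UNITS = {"RUB": Decimal("0.20"), "KZT": Decimal("1.00")}


def cost_via_units(tokens: int, units_per_token: Decimal, currency: str) -> Decimal:
    # Billing units are rounded up before pricing.
    units = math.ceil(tokens * units_per_token)
    return PRICE_PER_1000_UNITS[currency] * units / 1000


def cost_via_tokens(tokens: int, units_per_token: Decimal, currency: str) -> Decimal:
    # Price per 1,000 tokens = price per 1,000 units x units per token.
    price_per_1000_tokens = PRICE_PER_1000_UNITS[currency] * units_per_token
    return price_per_1000_tokens * tokens / 1000


for currency in ("RUB", "KZT"):
    print(currency, cost_via_units(1050, Decimal("6"), currency),
          cost_via_tokens(1050, Decimal("6"), currency))
```

The two paths match whenever tokens × units per token is already a whole number, which is always the case for integer coefficients; with 0.5 units per token and an odd token count, the units path is slightly higher because of the rounding up.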
Calculating text vectorization cost
Cost of using YandexGPT API for text vectorization with the following parameter:
- Number of tokens in the request: 2,000
Prices in rubles:
- ₽0.01: Cost of processing 1,000 tokens
- ₽0.01 / 1,000: Cost of processing one token
2,000 × (₽0.01 / 1,000) = ₽0.02
Total: ₽0.02
Prices in tenge:
- ₸0.05: Cost of processing 1,000 tokens
- ₸0.05 / 1,000: Cost of processing one token
2,000 × (₸0.05 / 1,000) = ₸0.10
Total: ₸0.10
¹ Llama was created by Meta. Meta is designated as an extremist organization and its activities are prohibited in Russia.