Yandex Foundation Models pricing policy
To estimate your service costs, see the pricing in this section.
Prices for service products are also available in the Price list.
In the management console, the following free request limits apply:
- YandexGPT Lite and YandexGPT Pro: 10 free requests per hour.
- YandexART: 10 free requests per day.
What goes into the cost of using Yandex Foundation Models
Billing unit
Foundation Models usage is measured in billing units. The cost of a billing unit differs for text generation and vectorization.
Text generation
Text generation cost is based on the total number of prompt and response tokens and depends on the parameters of your request to the generative model. Namely, the cost depends on the following:
- The model that receives the request.
- The model's operating mode.
The number of prompt and response tokens for the same text may vary depending on the model.
When using models in batch processing mode, there is a minimum launch cost of 200,000 tokens.
The total number of billing units is based on the overall number of prompt and response tokens and is rounded up to a whole number.
Tokenization
Use of the tokenizer (TokenizerService calls and Tokenizer methods) is not charged.
Fine-tuned models
At the Preview stage, you can fine-tune models free of charge. The use of fine-tuned models is charged according to the base model's pricing policy:
- The use of models fine-tuned in Yandex DataSphere is charged according to the YandexGPT Pro policy.
- The use of a fine-tuned YandexGPT Lite model is charged according to the YandexGPT Lite policy.
- The use of a fine-tuned Llama 8B model is charged according to the Llama 8B policy.
Text classification
The cost of text classification depends on the classification model you use and the number of tokens you provide.
- When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
- When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.
A request smaller than one billing unit is billed as one full unit. Larger texts are billed as multiple requests, with the count rounded up to a whole number.
For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.
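The rounding rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the dictionary keys and the `classification_units` function are hypothetical names chosen for this example, not API model identifiers. The unit sizes come from the list above.

```python
import math

# Billing unit sizes in tokens, taken from the list above.
# The keys are illustrative labels, not API model identifiers.
TOKENS_PER_UNIT = {
    "yandexgpt-lite": 1000,
    "yandexgpt-pro": 250,
    "fine-tuned-classifier": 250,
}


def classification_units(token_count: int, classifier: str) -> int:
    """Return the number of billed requests for one text."""
    unit_size = TOKENS_PER_UNIT[classifier]
    # Any text is billed as at least one request; larger texts round up.
    return max(1, math.ceil(token_count / unit_size))


print(classification_units(770, "yandexgpt-lite"))  # 1
print(classification_units(770, "yandexgpt-pro"))   # 4
```

Multiplying the result by the price per request gives the estimated classification cost.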
Text vectorization
The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization.
Using assistants
At the Preview stage, you can use AI Assistant API and store files free of charge; however, you will be charged for models according to the text generation rules.
Image generation
You are charged for each generation request in YandexART. The requests are not idempotent; therefore, two requests with the same settings and generation prompt are two separate requests.
Internal server errors
You are not charged for a request that fails due to an internal server error.
Prices for the Russia region
Note
Prices for Yandex Cloud resources vary based on the region. For more information about the available regions, see Regions.
The currency you can use to pay for the resources depends on which legal entity you entered into agreement with. For more information on creating an account, see Registering an account in Yandex Cloud.
Text generation
Number | Cost, without VAT |
---|---|
1,000 units | $0.001667 |
Cost of using models in synchronous and asynchronous mode
Model | Cost per 1,000 tokens, synchronous mode, without VAT | Cost per 1,000 tokens, asynchronous mode, without VAT |
---|---|---|
YandexGPT Lite | $0.001667 | $0.000834 |
YandexGPT Pro | $0.010002 | $0.005001 |
Model fine-tuned in DataSphere | $0.010002 | $0.005001 |
Llama 8B | $0.001667 | $0.000834 |
Llama 70B | $0.010002 | $0.005001 |
Cost of using models in batch processing mode
When using models in batch processing mode, there is a minimum launch cost of 200,000 tokens.
Model | Cost per 1,000 tokens, batch processing mode, without VAT |
---|---|
Qwen2.5 7B Instruct | $0.000834 |
Qwen2.5 72B Instruct | $0.005001 |
QwQ 32B Instruct | $0.003334 |
Llama-3.3-70B-Instruct | $0.005001 |
Llama-3.1-70B-Instruct | $0.005001 |
DeepSeek-R1-Distill-Llama-70B | $0.005001 |
Qwen2.5 32B Instruct | $0.003334 |
DeepSeek-R1-Distill-Qwen-32B | $0.003334 |
phi-4 | $0.001667 |
Qwen2 VL 7B | $0.000834 |
Qwen2.5 VL 7B | $0.000834 |
DeepSeek 2 VL | $0.003334 |
DeepSeek 2 VL Tiny | $0.000834 |
Gemma3 1B it | $0.000834 |
Gemma3 4B it | $0.000834 |
Gemma3 12B it | $0.001667 |
Gemma3 27B it | $0.003334 |
Qwen 2.5 VL 32B Instruct | $0.003334 |
Qwen3-0.6B | $0.000834 |
Qwen3-1.7B | $0.000834 |
Qwen3-4B | $0.000834 |
Qwen3-8B | $0.000834 |
Qwen3-14B | $0.001667 |
Qwen3-32B | $0.003334 |
Qwen3-30B-A3B | $0.003334 |
Qwen3-235B-A22B | $0.050010 |
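To estimate a batch run, one plausible reading of the minimum launch cost is that any run below 200,000 tokens is billed as if it used 200,000 tokens. The sketch below encodes that reading. It is an assumption, not a confirmed billing formula, and `batch_cost` is a hypothetical helper name. The result is rounded to six decimal places to match the precision of the price tables.

```python
from decimal import Decimal, ROUND_HALF_UP

MIN_BATCH_TOKENS = 200_000  # minimum launch cost, expressed in tokens


def batch_cost(total_tokens: int, price_per_1000: Decimal) -> Decimal:
    """Estimate the cost of one batch run, without VAT."""
    # Assumption: runs below the minimum are billed as 200,000 tokens.
    billable_tokens = max(total_tokens, MIN_BATCH_TOKENS)
    cost = price_per_1000 * billable_tokens / 1000
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


# Qwen3-8B at $0.000834 per 1,000 tokens
print(batch_cost(50_000, Decimal("0.000834")))   # 0.166800
print(batch_cost(500_000, Decimal("0.000834")))  # 0.417000
```

Under this reading, a 50,000-token run costs the same as a 200,000-token run, so small jobs may be cheaper in synchronous or asynchronous mode where the model supports it.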
Text classification
Service | Cost, without VAT |
---|---|
1 request (1,000 tokens) to classifier based on YandexGPT Lite | $0.001250 |
1 request (250 tokens) to classifier based on YandexGPT Pro | $0.001250 |
1 request (250 tokens) to tuned classifier | $0.001250 |
Text vectorization
Number | Cost, without VAT |
---|---|
1,000 units | $0.000083 |
Model parameters | Number of units per token | Cost per 1,000 tokens, without VAT |
---|---|---|
Embeddings | 1 | $0.000083 |
Image generation
Service | Cost, without VAT |
---|---|
1 request for YandexART image generation | $0.018333 |
Examples of YandexGPT Lite and YandexGPT Pro usage cost calculation
Calculating text generation cost
Example 1
Cost of using YandexGPT Lite for text generation with the following parameters:
- Number of prompt tokens: 225
- Number of response tokens: 525
- Model: YandexGPT Lite
- Model working mode: Synchronous
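The cost is calculated as follows:
- Number of prompt and response tokens: 225 + 525 = 750
- Price per 1,000 tokens for the YandexGPT Lite model, synchronous mode: $0.001667
- Number of units per token for the YandexGPT Lite model, synchronous mode: 1
- Total number of units in usage details: 750 × 1 = 750
Total: ($0.001667 / 1,000 units) × 750 units = $0.001250.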
Example 2
Cost of using YandexGPT Pro for text generation with the following parameters:
- Number of prompt tokens: 115
- Number of response tokens: 1,500
- Model: YandexGPT Pro
- Model working mode: Asynchronous
The cost is calculated as follows:
- Number of prompt and response tokens: 115 + 1,500 = 1,615
- Price per 1,000 tokens for the YandexGPT Pro model, asynchronous mode: $0.005001
- Number of units per token for the YandexGPT Pro model, asynchronous mode: 3
- Total number of units in usage details: 1,615 × 3 = 4,845
Total: ($0.005001 / 1,000 tokens) × 1,615 tokens = $0.008077.
Example 3
Cost of using YandexGPT Pro and DataSphere for text generation with the following parameters:
- Number of prompt tokens: 1,020
- Number of response tokens: 30
- Model: YandexGPT Pro fine-tuned in DataSphere
- Model working mode: Synchronous
The cost is calculated as follows:
- Number of prompt and response tokens: 1,020 + 30 = 1,050
- Price per 1,000 tokens for the model fine-tuned in DataSphere, synchronous mode: $0.010002
- Number of units per token for the model fine-tuned in DataSphere, synchronous mode: 6
- Total number of units in usage details: 1,050 × 6 = 6,300
Total: ($0.001667 / 1,000 units) × 6,300 units = $0.010502 or ($0.010002 / 1,000 tokens) × 1,050 tokens = $0.010502.
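The three examples follow the same pattern: add up the tokens, multiply by the number of units per token, then apply the price per 1,000 units. Here is a minimal Python sketch of that pattern. The `generation_cost` function, the dictionary layout, and the `"sync"`/`"async"` labels are hypothetical names for illustration. The multipliers 3 and 6 are stated in Examples 2 and 3. The multiplier 1 for YandexGPT Lite is inferred from its per-token price matching the unit price. Rounding half up to six decimal places is an assumption chosen to match the figures shown.

```python
from decimal import Decimal, ROUND_HALF_UP

UNIT_PRICE_PER_1000 = Decimal("0.001667")  # 1,000 billing units, without VAT

# Units per token, keyed by (model, mode). Labels are illustrative only.
UNITS_PER_TOKEN = {
    ("YandexGPT Lite", "sync"): 1,         # inferred from the price tables
    ("YandexGPT Pro", "async"): 3,         # stated in Example 2
    ("DataSphere fine-tuned", "sync"): 6,  # stated in Example 3
}


def generation_cost(prompt_tokens, response_tokens, model, mode):
    """Return (billing units, cost in USD without VAT) for one request."""
    tokens = prompt_tokens + response_tokens
    units = tokens * UNITS_PER_TOKEN[(model, mode)]
    cost = UNIT_PRICE_PER_1000 * units / 1000
    return units, cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


for args in [
    (225, 525, "YandexGPT Lite", "sync"),
    (115, 1500, "YandexGPT Pro", "async"),
    (1020, 30, "DataSphere fine-tuned", "sync"),
]:
    units, cost = generation_cost(*args)
    print(f"{units} units, ${cost}")
# 750 units, $0.001250
# 4845 units, $0.008077
# 6300 units, $0.010502
```

`Decimal` is used instead of floating point so that the six-digit prices are multiplied without binary rounding noise.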
Calculating text vectorization cost
Cost of using Yandex Foundation Models for text vectorization with the following parameter:
- Number of tokens in the request: 2,000
The cost is calculated as follows:
- Cost of processing 1,000 tokens: $0.000083
- Cost of processing one token: $0.000083 / 1,000
Total: 2,000 × ($0.000083 / 1,000) = $0.000166.
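The same calculation can be expressed as a short Python helper for estimating other request sizes. This is a sketch only: `vectorization_cost` is a hypothetical name, and rounding to six decimal places is an assumption that mirrors the precision of the price table.

```python
from decimal import Decimal, ROUND_HALF_UP

EMBEDDING_PRICE_PER_1000 = Decimal("0.000083")  # per 1,000 tokens, without VAT


def vectorization_cost(token_count: int) -> Decimal:
    """Estimate the cost of vectorizing a text of the given size."""
    cost = EMBEDDING_PRICE_PER_1000 * token_count / 1000
    return cost.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


print(vectorization_cost(2000))  # 0.000166
```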