Yandex AI Studio pricing policy
To estimate your service costs, see the pricing in this section.
The prices for service products are also available in the price list.
Note
The currency of the service rates (prices) depends on the company you have a contract with:
- Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
- Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.
The prices below do not include VAT.
What goes into the cost of using Yandex AI Studio
In Yandex Cloud Billing, AI Studio usage is detailed in billing units. The billing unit value is different for text generation and vectorization.
Text generation
Text generation cost is based on the total number of prompt and response tokens and depends on the parameters of your request to the generative model. Namely, the cost depends on the following:
- The model that receives the request.
- The model's operating mode.
The number of prompt and response tokens for the same text may vary depending on the model.
With models in batch mode, the minimum cost per run is 200,000 tokens.
The total number of billing units is based on the overall number of prompt and response tokens and is rounded up to an integer.
Tokenization
Using the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.
Fine-tuned models
At the Preview stage, you can fine-tune models free of charge. The use of fine-tuned models is charged according to the base model's pricing policy:
- The use of a fine-tuned YandexGPT Lite model is charged according to the YandexGPT Lite policy.
- The use of a fine-tuned Llama 8B model is charged according to the Llama 8B policy¹.
¹ Llama was created by Meta. Meta is designated as an extremist organization and its activities are prohibited in Russia.
Dedicated instances
The cost of running a dedicated instance depends on the model and the chosen configuration. Dedicated instance usage is billed per second and rounded up to the billing unit. Time spent on hardware maintenance and model deployment is not charged.
Prices are shown for 1 hour of use. Billing occurs per second.
Text classification
The cost of text classification depends on the classification model you use and the number of tokens you provide.
- When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
- When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.
A request smaller than one billing unit is rounded up to a whole unit. Large texts are billed as multiple requests, rounded up.
For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.
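The rounding rule above can be sketched as a ceiling division. This is a minimal illustration, not an official calculator: the function name and the dictionary keys are hypothetical labels chosen for this example, not API model identifiers.

```python
# Tokens covered by one billing unit, as stated in the rules above.
# The keys are illustrative labels, not real model identifiers.
TOKENS_PER_UNIT = {
    "yandexgpt-lite": 1000,
    "yandexgpt-pro": 250,
    "fine-tuned": 250,
}


def classification_units(tokens: int, classifier: str) -> int:
    """Return the number of billed requests for a text of `tokens` tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    size = TOKENS_PER_UNIT[classifier]
    # Integer ceiling division: any partial unit counts as a whole one.
    return (tokens + size - 1) // size


print(classification_units(770, "yandexgpt-lite"))  # 1
print(classification_units(770, "yandexgpt-pro"))   # 4
```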
Text vectorization
The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization.
Assistants
At the Preview stage, you can use the AI Assistant API and store files free of charge; however, you will be charged for model usage according to the text generation rules.
Using Voice Agents
The cost of using voice agents consists of the cost of speech recognition (incoming audio), the cost of speech synthesis (outgoing audio), and the cost of text generation using the speech-realtime-250923 model.
Image generation
You are charged for each generation request in YandexART. Requests are not idempotent; therefore, two requests with the same settings and generation prompt are two separate requests.
Internal server errors
You are not charged for a request that fails due to an internal server error.
Prices for the Russia region
Note
Yandex Cloud resources are priced differently in different regions. For more information about the available regions, see Regions.
Your payment currency is determined by your contracting legal entity. For more information on creating an account, see Registering an account in Yandex Cloud.
Text generation
Number | Price, without VAT |
---|---|
1,000 units | $0.001667 |
Cost of using models in synchronous and asynchronous mode
Model | Price per 1,000 tokens, synchronous mode, without VAT | Price per 1,000 tokens, asynchronous mode, without VAT |
---|---|---|
YandexGPT Lite | $0.001667 | $0.000834 |
YandexGPT Pro | $0.010002 | $0.005001 |
YandexGPT Pro 5.1 | $0.003334 ¹ | $0.001667 ¹ |
Llama 8B | $0.001667 | $0.000834 |
Llama 70B | $0.010002 | $0.005001 |
Qwen3 235B | $0.004168 ¹ | — |
gpt-oss-120b | $0.002501 | — |
gpt-oss-20b | $0.000834 | — |
Gemma3 27B | $0.003334 ¹ | — |
¹ The price is based on the current 50% discount.
Cost of using models in batch mode
With models in batch mode, the minimum cost per run is 200,000 tokens.
Model | Price per 1,000 tokens, batch processing mode, without VAT |
---|---|
Qwen2.5 7B Instruct | $0.000834 |
Qwen2.5 72B Instruct | $0.005001 |
QwQ 32B Instruct | $0.003334 |
Llama-3.3-70B-Instruct | $0.005001 |
Llama-3.1-70B-Instruct | $0.005001 |
DeepSeek-R1-Distill-Llama-70B | $0.005001 |
Qwen2.5 32B Instruct | $0.003334 |
DeepSeek-R1-Distill-Qwen-32B | $0.003334 |
phi-4 | $0.001667 |
Qwen2 VL 7B | $0.000834 |
Qwen2.5 VL 7B | $0.000834 |
DeepSeek 2 VL | $0.003334 |
DeepSeek 2 VL Tiny | $0.000834 |
Gemma3 1B it | $0.000834 |
Gemma3 4B it | $0.000834 |
Gemma3 12B it | $0.001667 |
Gemma3 27B it | $0.003334 |
Qwen 2.5 VL 32B Instruct | $0.003334 |
Qwen3-0.6B | $0.000834 |
Qwen3-1.7B | $0.000834 |
Qwen3-4B | $0.000834 |
Qwen3-8B | $0.000834 |
Qwen3-14B | $0.001667 |
Qwen3-32B | $0.003334 |
Qwen3-30B-A3B | $0.003334 |
Qwen3-235B-A22B | $0.050010 |
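As a rough illustration of the batch-mode minimum, the sketch below assumes that "minimum cost per run is 200,000 tokens" means each run is billed for at least 200,000 tokens at the model's per-1,000-token price. That reading, and the function name, are assumptions made for this example.

```python
from decimal import Decimal

# Assumed interpretation: every batch run is billed for at least this many tokens.
MIN_BATCH_TOKENS = 200_000


def batch_cost(tokens: int, price_per_1000: Decimal) -> Decimal:
    """Estimate the cost of one batch run in USD, without VAT."""
    if tokens < 0:
        raise ValueError("tokens must not be negative")
    billed_tokens = max(tokens, MIN_BATCH_TOKENS)
    return price_per_1000 * billed_tokens / 1000


qwen3_8b_price = Decimal("0.000834")  # Qwen3-8B, from the table above

# A 50,000-token run is billed as 200,000 tokens under this assumption.
print(batch_cost(50_000, qwen3_8b_price))
# A 300,000-token run is billed for its actual size.
print(batch_cost(300_000, qwen3_8b_price))
```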
Dedicated instances
Prices are shown for 1 hour of use. Billing occurs per second.
The price per 1 unit for a dedicated instance is $0.0083333 without VAT.
Model | Price per 1 hour, S configuration, without VAT | Price per 1 hour, M configuration, without VAT | Price per 1 hour, L configuration, without VAT |
---|---|---|---|
Qwen 2.5 VL 32B Instruct | $6.70 | $13.40 | $20.10 |
Qwen 2.5 72B Instruct | $6.70 | $13.40 | $20.10 |
Gemma 3 4B it | $3.35 | $6.70 | $10.05 |
Gemma 3 12B it | $3.35 | $6.70 | $10.05 |
gpt-oss-20b | $3.35 | $6.70 | $10.05 |
gpt-oss-120b | $6.70 | $13.40 | $20.10 |
T-pro-it-2.0-FP8 | $6.20 | $12.40 | $18.60 |
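Since prices are listed per hour but billing occurs per second, a running time can be converted to an approximate cost as sketched below. This is a simplification under stated assumptions: it prorates the hourly price linearly and ignores any rounding to billing units, and the function name is hypothetical. Hardware maintenance and model deployment time should be excluded from `seconds`, since that time is not charged.

```python
from decimal import Decimal


def dedicated_instance_cost(hourly_price: Decimal, seconds: int) -> Decimal:
    """Prorate an hourly price over a billed running time given in seconds."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    return hourly_price * seconds / 3600


# gpt-oss-20b, S configuration ($3.35 per hour), running for 90 minutes.
print(dedicated_instance_cost(Decimal("3.35"), 90 * 60))
```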
Text classification
Service | Price, without VAT |
---|---|
1 request (1,000 tokens) to classifier based on YandexGPT Lite | $0.001250 |
1 request (250 tokens) to classifier based on YandexGPT Pro | $0.001250 |
1 request (250 tokens) to tuned classifier | $0.001250 |
Text vectorization
Number | Price, without VAT |
---|---|
1,000 units | $0.000083 |
Model parameters | Number of units per token | Price per 1,000 tokens, without VAT |
---|---|---|
Embeddings | 1 | $0.000083 |
Using Voice Agents
Prices are shown for 1 minute of use. Billing occurs per second.
Note
The prices are effective until November 24, 2025.
Service | Price per billing unit, without VAT |
---|---|
Speech recognition, per 1 minute | $0.0065 |
Speech synthesis, per 1 minute | $0.005 |
Text generation, per 1,000 tokens | $0.003334 |
Image generation
Service | Price, without VAT |
---|---|
1 request for YandexART image generation | $0.018333 |
Examples of the YandexGPT Lite and YandexGPT Pro usage cost calculation
Calculating the text generation cost
Example 1
Cost of using YandexGPT Lite for text generation with the following parameters:
- Number of prompt tokens: 225
- Number of response tokens: 525
- Model: YandexGPT Lite
- Model working mode: Synchronous
Total: ($0.001667 / 1,000 units) × 750 units = $0.001250
Example 2
Cost of using YandexGPT Pro for text generation with the following parameters:
- Number of prompt tokens: 115
- Number of response tokens: 1,500
- Model: YandexGPT Pro
- Model working mode: Asynchronous
The cost is calculated as follows:
- Number of prompt and response tokens: 115 + 1,500 = 1,615.
- Price per 1,000 tokens for the YandexGPT Pro model, asynchronous mode: $0.005001.
- Number of units per token for the YandexGPT Pro model, asynchronous mode: 3.
- Total number of units in usage details: 1,615 × 3 = 4,845.
Total: ($0.005001 / 1,000 tokens) × 1,615 tokens = $0.008077.
Example 3
Cost of using YandexGPT Pro and DataSphere for text generation with the following parameters:
- Number of prompt tokens: 1,020
- Number of response tokens: 30
- Model: YandexGPT Pro fine-tuned in DataSphere
- Model working mode: Synchronous
The cost is calculated as follows:
- Number of prompt and response tokens: 1,020 + 30 = 1,050.
- Price per 1,000 tokens for model fine-tuned in DataSphere, synchronous mode: $0.010002.
- Number of units per token for model fine-tuned in DataSphere, synchronous mode: 6.
- Total number of units in usage details: 1,050 × 6 = 6,300.
Total: ($0.001667 / 1,000 units) × 6,300 units = $0.010502 or ($0.010002 / 1000 tokens) × 1,050 tokens = $0.010502.
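The three text generation examples follow the same pattern: multiply the total token count by the number of units per token, then by the price of one unit. A minimal sketch of that calculation is shown here. The function name is hypothetical; the units-per-token values 3 and 6 are taken from Examples 2 and 3, and the value 1 for YandexGPT Lite in synchronous mode is inferred from Example 1, where 750 tokens are billed as 750 units.

```python
from decimal import Decimal, ROUND_HALF_UP

# Price of one billing unit: $0.001667 per 1,000 units (from the price table).
PRICE_PER_UNIT = Decimal("0.001667") / 1000


def generation_cost(prompt_tokens: int, response_tokens: int, units_per_token: int):
    """Return (billing units, cost in USD rounded to six decimal places)."""
    units = (prompt_tokens + response_tokens) * units_per_token
    cost = (units * PRICE_PER_UNIT).quantize(
        Decimal("0.000001"), rounding=ROUND_HALF_UP
    )
    return units, cost


print(generation_cost(225, 525, 1))   # Example 1: YandexGPT Lite, synchronous
print(generation_cost(115, 1500, 3))  # Example 2: YandexGPT Pro, asynchronous
print(generation_cost(1020, 30, 6))   # Example 3: fine-tuned YandexGPT Pro, synchronous
```

Using `Decimal` rather than floating point keeps the six-decimal prices exact, which matters when the amounts are this small.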
Calculating the text vectorization cost
Cost of using Yandex AI Studio for text vectorization with the following parameter:
- Number of tokens in the request: 2,000
- $0.000083: Cost of processing per 1,000 tokens.
- $0.000083 / 1,000: Cost of processing per one token.
2,000 × ($0.000083 / 1,000) = $0.000166
Total: $0.000166.
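The same vectorization calculation can be expressed as a short function. This is only a sketch: the function and constant names are hypothetical, and the price is the per-1,000-token value from the text vectorization table.

```python
from decimal import Decimal

# Price per 1,000 tokens for embeddings, without VAT (from the price table).
EMBEDDING_PRICE_PER_1000 = Decimal("0.000083")


def vectorization_cost(tokens: int) -> Decimal:
    """Estimate the cost in USD of vectorizing a text of `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must not be negative")
    return EMBEDDING_PRICE_PER_1000 * tokens / 1000


print(vectorization_cost(2000))
```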
Example of the Voice Agents usage cost calculation
Cost of using a voice agent with the speech-realtime-250923 model with the following parameters:
- Incoming audio: 30 seconds.
- Outgoing audio: 1 minute.
- Number of tokens in the request: 2,000.
$0.003334 × 2 + $0.0065 / 2 + $0.005 × 1 = $0.006668 + $0.00325 + $0.005
Total: $0.014918.
Where:
- $0.003334: Cost of processing per 1,000 tokens.
- $0.003334 × 2: Cost of processing per 2,000 tokens.
- $0.0065: Cost of processing per 1 minute of the incoming audio.
- $0.0065 / 2: Cost of processing per 30 seconds of the incoming audio.
- $0.005 × 1: Cost of processing per 1 minute of the outgoing audio.
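A voice agent session combines three charges, so it can help to compute them together. The sketch below uses the per-minute and per-1,000-token prices from the Voice Agents price table and prorates audio by the second. The function and constant names are hypothetical, and any rounding applied by billing is not modeled.

```python
from decimal import Decimal

# Prices from the Voice Agents table, without VAT.
RECOGNITION_PER_MINUTE = Decimal("0.0065")    # incoming audio
SYNTHESIS_PER_MINUTE = Decimal("0.005")       # outgoing audio
GENERATION_PER_1000_TOKENS = Decimal("0.003334")


def voice_agent_cost(incoming_seconds: int, outgoing_seconds: int, tokens: int) -> Decimal:
    """Estimate the total session cost in USD from audio durations and token count."""
    recognition = RECOGNITION_PER_MINUTE * incoming_seconds / 60
    synthesis = SYNTHESIS_PER_MINUTE * outgoing_seconds / 60
    generation = GENERATION_PER_1000_TOKENS * tokens / 1000
    return recognition + synthesis + generation


# 30 seconds of incoming audio, 1 minute of outgoing audio, 2,000 tokens.
print(voice_agent_cost(30, 60, 2000))
```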