Yandex AI Studio pricing policy

Written by

Yandex Cloud

Updated at February 20, 2026

Prices for the Russia region

To estimate your service costs, see the pricing in this section.

The prices for service products are also available in the price list.

Note

The currency of service rates (prices) depends on the company you signed your contract with:

Prices in US dollars are applicable to customers of Iron Hive doo Beograd (Serbia) or Direct Cursus Technology L.L.C. (Dubai).
Prices in Russian roubles are applicable to customers of Yandex.Cloud LLC.

The prices below do not include VAT.

Note

Yandex Cloud resources are priced differently in different regions. For more information about the available regions, see Regions.

Your payment currency is determined by your contracting legal entity. For more information on creating an account, see Registering an account in Yandex Cloud.

Model Gallery

Warning

The pricing provided below for the common instance and batch processing models is effective until March 3, 2026. Starting March 3, the new pricing policy will apply.

The cost of using Model Gallery models depends on:

Model's operating mode.
Number of input and output tokens. The token count for the same text may vary from one model to another. See the example below of calculating the cost of processing the same text in synchronous mode with different models.

Yandex Cloud Billing breaks down the use of Model Gallery models in billing units. The total number of billing units is rounded up to an integer.

Number	Price, without VAT
1,000 units	$0.001667

Using common instance models until March 3, 2026

Model	Price per 1,000 input tokens, synchronous mode, without VAT	Price per 1,000 output tokens, synchronous mode, without VAT	Price per 1,000 input tokens, asynchronous mode, without VAT	Price per 1,000 output tokens, asynchronous mode, without VAT
Alice AI LLM	$0.0042	$0.0167	$0.0021	$0.0083
YandexGPT Pro 5.1	$0.0067	$0.0067	$0.003361	$0.003361
YandexGPT Pro 5	$0.01	$0.01	$0.0050	$0.0050
YandexGPT Lite	$0.001667	$0.001667	$0.000834	$0.000834
Qwen3 235B	$0.0042 ¹	$0.0042 ¹	—	—
gpt-oss-120b	$0.0025	$0.0025	—	—
gpt-oss-20b	$0.000834	$0.000834	—	—
Gemma3 27B	$0.003334 ¹	$0.003334 ¹	—	—
speech-realtime-250923	$0.0067	$0.0067	—	—

¹ The price is based on the current 50% discount.

Example of cost calculation for a model in synchronous mode

Request parameters:

Instruction: «Проанализируй предоставленный текст и выполни его комплексную грамотную редактуру. Твоя задача — устранить любые грамматические, орфографические, стилистические и пунктуационные ошибки, не изменяя при этом исходного смысла и структуры высказывания. Сохраняй оригинальный порядок слов и не вноси дополнительных уточнений, пояснений или переформулировок, которые могут изменить тон или содержание текста. Внесённые правки должны быть минимально необходимыми для того, чтобы предложение стало корректным с точки зрения русского языка. Также убедись, что все слова употреблены в нормативной форме, а знаки препинания соответствуют литературным стандартам»

Request text: «Нейрасети оптемезируют бизнес-працесы розгружают техпадержку ускаряют праверку документов аналис и абработку данных генирируют отчёты за минуты и прогназируют спрос.»

The model's response: «Нейросети оптимизируют бизнес‑процессы: разгружают техподдержку, ускоряют проверку документов, анализ и обработку данных, генерируют отчёты за минуты и прогнозируют спрос.»
Number of input characters: 782

	Alice AI LLM	YandexGPT Pro 5.1	Qwen3 235B
Tokens per request	164	164	248
Tokens per response	22	22	39
Request cost	$0.000683	$0.000547	$0.001034
Response cost	$0.000367	$0.000073	$0.000163
Total	$0.001050	$0.000620	$0.0012

Example of cost calculation for a model in asynchronous mode

Request parameters:

Number of prompt tokens: 115

Number of response tokens: 1,500

Model: YandexGPT Pro

Model operating mode: Asynchronous

The cost is calculated as follows:

Number of prompt and response tokens: 115 + 1,500 = 1,615.
Price per 1,000 tokens for the YandexGPT Pro model, asynchronous mode: $0.0050.
Number of units per token for the YandexGPT Pro model, asynchronous mode: 3.
Total number of units in usage details: 1,615 × 3 = 4,845.

Total: ($0.0050 / 1,000 tokens) × 1,615 tokens = $0.0081.
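The calculation above can be sketched in Python as follows. This is a minimal illustration, not an official billing formula: the function and constant names are hypothetical, and the units-per-token value is passed in as a parameter because it depends on the model and operating mode.

```python
import math

# Price of one billing unit, derived from "1,000 units: $0.001667" (without VAT)
UNIT_PRICE_USD = 0.001667 / 1000


def async_cost(prompt_tokens: int, response_tokens: int,
               price_per_1000: float, units_per_token: int) -> tuple[int, float]:
    """Return (billing units, cost in USD without VAT) for one request."""
    total_tokens = prompt_tokens + response_tokens
    # The total number of billing units is rounded up to an integer
    units = math.ceil(total_tokens * units_per_token)
    cost = price_per_1000 / 1000 * total_tokens
    return units, cost


# Values from the example: 115 prompt tokens, 1,500 response tokens,
# $0.0050 per 1,000 tokens, 3 units per token
units, cost = async_cost(115, 1500, price_per_1000=0.0050, units_per_token=3)
print(units, round(cost, 4))              # 4845 0.0081
print(round(units * UNIT_PRICE_USD, 4))   # cross-check via billing units: 0.0081
```

Both routes, price per 1,000 tokens and number of billing units, give the same amount after rounding to four decimal places.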

Using models in batch mode until March 3, 2026

In batch mode, the minimum billable volume per run is 200,000 tokens.

Model	Price per 1,000 tokens, batch processing mode, without VAT
Qwen2.5 7B Instruct	$0.000834
Qwen2.5 72B Instruct	$0.0050
QwQ 32B Instruct	$0.003334
Llama-3.3-70B-Instruct	$0.0050
Llama-3.1-70B-Instruct	$0.0050
DeepSeek-R1-Distill-Llama-70B	$0.0050
Qwen2.5 32B Instruct	$0.003334
DeepSeek-R1-Distill-Qwen-32B	$0.003334
phi-4	$0.001667
Qwen2 VL 7B	$0.000834
Qwen2.5 VL 7B	$0.000834
DeepSeek 2 VL	$0.003334
DeepSeek 2 VL Tiny	$0.000834
Gemma3 1B it	$0.000834
Gemma3 4B it	$0.000834
Gemma3 12B it	$0.001667
Gemma3 27B it	$0.003334
Qwen 2.5 VL 32B Instruct	$0.003334
Qwen3-0.6B	$0.000834
Qwen3-1.7B	$0.000834
Qwen3-4B	$0.000834
Qwen3-8B	$0.000834
Qwen3-14B	$0.001667
Qwen3-32B	$0.003334
Qwen3-30B-A3B	$0.003334
Qwen3-235B-A22B	$0.05
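As an illustration of how the 200,000-token minimum might affect the cost of a batch run, here is a short Python sketch. It assumes that a run with fewer tokens is billed as if it had processed 200,000 tokens; the function name is hypothetical.

```python
MIN_TOKENS_PER_RUN = 200_000  # minimum billed volume per batch run


def batch_cost(tokens: int, price_per_1000: float) -> float:
    """Cost in USD without VAT, assuming short runs are billed as 200,000 tokens."""
    billed_tokens = max(tokens, MIN_TOKENS_PER_RUN)
    return billed_tokens / 1000 * price_per_1000


# Qwen3-8B at $0.000834 per 1,000 tokens
small_run = batch_cost(50_000, 0.000834)     # billed as 200,000 tokens
large_run = batch_cost(1_000_000, 0.000834)  # billed for the actual volume
print(round(small_run, 4), round(large_run, 4))
```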

Using Model Gallery models starting March 3, 2026

Warning

This pricing for the common instance models and batch processing is effective as of March 3, 2026.

The cost of using the models depends on the operating mode and the number of tokens for different consumption types:

Input query tokens.
Output model response tokens.
Cached tokens, if certain information is re-used without additional computation, such as instructions for a model.
Tool tokens provided to the model as a result of invoking any tool.

Caching is enabled automatically where possible and applicable. Caching is not guaranteed and does not apply to output tokens.

Tool tokens include all uncached tokens stored in the message history at the time the tool's results were transmitted. Tool tokens are calculated only for AI Studio built-in tools and do not apply to the results of custom functions. Use of tools is charged separately.

Synchronous mode

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 cached tokens, without VAT	Price per 1,000 tool tokens, without VAT	Price per 1,000 output tokens, without VAT
Alice AI LLM	$0.0041	$0.0041	$0.001066	$0.0098
YandexGPT Pro 5.1	$0.0066	$0.0066	$0.001639	$0.0066
YandexGPT Pro 5	$0.0098	$0.0098	$0.0098	$0.0098
YandexGPT Lite	$0.001639	$0.001639	$0.001639	$0.001639
Qwen3 235B	$0.0041	$0.0041	$0.0041	$0.0041
gpt-oss-120b	$0.002459	$0.002459	$0.002459	$0.002459
gpt-oss-20b	$0.000820	$0.000820	$0.000820	$0.000820
Gemma3 27B	$0.0033	$0.0033	$0.0033	$0.0033
speech-realtime-250923	$0.0066	$0.001639	$0.001639	$0.0066
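The sketch below shows one way to estimate the cost of a synchronous request from the four token types. It assumes that the four counts do not overlap and are simply added up; the class and function names are hypothetical, and the token counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncPrices:
    """Prices in USD per 1,000 tokens, without VAT."""
    input: float
    cached: float
    tool: float
    output: float


# Row for Alice AI LLM from the synchronous mode table
ALICE_AI_LLM = SyncPrices(input=0.0041, cached=0.0041, tool=0.001066, output=0.0098)


def sync_cost(prices: SyncPrices, input_tokens: int = 0, cached_tokens: int = 0,
              tool_tokens: int = 0, output_tokens: int = 0) -> float:
    """Sum the cost of each token type (assumes the counts do not overlap)."""
    return (input_tokens * prices.input
            + cached_tokens * prices.cached
            + tool_tokens * prices.tool
            + output_tokens * prices.output) / 1000


cost = sync_cost(ALICE_AI_LLM, input_tokens=1000, cached_tokens=500,
                 tool_tokens=2000, output_tokens=300)
print(round(cost, 6))
```

Tool invocations themselves are charged separately and are not included in this estimate.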

Asynchronous mode

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 output tokens, without VAT
Alice AI LLM	$0.0021	$0.0083
YandexGPT Pro 5.1	$0.003361	$0.003361
YandexGPT Pro 5	$0.0050	$0.0050
YandexGPT Lite	$0.000834	$0.000834

Batch mode

In batch mode, the minimum billable volume per run is 200,000 tokens.

Model	Price per 1,000 input tokens, without VAT	Price per 1,000 output tokens, without VAT
Qwen2.5 7B Instruct	$0.000834	$0.000834
Qwen2.5 72B Instruct	$0.0050	$0.0050
QwQ 32B Instruct	$0.003334	$0.003334
Llama-3.3-70B-Instruct	$0.0050	$0.0050
Llama-3.1-70B-Instruct	$0.0050	$0.0050
DeepSeek-R1-Distill-Llama-70B	$0.0050	$0.0050
Qwen2.5 32B Instruct	$0.003334	$0.003334
DeepSeek-R1-Distill-Qwen-32B	$0.003334	$0.003334
phi-4	$0.001667	$0.001667
Qwen2 VL 7B	$0.000834	$0.000834
Qwen2.5 VL 7B	$0.000834	$0.000834
DeepSeek 2 VL	$0.003334	$0.003334
DeepSeek 2 VL Tiny	$0.000834	$0.000834
Gemma3 1B it	$0.000834	$0.000834
Gemma3 4B it	$0.000834	$0.000834
Gemma3 12B it	$0.001667	$0.001667
Gemma3 27B it	$0.003334	$0.003334
Qwen 2.5 VL 32B Instruct	$0.003334	$0.003334
Qwen3-0.6B	$0.000834	$0.000834
Qwen3-1.7B	$0.000834	$0.000834
Qwen3-4B	$0.000834	$0.000834
Qwen3-8B	$0.000834	$0.000834
Qwen3-14B	$0.001667	$0.001667
Qwen3-32B	$0.003334	$0.003334
Qwen3-30B-A3B	$0.003334	$0.003334
Qwen3-235B-A22B	$0.05	$0.05

Dedicated instances

The cost of running a dedicated instance depends on the model and the selected configuration. Dedicated instances are billed per second, rounded up to a billing unit. Hardware maintenance and model deployment time are not charged.

Prices are shown for 1 hour of use. Billing occurs per second.

The price per 1 unit for a dedicated instance is $0.0083333 without VAT.

Model	Price per 1 hour, S configuration, without VAT	Price per 1 hour, M configuration, without VAT	Price per 1 hour, L configuration, without VAT
Qwen 2.5 VL 32B Instruct	$6.70	$13.40	$20.10
Qwen 2.5 7B Instruct	$6.70	$13.40	$20.10
Gemma 3 4B it	$3.35	$6.70	$10.05
Gemma 3 12B it	$3.35	$6.70	$10.05
T-pro-it-2.0-FP8	$6.20	$12.40	$18.60
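Since prices are listed per hour but billing is per second, a rough estimate can be obtained by prorating the hourly price. The sketch below is an approximation under that assumption: it ignores rounding up to billing units, so the billed amount may be slightly higher. The function name is hypothetical.

```python
def dedicated_instance_cost(hourly_price: float, seconds: int) -> float:
    """Prorate an hourly price per second (USD without VAT).

    Rounding up to billing units is not modeled here.
    """
    return hourly_price / 3600 * seconds


# Gemma 3 4B it, S configuration ($3.35 per hour), running for 90 minutes
cost = dedicated_instance_cost(3.35, 90 * 60)
print(round(cost, 3))
```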

Fine-tuning

At the Preview stage, you can fine-tune models free of charge. A fine-tuned YandexGPT Lite model will cost the same as the basic YandexGPT Lite model.

Text tokenization

Use of the tokenizer (TokenizerService calls and Tokenizer methods) is free of charge.

Text vectorization

The cost of text vectorization (getting text embeddings) depends on the size of the text submitted for vectorization. Yandex Cloud Billing breaks down the creation of embeddings in vectorization units. One unit equals one token.

Model	Price per 1,000 tokens, without VAT
Embeddings	$0.000083

Example of cost calculation for text vectorization

The cost of vectorizing a text of 2,000 tokens will be:

$0.000083: Cost of processing 1,000 tokens.
$0.000083 / 1,000: Cost of processing one token.

2,000 × ($0.000083 / 1,000) = $0.000166

Total: $0.000166.
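The same arithmetic can be expressed as a small Python helper. The names below are hypothetical; the price is taken from the table above.

```python
EMBEDDINGS_PRICE_PER_1000 = 0.000083  # USD per 1,000 tokens, without VAT


def vectorization_cost(tokens: int) -> float:
    """Cost of vectorizing a text of the given size; one unit equals one token."""
    return tokens * EMBEDDINGS_PRICE_PER_1000 / 1000


print(round(vectorization_cost(2000), 6))  # 0.000166
```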

Text classifications

The cost of text classification depends on the classification model you use and the number of tokens you provide.

When classifying with YandexGPT Lite, a billing unit is a request of up to 1,000 tokens.
When classifying with YandexGPT Pro and fine-tuned classifiers, a billing unit is a request of up to 250 tokens.

A request smaller than one billing unit is rounded up to one unit. Large texts are billed as multiple requests, with the number of requests rounded up.

For example, classifying a text of 770 tokens with YandexGPT Lite will be billed as a single request, i.e., as one billing unit.
The same 770-token text classified with YandexGPT Pro or a fine-tuned classifier will be billed as four requests.

Service	Price, without VAT
1 request (1,000 tokens) to classifier based on YandexGPT Lite	$0.001250
1 request (250 tokens) to classifier based on YandexGPT Pro	$0.001250
1 request (250 tokens) to tuned classifier	$0.001250
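The rounding rule can be illustrated with a short Python sketch. The dictionary keys are hypothetical labels chosen for this example, not API identifiers; the unit sizes and the price come from the text and table above.

```python
import math

PRICE_PER_REQUEST = 0.001250  # USD without VAT, same for all three classifier types

# Billing unit size in tokens; the keys are illustrative labels only
UNIT_TOKENS = {
    "yandexgpt-lite": 1000,
    "yandexgpt-pro": 250,
    "fine-tuned": 250,
}


def classification_cost(tokens: int, classifier: str) -> tuple[int, float]:
    """Return (billed requests, cost in USD) for classifying one text."""
    # Any text is billed as at least one request; larger texts are rounded up
    requests = max(1, math.ceil(tokens / UNIT_TOKENS[classifier]))
    return requests, requests * PRICE_PER_REQUEST


print(classification_cost(770, "yandexgpt-lite"))  # 1 billed request
print(classification_cost(770, "yandexgpt-pro"))   # 4 billed requests
```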

Image generation

You are charged for each generation request in YandexART. Requests are not idempotent, so two requests with the same settings and generation prompt are billed as two separate requests.

Service	Price, without VAT
1 request for YandexART image generation	$0.0183

Agent Atelier

Voice agents

The cost of using voice agents consists of the following:

Cost of speech recognition (incoming audio).
Cost of speech synthesis (outgoing audio).
Cost of text generation using the speech-realtime-250923 model.
Cost of tool invocation.

Service	Price per billing unit, without VAT
Incoming audio, per 1 second	$0.000217
Outgoing audio, per 1 second	$0.000167

Example of cost calculation for a voice agent

Cost of using a voice agent for a 60-second session, where:

Input audio: 60 seconds

Output audio: 20 seconds

Number of generated tokens: 2,000

$0.0067 × 2 + $0.000217 × 60 + $0.000167 × 20 = $0.0134 + $0.0130 + $0.0033 = $0.0297

Total: approximately $0.03.

Where:

$0.0067: Cost of processing 1,000 tokens.
$0.0067 × 2: Cost of processing 2,000 tokens.
$0.000217: Cost of processing 1 second of incoming audio.
$0.000217 × 60: Cost of processing 60 seconds of incoming audio.
$0.000167: Cost of processing 1 second of outgoing audio.
$0.000167 × 20: Cost of processing 20 seconds of outgoing audio.
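A minimal Python sketch of the three cost components, using the per-second rates from the table above and the synchronous price of the speech-realtime-250923 model. The function name is hypothetical, and tool invocations are not included.

```python
TEXT_PRICE_PER_1000_TOKENS = 0.0067   # speech-realtime-250923, USD without VAT
INCOMING_AUDIO_PER_SECOND = 0.000217  # speech recognition
OUTGOING_AUDIO_PER_SECOND = 0.000167  # speech synthesis


def voice_agent_cost(tokens: int, incoming_seconds: float,
                     outgoing_seconds: float) -> float:
    """Cost of one voice agent session, excluding tool invocations."""
    text = tokens / 1000 * TEXT_PRICE_PER_1000_TOKENS
    incoming = incoming_seconds * INCOMING_AUDIO_PER_SECOND
    outgoing = outgoing_seconds * OUTGOING_AUDIO_PER_SECOND
    return text + incoming + outgoing


# Session from the example: 2,000 tokens, 60 s of input audio, 20 s of output audio
session_cost = voice_agent_cost(2000, incoming_seconds=60, outgoing_seconds=20)
print(round(session_cost, 4))
```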

Text-based agents

The cost of using text-based agents consists of the following:

Consumption of tokens as per the pricing plans of the Model Gallery models.
Cost of tool invocation.

Invoking tools in agents

Note

The cost of File Search invocations will change on March 12, 2026.

Service	Price per 1,000 requests, without VAT
Web Search tool	$7.50
File Search tool, until March 12, 2026	Free of charge
File Search tool, from March 12, 2026	$2.46
Code Interpreter tool	Free of charge
MCP tool	Free of charge
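The table can be turned into a small lookup, including the File Search price change on March 12, 2026. This is a sketch under the assumption that requests are billed proportionally to the price per 1,000 requests; the tool keys and function names are hypothetical.

```python
from datetime import date

FILE_SEARCH_PAID_FROM = date(2026, 3, 12)


def tool_price_per_1000(tool: str, on: date) -> float:
    """Price per 1,000 tool requests in USD without VAT on the given date."""
    if tool == "web_search":
        return 7.50
    if tool == "file_search":
        return 2.46 if on >= FILE_SEARCH_PAID_FROM else 0.0
    if tool in ("code_interpreter", "mcp"):
        return 0.0
    raise ValueError(f"unknown tool: {tool}")


def tool_cost(tool: str, requests: int, on: date) -> float:
    """Assumes proportional billing for partial thousands of requests."""
    return tool_price_per_1000(tool, on) * requests / 1000


print(tool_cost("web_search", 250, date(2026, 3, 1)))
print(tool_cost("file_search", 10_000, date(2026, 3, 11)))
print(round(tool_cost("file_search", 10_000, date(2026, 3, 12)), 2))
```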

AI Search

Until March 12, 2026, storing search indexes and files uploaded to AI Studio is free of charge.

Note

The pricing policy below comes into effect on March 12, 2026.

The search index size is rounded up to the nearest whole gigabyte.

In all calculations: 1 GB = 2³⁰ bytes, 1 MB = 2²⁰ bytes.

Service	Price per day per 1 GB, without VAT
Search index storage	$0.0869
AI Studio file storage	Free of charge
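The rounding and unit conventions above can be combined into a short estimate of search index storage costs. This is a sketch with hypothetical names; it assumes each index is rounded up to whole gigabytes and billed for every full day it is stored.

```python
import math

GB = 2 ** 30                    # 1 GB = 2^30 bytes, as stated above
INDEX_PRICE_PER_GB_DAY = 0.0869  # USD without VAT, effective March 12, 2026


def index_storage_cost(index_bytes: int, days: int) -> float:
    """Storage cost of one search index, rounded up to whole gigabytes."""
    billed_gb = math.ceil(index_bytes / GB)
    return billed_gb * days * INDEX_PRICE_PER_GB_DAY


# A 1.5 GB index stored for 30 days is billed as 2 GB
cost = index_storage_cost(int(1.5 * GB), days=30)
print(round(cost, 3))
```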

MCP Hub

Note

This feature is at the Preview stage.

At the Preview stage, MCP servers are free of charge. However, you may still be charged for tools created in MCP servers, such as Yandex Cloud Functions function invocations.

When using external APIs, such as Kontur.Focus or amoCRM, you are charged directly by our respective partner.

Internal server errors

You are not charged for a request that fails due to an internal server error.
