Text generation models
Yandex AI Studio provides access to large text models from different vendors. If an out-of-the-box model is not enough, you can fine-tune some models to respond to your requests more accurately.
Models available in common instance
All basic models are subject to the update rules described in Model lifecycle. When updating models, generations available in different branches (/latest
, /rc
, and /deprecated
segments) may change. Modified models share usage quotas with their basic models.
Model and URI |
Generation |
Context |
|
YandexGPT Lite |
Deprecated 5Latest 5RC 5 |
32,000 |
Asynchronous, synchronous |
YandexGPT Pro |
Deprecated 5Latest 5RC 5.1 |
32,000 |
Asynchronous, synchronous |
Llama 8B1 |
Deprecated 3.1Latest 3.1RC 3.1 |
8,192 |
Asynchronous, synchronous |
Llama 70B1 |
Deprecated 3.3Latest 3.3RC 3.3 |
8,192 |
Asynchronous, synchronous |
Qwen3 235B |
— |
256,000 |
|
gpt-oss-120b |
— |
128,000 |
|
gpt-oss-20b |
— |
128,000 |
|
Fine-tuned models |
Depends on the basic model |
Depends on the basic model |
Asynchronous, synchronous |
Gemma3 27B |
— |
128 000 |
|
YandexART |
— |
— |
Asynchronous |
1 Llama was created by Meta. Meta is designated as an extremist organization and its activities are prohibited in Russia.
The Gemma 3 27B model is designed to process Base64-encoded images of any aspect ratio. An adaptive algorithm scales images up to 896 pixels on the largest side, ensuring that important visual details are preserved. Each image requires 256 tokens for processing.
Model lifecycle
Each model has certain lifecycle characteristics, such as the model name, branch, and release date. These characteristics allow you to precisely identify the model version. Below, you can see our rules for updating models. Refer to these rules to adjust your solutions to a new version as apporpriate.
For each model, there are three branches (in the order from the oldest to the newest one): Deprecated
, Latest
, and Release Candidate
(RC
). Each of the branches is subject to the SLA.
The RC
branch is updated as the new model is ready and may change at any time. When a model in the RC
branch is ready for general use, we announce the upcoming release both in the release notes and our Telegram community
One month after the announcement, the RC
version becomes the Latest
one, and the Latest
version is moved to the Deprecated
branch. We continue the support of the Deprecated
version for one more month, after which models in the Deprecated
and Latest
branches become identical.
Models available in batch mode
Text generation models
Model |
URI |
Context |
Qwen2.5 7B Instruct |
|
32,768 |
Qwen2.5 72B Instruct |
|
16,384 |
QwQ 32B Instruct |
|
32,768 |
Llama-3.3-70B-Instruct2 |
|
8,192 |
Llama-3.1-70B-Instruct2 |
|
8,192 |
DeepSeek-R1-Distill-Llama-70B |
|
8,192 |
Qwen2.5 32B Instruct |
|
32,768 |
DeepSeek-R1-Distill-Qwen-32B |
|
32,768 |
phi-4 |
|
16,384 |
Gemma3 1B it |
|
32,768 |
Gemma3 4B it |
|
131,072 |
Gemma3 12B it |
|
65,536 |
Gemma3 27B it |
|
32,768 |
Qwen3-0.6B |
|
32,768 |
Qwen3-1.7B |
|
32,768 |
Qwen3-4B |
|
32,768 |
Qwen3-8B |
|
32,768 |
Qwen3-14B |
|
32,768 |
Qwen3-32B |
|
32,768 |
Qwen3-30B-A3B |
|
32,768 |
Qwen3-235B-A22B |
|
32,768 |
2 Llama was created by Meta. Meta is designated as an extremist organization and its activities are prohibited in Russia.
Multimodal models
Model | URI | Context |
---|---|---|
Qwen2 VL 7BModel card |
gpt://<folder_ID>/qwen2-vl-7b-instruct/ |
4096 |
Qwen2.5 VL 7BModel card |
gpt://<folder_ID>/qwen2.5-vl-7b-instruct/ |
4096 |
Qwen 2.5 VL 32B InstructModel card |
gpt://<folder_ID>/qwen2.5-vl-32b-instruct/ |
4096 |
DeepSeek 2 VLModel card |
gpt://<folder_ID>/deepseek-vl2/ |
4096 |
DeepSeek 2 VL TinyModel card |
gpt://<folder_ID>/deepseek-vl2-tiny/ |
4096 |
Gemma3 4B itModel card |
gpt://<folder_ID>/gemma-3-4b-it/ |
4096 |
Gemma3 12B itModel card |
gpt://<folder_ID>/gemma-3-12b-it/ |
4096 |
Gemma3 27B itModel card |
gpt://<folder_ID>/gemma-3-27b-it/ |
4096 |
Accessing models
You can access text generation models of different versions in a number of ways.
When operating text generation models via Yandex Cloud ML SDK, use one of the following formats:
-
Model name, provided as a string. Only the
Latest
versions are available.# Text generation model = ( sdk.models.completions("yandexgpt") ) # Image generation model = ( sdk.models.image_generation("yandex-art") )
-
Model name and version, provided as strings in the
model_name
andmodel_version
fields, respectively.# Text generation model = ( sdk.models.completions(model_name="yandexgpt-lite", model_version="rc") ) # Image generation model = ( sdk.models.image_generation(model_name="yandex-art", model_version="latest") )
The above example explicitly specifies the
Release Candidate
of theYandexGPT Lite
model and theLatest
of theYandexART
model. -
Model URI, provided as a string containing the full URI of the required model version. You can also use this method to access fine-tuned models.
# Text generation model = ( sdk.models.completions("gpt://b1gt6g8ht345********/llama/deprecated") ) # Image generation model = ( sdk.models.image_generation("art://b1gt6g8ht345********/yandex-art/latest") )
The above example explicitly specifies the
Deprecated
version of theLlama 70B
model and theLatest
of theYandexART
model.
To access a model via the REST API or gRPC API, specify the model's URI containing the folder ID in the modelUri
field of the request body. The /latest
, /rc
, and /deprecated
segments indicate the model version. /latest
is used by default.
Examples:
-
Accessing the
Latest
versions of theYandexGPT Lite
andYandexART
models:{ "modelUri": "gpt://b1gt6g8ht345********/yandexgpt-lite/latest" ... "modelUri": "art://b1gt6g8ht345********/yandex-art/latest" }
To access the
Latest
versions, you do not need to specify the model version explicitly becauseLatest
is used by default. -
Accessing the
RC
version of theLlama 70B
model:{ "modelUri": "gpt://b1gt6g8ht345********/llama/rc" ... }