Overview of Yandex AI Studio models
Yandex AI Studio provides powerful capabilities for using generative models in business scenarios:
- Native and open-source models on shared instances, billed per consumed token.
- LoRA-based model fine-tuning.
- Out-of-the-box and tunable text classification models.
- Large selection of open-source text and multimodal models for batch processing of large data volumes, with a prepaid minimum token amount.
- Dedicated model instances, if you need to process large volumes of data with a guaranteed response time.
There are two interfaces you can use to work with models: AI Playground in the management console and various APIs where you can create agents and access models directly.
Native Yandex models
Model Gallery brings you Yandex's text and image generation models you can use in your business.
The smallest and fastest of the text models, YandexGPT Lite excels at tasks that prioritize response speed over complex reasoning or in-depth knowledge of sophisticated subject areas. For example, you can use YandexGPT Lite to categorize incoming user messages, format texts, or summarize your meetings.
YandexGPT Pro will perform well in more complex tasks: searching knowledge bases and generating results based on the output (RAG scenario), document analysis, reporting and analytics, data extraction and auto-population of fields, forms, and CRM databases.
Alice AI LLM, Yandex's new flagship model, not only matches YandexGPT Pro in complex tasks but is also a much better dialog partner in chat scenarios, capable of extracting information from the whole context. Alice AI LLM is ideal for creating human-oriented AI assistants.
Yandex text models can understand around 20 languages, including English and Japanese, but their primary focus is to be an effective processing engine for texts in Russian. Thanks to Yandex's proprietary tokenizer, these models use tokens more efficiently than comparable models, saving you money. For an example calculation of the cost of doing the same task with different models, see the pricing page.
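Since billing is per consumed token, the practical cost of a task is the token count times the model's rate. The following sketch illustrates that arithmetic; the prices and the exact model names here are hypothetical placeholders, so check the pricing page for real values.

```python
# Hypothetical per-1,000-token prices; see the pricing page for real values.
PRICE_PER_1K_TOKENS = {
    "yandexgpt-lite": 0.20,
    "yandexgpt-pro": 1.20,
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Both input and output tokens count toward the bill."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

# A more efficient tokenizer means fewer tokens for the same task,
# and therefore a lower cost per request.
print(request_cost("yandexgpt-lite", 800, 200))  # 0.2
```

This is why tokenizer efficiency matters: the same prompt encoded into fewer tokens is billed at a lower total, even at the same per-token rate.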
Apart from text models, Model Gallery also features the YandexART model, a generative neural network that creates images based on a text query. YandexART uses the cascaded diffusion method to iteratively refine images from noise. You can specify the format of the final image in the mime_type parameter. Currently, the supported value is image/jpeg. By default, YandexART generates an image of 1024 x 1024 pixels. This size may increase or decrease based on the specified aspect ratio, but by no more than 10%.
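The generation options above (`mime_type`, aspect ratio) are passed in the request body. Below is a sketch of what such a body might look like; the field names follow the REST image generation API as commonly documented, but they, and the `<folder_id>` placeholder, should be verified against the API reference before use.

```python
# A sketch of an image generation request body; verify field names
# against the image generation API reference.
request_body = {
    "modelUri": "art://<folder_id>/yandex-art/latest",  # <folder_id> is a placeholder
    "generationOptions": {
        "mimeType": "image/jpeg",  # currently the only supported value
        # A non-square ratio changes the default 1024 x 1024 size by at most 10%.
        "aspectRatio": {"widthRatio": "2", "heightRatio": "1"},
    },
    "messages": [{"weight": "1", "text": "a lighthouse at dawn, watercolor"}],
}
print(request_body["generationOptions"]["mimeType"])  # image/jpeg
```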
Yandex text models are available through the OpenAI-compatible Completions API and Responses API, as well as a proprietary REST and gRPC text generation API.
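Because the Completions API is OpenAI-compatible, a request is a standard chat-completions payload sent to the service endpoint. The sketch below builds such a request with only the standard library; the endpoint URL and the `gpt://<folder_id>/...` model URI format are assumptions to check against the API reference, and `<folder_id>` / `<api-key>` are placeholders.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the API reference.
ENDPOINT = "https://llm.api.cloud.yandex.net/v1/chat/completions"

def build_request(folder_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request."""
    body = {
        "model": f"gpt://{folder_id}/yandexgpt-lite",  # assumed model URI format
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("<folder_id>", "<api-key>", "Summarize this meeting transcript.")
```

Sending the request with `urllib.request.urlopen(req)` (or pointing any OpenAI client library at the same endpoint) returns a response in the standard chat-completions shape.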
YandexART provides a proprietary image generation API, also available as REST and gRPC.
In addition, all of the models are available through ML SDK and AI Playground.
AI Studio operating modes
In AI Studio, you can use models in three modes: synchronous, asynchronous, or batch mode. The modes have different response times and operating logic.
In synchronous mode, the model gets your request and returns the result immediately after processing. The response delay in synchronous mode is minimal but not instant: the model still needs some time, which depends on the model and system workload. With the stream option enabled, the model sends intermediate generation results during the process. You may opt for synchronous mode if you need to maintain a chatbot dialog. In synchronous mode, models are available in AI Playground, ML SDK, via text generation APIs, and OpenAI-compatible APIs.
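With the stream option, the client receives partial deltas and assembles them into a growing answer, which is what makes chatbot dialogs feel responsive. This minimal sketch shows that consumption pattern with a stubbed list standing in for the chunks a real streamed response would deliver:

```python
from typing import Iterable, Iterator

def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the growing answer as partial deltas arrive (stream=True behavior)."""
    answer = ""
    for delta in chunks:
        answer += delta
        yield answer  # intermediate generation result, shown to the user as it grows

# Stubbed deltas standing in for chunks streamed by the model.
parts = ["The ", "meeting ", "is ", "at ", "noon."]
for partial in stream_answer(parts):
    print(partial)  # each line is one intermediate state of the answer
```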
In asynchronous mode, the model responds to a request with an Operation object containing the ID of the operation in progress. You can use this ID to check the status of the request and later retrieve the result by querying a special output endpoint (whose value depends on the model). Intermediate generation results are not available in asynchronous mode. Generation in asynchronous mode usually takes longer than in synchronous mode (from a couple of minutes to several hours) but is cheaper. Use asynchronous mode if you do not need an urgent response. In asynchronous mode, some models are available in ML SDK, via the text generation and image generation APIs.
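The asynchronous flow boils down to polling the Operation by its ID until it is marked done, then reading the result. Here is a sketch of that loop; `get_operation` is a hypothetical stand-in for the call to the operations endpoint, and the `done`/`response`/`error` fields follow the common long-running-operation convention rather than a confirmed schema.

```python
import time

def wait_for_operation(get_operation, operation_id: str, poll_seconds: float = 0.0):
    """Poll an Operation by ID until it is done, then return its result.

    `get_operation` is a hypothetical stand-in for a call to the
    operations endpoint; it returns a dict describing the Operation.
    """
    while True:
        op = get_operation(operation_id)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(op["error"])
            return op.get("response")
        time.sleep(poll_seconds)  # back off between polls in real code

# Stub that completes on the third poll, mimicking the Operation lifecycle.
states = iter([{"done": False}, {"done": False}, {"done": True, "response": "ready"}])
print(wait_for_operation(lambda _id: next(states), "op-123"))  # ready
```

In production you would poll at a sensible interval (seconds to minutes, given the stated generation times) rather than in a tight loop.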
Batch processing mode allows you to process a large data array in a single request to the model. Input data is provided as a dataset whose type depends on the model. For each request, AI Studio runs an individual instance of the model to process the dataset and then stops it. The result is saved as another dataset, which you can download in Parquet format.