
Dedicated instances

Written by
Yandex Cloud
Updated at November 10, 2025
  • Dedicated instance models
  • Dedicated instance configurations

This feature is in the Preview stage.

With AI Studio, you can deploy certain models on a dedicated instance. Unlike with a manual deployment on Yandex Compute Cloud VMs, you do not have to configure the environment or select optimal VM parameters: AI Studio provides stable, reliable, and efficient model inference and monitors its operation automatically.

Dedicated instances have a number of advantages:

  • Guaranteed performance parameters that are not affected by other users' traffic.
  • No additional quotas on requests or parallel generations: the only limits are those of the instance configuration you select.
  • Optimized model inference for efficient hardware utilization.

Dedicated instances are a good fit if you need to process large volumes of requests without delays. A dedicated instance is not billed per input and output token: you pay only for its running time.
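As a rough illustration of this billing difference (all prices below are hypothetical placeholders, not actual Yandex Cloud rates), you can estimate the throughput at which time-based billing breaks even with per-token billing:

```python
def break_even_tokens_per_hour(instance_price_per_hour: float,
                               token_price_per_1k: float) -> float:
    """Tokens per hour above which a time-billed dedicated instance
    becomes cheaper than per-token billing. Prices are hypothetical
    and used only to illustrate the arithmetic."""
    return instance_price_per_hour / token_price_per_1k * 1000

# Hypothetical numbers: $10/hour for the instance vs $0.002 per 1K tokens.
threshold = break_even_tokens_per_hour(10.0, 0.002)
print(f"{threshold:,.0f} tokens/hour")  # 5,000,000 tokens/hour
```

Above that throughput, paying for running time is cheaper than paying per token; below it, per-token billing wins.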

Dedicated instance models

All deployed models are accessible through an OpenAI-compatible API, through ML SDK, and in AI Playground. To deploy a dedicated instance, you need the ai.models.editor role or higher for the folder. To access the model, the ai.languageModels.user role is sufficient.
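As a minimal sketch of what a request to the OpenAI-compatible API might carry, the body below follows the standard chat completions schema; the model URI is a hypothetical placeholder, not a value from this documentation:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint. The model URI is a placeholder: substitute the URI assigned
# to your deployed dedicated instance.
request_body = {
    "model": "gpt://<folder_id>/<dedicated-instance-model>",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,
}

print(json.dumps(request_body, indent=2))
```

Any OpenAI-compatible client can send such a body once pointed at the instance's endpoint.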

| Model | Context length (tokens) | License |
| --- | --- | --- |
| Qwen 2.5 VL 32B Instruct | 4,096 | Apache 2.0 license |
| Qwen 2.5 72B Instruct | 16,384 | Qwen license |
| Gemma 3 4B it | 4,096 | Gemma Terms of Use |
| Gemma 3 12B it | 4,096 | Gemma Terms of Use |
| gpt-oss-20b | 128,000 | Apache 2.0 license |
| gpt-oss-120b | 128,000 | Apache 2.0 license |
| T-pro-it-2.0-FP8 | 40,000 | Apache 2.0 license |

Dedicated instance configurations

Each model may be available for deployment in several configurations: S, M, or L. Each configuration guarantees specific values of TTFT (time to first token), latency (total time to generate a response), and TPS (tokens per second) for requests with different context lengths.
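The three metrics can be illustrated with a small helper that derives them from per-token arrival timestamps (a sketch with made-up timings; in practice you would record timestamps while consuming a streamed response):

```python
def inference_metrics(request_sent_at: float,
                      token_timestamps: list[float]) -> tuple[float, float, float]:
    """Derive TTFT, total latency, and TPS from per-token arrival
    timestamps (all in seconds). Illustrative only."""
    ttft = token_timestamps[0] - request_sent_at       # time to first token
    latency = token_timestamps[-1] - request_sent_at   # time to full response
    tps = len(token_timestamps) / latency if latency > 0 else 0.0
    return ttft, latency, tps

# Example: request sent at t=0, four tokens arriving at 0.2s .. 0.5s.
ttft, latency, tps = inference_metrics(0.0, [0.2, 0.3, 0.4, 0.5])
print(f"TTFT={ttft:.2f}s latency={latency:.2f}s TPS={tps:.1f}")
```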

The figure below shows how latency and the number of tokens processed by the model depend on the number of parallel generations (Concurrency in the figure): up to a certain point, the more generations the model processes in parallel, the longer each generation takes, and the more tokens are generated per second overall.
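This tradeoff can be sketched with a toy model (the coefficients are made up and purely illustrative, not measured values): per-request latency grows with concurrency, while aggregate throughput keeps rising.

```python
def toy_generation_stats(concurrency: int,
                         base_latency_s: float = 1.0,
                         slowdown_per_stream: float = 0.2) -> tuple[float, float]:
    """Toy model: each extra parallel generation adds latency, yet
    aggregate tokens/sec still grows. All coefficients are made up;
    assumes a fixed 100-token response per request."""
    latency = base_latency_s + slowdown_per_stream * (concurrency - 1)
    per_stream_tps = 100 / latency
    aggregate_tps = per_stream_tps * concurrency
    return latency, aggregate_tps

for c in (1, 4, 16):
    lat, tps = toy_generation_stats(c)
    print(f"concurrency={c:2d}  latency={lat:.1f}s  aggregate TPS={tps:.0f}")
```

Real curves eventually flatten as the hardware saturates, which is the "up to a certain point" in the text above.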

© 2025 Direct Cursus Technology L.L.C.