Quotas and limits in Yandex Foundation Models
Updated on December 4, 2024
The YandexGPT API has two types of restrictions:
- Quotas are organizational restrictions that technical support can change on request.
- Limits are technical restrictions imposed by the Yandex Cloud architecture. Limits cannot be changed.
If you need more resources than your quotas allow, contact technical support.
Quotas
Type of quota | Value |
---|---|
Text vectorization | |
Number of text vectorization requests per second | 10 |
Text generation | |
Number of concurrent generations, synchronous mode | 10 |
Number of concurrent generations, YandexGPT Pro 32k | 1 |
Number of requests per second, asynchronous mode (request) | 10 |
Number of requests per second, asynchronous mode (getting a response) | 50 |
Number of requests per hour, asynchronous mode (request) | 5,000 |
Number of requests per hour, YandexGPT Pro 32k, synchronous mode (request) | 100 |
Number of tokenization requests per second | 50 |
Text classification | |
Number of text classification requests per second | 1 |
Image generation | |
Number of generation requests per minute | 500 |
Number of generation requests per day | 5,000 |
Number of result requests per second | 50 |
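To stay within these quotas, you can throttle outgoing calls on the client side before they reach the API. The following is a minimal Python sketch, not an official SDK feature: `SlidingWindowLimiter` and `generate_throttled` are hypothetical helpers, and `call` stands for whatever function your code uses to send a single synchronous generation request. The sliding window allows at most `max_requests` calls in any `window`-second interval, which is a conservative reading of a per-second or per-hour quota; how the server itself counts requests is an assumption here.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `max_requests` calls
    in any `window`-second interval."""

    def __init__(self, max_requests: int, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent requests
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Record a request and return True if the quota allows it right now."""
        with self.lock:
            now = self.clock()
            # Forget requests that have slid out of the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_requests:
                self.sent.append(now)
                return True
            return False

    def acquire(self, poll_interval: float = 0.01) -> None:
        """Block until a request slot is free."""
        while not self.try_acquire():
            time.sleep(poll_interval)


# Default quota values from the table above; use your own if support has changed them.
vectorization_limiter = SlidingWindowLimiter(10, window=1.0)       # 10 requests per second
async_hourly_limiter = SlidingWindowLimiter(5_000, window=3600.0)  # 5,000 requests per hour
generation_slots = threading.BoundedSemaphore(10)  # 10 concurrent synchronous generations


def generate_throttled(call: Callable[..., T], *args, **kwargs) -> T:
    """Run `call` (a hypothetical request function) while holding one of the
    concurrent-generation slots, so at most 10 calls run at the same time."""
    with generation_slots:
        return call(*args, **kwargs)
```

Call `vectorization_limiter.acquire()` before each vectorization request. A sliding window stores one timestamp per request, which stays cheap even for the 5,000-per-hour quota.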
Limits
Type of limit | Value |
---|---|
Retention period for asynchronous request results on the server | 3 days |
Text vectorization | |
Number of input tokens | 2,000 |
Output vector size | 256 |
Text generation | |
Maximum number of tokens per response via the API | 2,000 |
Maximum number of tokens per response in the management console | 1,000 |
Total number of tokens in request and response, 3rd generation models | 8,192 |
Total number of tokens in request and response, synchronous mode of 4th generation models | 8,192 |
Total number of tokens in request and response, asynchronous mode of 4th generation models | 32,000 |
Total number of tokens in request and response, YandexGPT Pro 32k | 32,000 |
Number of free requests per hour for users without a billing account (management console only) | 10 |
Assistants | |
Maximum number of assistants | 1,000 |
Maximum number of threads | 1,000 |
Maximum number of users | 10,000 |
Maximum number of files to upload | 1,000 |
Maximum file size | 128 MB |
Maximum number of files per search index | 100 |
Maximum number of messages per thread | 10,000 |
Maximum number of search indexes | 1,000 |
Maximum number of indexing operations to run | 10 |
Image generation | |
Maximum prompt length | 500 characters |
Number of free requests per minute for users without a billing account (management console only) | 2 |
Number of free requests per day for users without a billing account (management console only) | 10 |
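In text generation, the prompt and the response share one token budget, so the room left for the prompt is the model's total limit minus the number of response tokens you reserve. For example, a synchronous request to a 4th generation model that reserves 2,000 response tokens leaves 8,192 - 2,000 = 6,192 tokens for the prompt. The minimal Python sketch below applies this arithmetic to the values in the table. The mode keys (`gen3`, `gen4_sync`, and so on) and the helper names are hypothetical labels, and `prompt_tokens` is assumed to come from a tokenizer (for example, a tokenization request) rather than being estimated from character count.

```python
# Limits from the table above. The dictionary keys are hypothetical labels.
MAX_RESPONSE_TOKENS_API = 2_000
TOTAL_TOKEN_LIMITS = {
    "gen3": 8_192,         # 3rd generation models
    "gen4_sync": 8_192,    # 4th generation models, synchronous mode
    "gen4_async": 32_000,  # 4th generation models, asynchronous mode
    "pro_32k": 32_000,     # YandexGPT Pro 32k
}
MAX_IMAGE_PROMPT_CHARS = 500


def max_prompt_tokens(mode: str, max_response_tokens: int) -> int:
    """Largest prompt, in tokens, that still leaves room for a response of
    up to `max_response_tokens` tokens within the request + response limit."""
    if not 1 <= max_response_tokens <= MAX_RESPONSE_TOKENS_API:
        raise ValueError(
            f"max_response_tokens must be between 1 and {MAX_RESPONSE_TOKENS_API}"
        )
    return TOTAL_TOKEN_LIMITS[mode] - max_response_tokens


def fits_token_limit(mode: str, prompt_tokens: int, max_response_tokens: int) -> bool:
    """True if the prompt plus the reserved response tokens fit the model's limit."""
    return prompt_tokens <= max_prompt_tokens(mode, max_response_tokens)


def check_image_prompt(prompt: str) -> None:
    """Reject image generation prompts longer than 500 characters.
    len() counts Unicode code points; the service is assumed to count the same way."""
    if len(prompt) > MAX_IMAGE_PROMPT_CHARS:
        raise ValueError(
            f"image prompt is {len(prompt)} characters; the limit is {MAX_IMAGE_PROMPT_CHARS}"
        )


print(max_prompt_tokens("gen4_sync", 2_000))  # 6192
print(max_prompt_tokens("pro_32k", 2_000))    # 30000
```

Reserving the full `max_response_tokens` up front is the conservative choice: it guarantees the request fits even if the model produces the longest allowed response.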