Quotas and limits in Yandex Foundation Models
Updated on October 17, 2024
The YandexGPT API is subject to two kinds of restrictions:
- Quotas are organizational restrictions that can be changed by technical support on request.
- Limits are technical limitations due to Yandex Cloud architectural features. The limits cannot be changed.
If you need more resources, contact technical support to request a higher quota.
Quotas
Type of limit | Value |
---|---|
Text vectorization | |
Number of text vectorization requests per second | 10 |
Text generation | |
Number of concurrent generations | 1 |
Number of requests per second, asynchronous mode (request) | 10 |
Number of requests per second, asynchronous mode (getting a response) | 50 |
Number of requests per hour, asynchronous mode (request) | 5000 |
Number of tokenization requests per second | 50 |
Text classification | |
Number of text classification requests per second | 1 |
Image generation | |
Number of generation requests per minute | 10 |
Number of generation requests per day | 500 |
Number of result requests per second | 50 |
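Because most of these quotas are expressed as requests per second or per minute, a client can stay under them by throttling itself before sending. The following is a minimal sketch of a sliding-window throttle, not part of any Yandex SDK: the `SlidingWindowLimiter` class and the limiter variable names are hypothetical, and only the numeric values come from the table above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds.

    Hypothetical client-side helper; the service enforces its own quotas
    independently of this class.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have already left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call expires.
            self._sleep(self.period - (now - self._calls[0]))


# Quota values copied from the table above.
vectorization_limiter = SlidingWindowLimiter(max_calls=10, period=1.0)
image_generation_limiter = SlidingWindowLimiter(max_calls=10, period=60.0)
```

Calling `vectorization_limiter.acquire()` immediately before each vectorization request keeps a single client process at or below 10 requests per second. The sketch does not coordinate several processes that share one quota, and it does not cover the limit on concurrent generations, which would need a semaphore rather than a rate window.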
Limits
Type of limit | Value |
---|---|
Period to store results of asynchronous requests on the server | 3 days |
Text vectorization | |
Number of input tokens | 2000 |
Output vector size | 256 |
Text generation | |
Number of tokens per response | 2000 |
Maximum number of tokens per response in the management console | 500 |
Total number of tokens | 8192 |
Number of free requests per hour for users without a billing account. Available only in the management console | 10 |
Image generation | |
Maximum prompt length | 500 characters |
Number of free requests per minute for users without a billing account. Available only in the management console | 2 |
Number of free requests per day for users without a billing account. Available only in the management console | 10 |
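Since these limits cannot be raised, it can help to check a request against them before sending it. The sketch below shows one way to do that, under two assumptions that the table does not state explicitly: that "Total number of tokens" covers the prompt and the response together, and that the image prompt length is counted the way Python's `len()` counts characters. The function and constant names are hypothetical, and token counts are taken as inputs rather than computed here.

```python
# Limit values copied from the table above.
MAX_RESPONSE_TOKENS = 2000
MAX_TOTAL_TOKENS = 8192
MAX_VECTORIZATION_INPUT_TOKENS = 2000
MAX_IMAGE_PROMPT_CHARS = 500


def response_token_budget(prompt_tokens: int) -> int:
    """Return the largest response size that still fits the limits.

    Assumes the total limit applies to prompt tokens plus response tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = MAX_TOTAL_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone reaches the total token limit")
    return min(MAX_RESPONSE_TOKENS, remaining)


def check_vectorization_input(input_tokens: int) -> None:
    """Raise if a text is too long to vectorize in one request."""
    if input_tokens > MAX_VECTORIZATION_INPUT_TOKENS:
        raise ValueError(
            f"input has {input_tokens} tokens, limit is {MAX_VECTORIZATION_INPUT_TOKENS}"
        )


def check_image_prompt(prompt: str) -> None:
    """Raise if an image generation prompt is too long.

    Assumes the limit is counted in Unicode code points, as len() does.
    """
    if len(prompt) > MAX_IMAGE_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(prompt)} characters, limit is {MAX_IMAGE_PROMPT_CHARS}"
        )
```

For example, under the stated assumption a prompt of 7000 tokens leaves room for at most 1192 response tokens, while a prompt of 1000 tokens is capped by the per-response limit of 2000.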