Scaling types

Written by
Yandex Cloud
Updated at April 18, 2025
In this article:

  • Manually scaled groups
    • Use cases
  • Automatically scaled groups
    • Type of automatic scaling
    • General settings
    • Metrics for autoscaling
    • Calculating the average metric value
    • Use cases

You select the scaling type for each new instance group. This setting determines whether the number of instances in the group is controlled automatically or manually.

Note

If you pause an instance group's processes (the PAUSED status), the group will not be scaled.

Manually scaled groups

You can create fixed-size instance groups and manage their size manually based on your current computing needs.

Use cases

  • Scheduled instance group scaling
  • Updating an instance group under load

Automatically scaled groups

When creating an automatically scaled instance group, you specify the target metric value, and the service continuously adjusts the number of instances:

  • If the average metric value exceeds the target, Instance Groups will create new instances in the group.
  • If the average metric value would remain at or below the target even with fewer instances, Instance Groups will delete the unnecessary instances.

This is done to ensure that the average metric value within the same availability zone or the entire group (depending on the automatic scaling type) does not differ much from the target value.

For example, let's assume there are 4 instances in an availability zone with an average metric value of 70 and a target value of 80. Instance Groups will not reduce the group size, because deleting an instance would push the average value over the target: 4 × 70 / 3 ≈ 93.3. When the average value drops to 60, Instance Groups will delete one instance, because the average value does not exceed the target even with three instances: 4 × 60 / 3 = 80.
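
Here is a minimal sketch of the scale-down check from this example, assuming a single availability zone; it is an illustration, not the actual Instance Groups logic, and all names are made up.

def can_remove_instance(per_instance_values: list[float], target: float) -> bool:
    """Return True if the per-instance average stays within the target
    after removing one VM from the group."""
    total = sum(per_instance_values)
    return total / (len(per_instance_values) - 1) <= target

# 4 instances averaging 70 against a target of 80: 280 / 3 ≈ 93.3 > 80, keep all.
print(can_remove_instance([70, 70, 70, 70], target=80))  # False

# The average drops to 60: 240 / 3 = 80 <= 80, so one instance can be removed.
print(can_remove_instance([60, 60, 60, 60], target=80))  # True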

If multiple metrics are specified in the settings, the largest estimated instance group size is used.

For automatically scaled groups, you need to specify common scaling settings and metric settings.

Type of automatic scaling

Instance Groups can adjust the number of instances either separately in each availability zone specified in the group settings or across the entire instance group (both modes are illustrated in the sketch after this list):

  • With zonal scaling, Instance Groups will calculate the average scaling metric value and required number of instances separately for each availability zone. This type of automatic scaling is used by default.
  • With regional scaling, the metric value and the number of instances are calculated for the entire group. To change the group's autoscaling type to regional, provide the auto_scale scaling policy with the auto_scale_type: REGIONAL key.
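
Below is a minimal sketch of the difference between the two modes; the zone names, metric values, and the mean-based averaging are illustrative assumptions, not the service's exact computation.

from statistics import mean

# Per-instance metric values, grouped by availability zone (hypothetical numbers).
utilization = {
    "zone-a": [90.0, 40.0],
    "zone-b": [30.0, 20.0],
}

# Zonal scaling: one average (and one required instance count) per zone.
zonal_averages = {zone: mean(values) for zone, values in utilization.items()}
print(zonal_averages)  # {'zone-a': 65.0, 'zone-b': 25.0}

# Regional scaling: a single average across the whole group.
regional_average = mean(v for values in utilization.values() for v in values)
print(regional_average)  # 45.0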

General settings

To reduce scaling sensitivity, you can configure the following settings in Instance Groups:

  • Stabilization period: After the number of VMs increases, the group size will not decrease until the stabilization period ends, even if the average metric value has dropped low enough.

  • Warm-up period: Period during which Instance Groups does not use the following metrics of a newly started VM:

    • CPU utilization.
    • Monitoring metric values that are applied according to the UTILIZATION rule.

    Average metric values for the group will be used instead.

  • Utilization measurement period: Metric value will be calculated as an average of all measurements taken during the specified period.

    For example, the CPU load may spike to 100% for one second and then drop to 10%. To ignore such spikes, Instance Groups uses the average value over the specified period, such as one minute (see the sketch after this list).
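
Here is a minimal sketch of such averaging, assuming one measurement per second over a one-minute measurement period; the sampling rate and values are illustrative, not the service's internals.

def period_average(samples: list[float]) -> float:
    """Average all measurements taken during the utilization measurement period."""
    return sum(samples) / len(samples)

# 59 seconds at about 10% CPU plus a single one-second spike to 100%.
samples = [10.0] * 59 + [100.0]
print(round(period_average(samples), 1))  # 11.5, so the spike alone does not trigger scaling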

You can also set limits on the number of instances per group:

  • Maximum group size: Instance Groups will not create more instances if a group already contains this many.
  • Minimum size in a single availability zone: Instance Groups will not delete instances from an availability zone if there are only this many instances in the zone.

Metrics for autoscaling

You can use the following metrics for autoscaling:

  • CPU utilization.
  • Any metrics from Yandex Monitoring.

CPU utilization

Instance Groups can control the group size to keep the average CPU utilization at the target level. The average CPU utilization is calculated separately for each availability zone or for the entire group (for the zonal or regional scaling type, respectively).

Here is what Instance Groups will do outside the stabilization period:

  1. Calculate the average CPU utilization during the specified measurement period for each instance, except those that are still warming up. The load is measured several times per minute on every instance.

  2. Use the obtained values to calculate the average load for each availability zone or across the entire group.

    For example, let's assume there is a group of four instances located in one availability zone. One of the instances is still warming up, while the others are under 90%, 75%, and 85% load on average during the measurement period. Average zone load: (90 + 75 + 85) / 3 ≈ 83.3%.

  3. Obtain the total load, i.e., multiply the resulting average load by the total number of instances.

    In our example, this is 83.3 × 4 ≈ 333.3%.

  4. Divide the total load by the target load level to obtain the number of instances required (the result is rounded up).

    Say, for example, the target level is 75%. This means the group needs 333.3 / 75 ≈ 4.44, rounded up to 5 instances. Based on this result, Instance Groups needs to create one more instance.

Once the number of instances is calculated and changed (if required), Instance Groups will start calculating the average load again.
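
The following sketch reproduces steps 1 through 4 for a single availability zone with the numbers from the example above; it is an illustration under those assumptions, not the service's implementation, and all names are made up.

import math
from statistics import mean

def required_instances(measured_utilization: list[float],
                       total_instances: int, target: float) -> int:
    """measured_utilization: measurement-period averages for instances that are
    not warming up; total_instances: all instances in the zone, including the
    warming-up ones."""
    zone_average = mean(measured_utilization)     # step 2: average zone load
    total_load = zone_average * total_instances   # step 3: total load
    return math.ceil(total_load / target)         # step 4: divide by target, round up

# Three loaded instances (90%, 75%, 85%) plus one that is still warming up.
print(required_instances([90, 75, 85], total_instances=4, target=75))  # 5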

Monitoring metrics

You can use up to three Monitoring metrics for automatic scaling in Instance Groups. To read the metrics, the service account linked to the instance group needs the monitoring.viewer role or higher.

When using monitoring metrics, specify the following in Instance Groups:

  • Metric name you specified in Monitoring.
  • Labels you specified in Monitoring:
    • (Optional) folder_id: Folder ID. By default, it is the ID of the folder the group belongs to.
    • (Optional) service: Service ID. The default value is custom. You can use this label to specify service metrics, e.g., service with the compute value for Compute Cloud.

You will also need to specify the following for this metric:

  • Metric type that affects how Instance Groups will calculate the average metric value:
    • GAUGE: Used for metrics that show a value at a specific point in time, e.g., the number of requests per second to a server running on an instance. Instance Groups computes the metric's average value over the specified averaging period.
    • COUNTER: Used for metrics that grow monotonically over time, e.g., the total number of requests to a server running on an instance. Instance Groups calculates the metric's average increment over the specified averaging period.
  • Metric rule type:
    • UTILIZATION: Metric displays resource consumption by a single instance.

      The number of instances per availability zone or the entire group (for zonal or regional scaling, respectively) based on the UTILIZATION metric is calculated in the same way as the number of instances based on CPU load.

      When delivered to Monitoring, the UTILIZATION metric must have the instance_id label.

    • WORKLOAD: The metric represents the total load on all instances in one availability zone or in the entire group (for the zonal or regional scaling type, respectively).

      To calculate the number of instances per availability zone or in the entire group by the WORKLOAD metric, the average metric value is divided by the target value, and the result is rounded up (see the sketch after this list).

      For example, let's assume there are two instances in an availability zone. The metric shows the total number of requests per second (RPS) to all instances. If the target metric value is 200, then with an average value of 450, Instance Groups will increase the number of instances in the availability zone to three: 450 / 200 = 2.25 ~ 3 instances.

      The metric value is also calculated and used during the instance warm-up period specified in the general settings.

      If zonal scaling is applied to the group, the WORKLOAD metric must have the zone_id label when delivered to Monitoring.

  • Target metric value based on which Instance Groups calculates the required number of instances. For UTILIZATION metrics, the target value is the desired resource consumption level for each instance; for WORKLOAD metrics, it is the maximum allowed workload for each instance.
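
Here is a minimal sketch of the WORKLOAD rule from the example above, plus a COUNTER metric reduced to an average increment; the function names and sample numbers are illustrative assumptions, not the service's API.

import math

def instances_for_workload(average_total_value: float, target_per_instance: float) -> int:
    """WORKLOAD: the metric is the total load, so divide it by the target value
    and round the result up."""
    return math.ceil(average_total_value / target_per_instance)

def counter_average_increment(samples: list[tuple[float, float]]) -> float:
    """COUNTER: the metric grows monotonically, so use its average increment
    (value delta over time delta) across the averaging period.
    samples: (timestamp_seconds, value) pairs."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# WORKLOAD example from the text: 450 RPS in total, target of 200 RPS per instance.
print(instances_for_workload(450, 200))  # 3, since 450 / 200 = 2.25 rounds up to 3

# COUNTER example: the total request count grows from 12000 to 30000 over 60 seconds.
print(counter_average_increment([(0.0, 12000.0), (60.0, 30000.0)]))  # 300.0 requests per second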

Calculating the average metric value

The average metric value is calculated using an exponential moving average. This makes autoscaling responsive to recent changes in the metric while smoothing out short-lived spikes from earlier in the measurement window.

The normal average is calculated using the following formula:

$$\frac{\int_a^b f(x)\,dx}{b - a},$$

where $f(x)$ is the metric value as a function of time over the $[a, b]$ time range.

There may be spikes in metric values within the $[a, b]$ time range. The normal average is calculated without regard to when a spike occurred, whether closer to the beginning or the end of the range. This can cause the group to scale excessively and increase resource costs.

To account for the time of a metric spike, an exponential moving average is used:

$$\frac{\int_a^b f(x)\,w(x)\,dx}{\int_a^b w(x)\,dx},$$

where $w(x) = k^{-(x - a)}$, $k \in (0, 1)$, is a weight function that assigns larger weights to the values of $f(x)$ closer to the end of the $[a, b]$ range, i.e., closer to the current time.

The $k$ factor depends on how long the metric is measured and is calculated using this formula:

$$k = \frac{1}{\exp(10/t)},$$

where $t$ is the metric measurement time in seconds, $t = b - a$.
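
Below is a minimal numerical sketch of this exponential moving average, using a discrete approximation of the integrals over evenly spaced samples; the 60-second window and sample values are illustrative assumptions, not how the service computes it internally.

import math

def exponential_moving_average(samples: list[float], t: float) -> float:
    """samples: metric values at evenly spaced points on [a, b], where t = b - a."""
    k = 1.0 / math.exp(10.0 / t)                      # k = 1 / exp(10 / t)
    n = len(samples)
    # w(x) = k^-(x - a): samples closer to the current time get larger weights.
    weights = [k ** -(i * t / (n - 1)) for i in range(n)]
    return sum(f * w for f, w in zip(samples, weights)) / sum(weights)

# A short spike at the start of a 60-second window is almost forgotten...
early_spike = [100.0] * 5 + [10.0] * 55
print(round(exponential_moving_average(early_spike, t=60.0), 1))  # close to 10

# ...while the same spike at the end of the window dominates the weighted average.
late_spike = [10.0] * 55 + [100.0] * 5
print(round(exponential_moving_average(late_spike, t=60.0), 1))  # much higher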

Use cases

  • Running an autoscaling instance group
  • Autoscaling an instance group to process messages in Yandex Message Queue

See also

  • Scaling policy
  • Creating a fixed-size instance group
  • Creating an autoscaling instance group
