Yandex Managed Service for Kubernetes

In this article:

  • Cluster autoscaling
  • Horizontal pod autoscaling
  • Vertical pod autoscaling
  • Use cases

Automatic scaling

Written by
Yandex Cloud
Updated at May 5, 2025

Automatic scaling adjusts the size of a node group, the number of pods, or the amount of resources allocated to each pod, based on the resource requests of the pods running on the group's nodes. Autoscaling is available as of Kubernetes version 1.15.

In a Managed Service for Kubernetes cluster, three types of automatic scaling are available:

  • Cluster autoscaling (Cluster Autoscaler). Managed Service for Kubernetes monitors the load on the nodes and updates the number of nodes within specified limits as required.
  • Horizontal pod scaling (Horizontal Pod Autoscaler). Kubernetes dynamically changes the number of pods running on each node in the group.
  • Vertical pod scaling (Vertical Pod Autoscaler). When load increases, Kubernetes allocates additional resources to each pod within established limits.

You can use several types of automatic scaling in the same cluster. However, using Horizontal Pod Autoscaler and Vertical Pod Autoscaler together is not recommended.

Cluster autoscaling

Cluster Autoscaler automatically modifies the number of nodes in a group depending on the load.

Warning

Nodes of an autoscaling node group can only be placed in a single availability zone.

When creating a node group, select the automatic scaling type and set the minimum, maximum, and initial number of nodes in the group. Kubernetes periodically checks pod status and node load, adjusting the group size as required:

  • If pods cannot be assigned due to a lack of vCPUs or RAM on the existing nodes, the number of nodes in the group will gradually increase to the specified maximum size.
  • If the load on the nodes is insufficient and all pods can be scheduled on fewer nodes, the number of nodes in the group will gradually decrease to the specified minimum size. If a node's pods cannot be evicted within the allotted time (7 minutes), the node is forcibly stopped. This waiting time cannot be changed.

Note

When calculating the current limits and quotas, Managed Service for Kubernetes uses the specified maximum node group size as its actual size, regardless of the current group size.

Cluster Autoscaler activation is only available when creating a node group. Cluster Autoscaler is managed on the Managed Service for Kubernetes side.

For more information, see the Kubernetes documentation:

  • Cluster Autoscaler description
  • Default parameters

See also Questions and answers about node group autoscaling in Managed Service for Kubernetes.

Horizontal pod autoscaling

When using horizontal pod scaling, Kubernetes changes the number of pods depending on vCPU load.

When creating a Horizontal Pod Autoscaler, specify the following parameters:

  • Desired average percentage vCPU load for each pod.
  • Minimum and maximum number of pod replicas.

Horizontal pod autoscaling is available for the following controllers:

  • Deployment
  • StatefulSet
  • ReplicaSet

You can learn more about Horizontal Pod Autoscaler in the Kubernetes documentation.
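
As an illustration, a Horizontal Pod Autoscaler targeting 50% average vCPU utilization with one to ten replicas might look like the following manifest (the Deployment name my-app and the specific values are placeholders, not taken from this article):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  # The controller whose replica count the HPA manages.
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  # Minimum and maximum number of pod replicas.
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Desired average vCPU load per pod, as a percentage of requests.
          type: Utilization
          averageUtilization: 50
```

Applying this manifest with kubectl apply makes Kubernetes add replicas when average vCPU utilization stays above 50% and remove them when it drops, within the 1–10 range.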

Vertical pod autoscaling

Kubernetes uses the limits parameters to restrict the resources allocated to each application. A pod exceeding its vCPU limit is throttled; a pod exceeding its RAM limit is stopped.
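
For example, a container's resource requests and limits are set in its pod specification like this (the image name and the specific values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m      # exceeding this vCPU limit triggers CPU throttling
          memory: 512Mi  # exceeding this RAM limit gets the container stopped
```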

If required, Vertical Pod Autoscaler allocates additional vCPU and RAM resources to pods.

When creating a Vertical Pod Autoscaler, set the autoscaling option in the specification:

  • updateMode: "Auto" for Vertical Pod Autoscaler to manage pod resources automatically.
  • updateMode: "Off" for Vertical Pod Autoscaler to provide recommendations on managing pod resources without modifying them.
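
A minimal Vertical Pod Autoscaler specification with the autoscaling option set might look like this sketch (the Deployment name my-app is a placeholder):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  # The controller whose pod resources the VPA manages.
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Auto": manage pod resources automatically.
    # "Off": only provide recommendations without modifying pods.
    updateMode: "Auto"
```

With updateMode: "Off", the recommended values can be inspected in the object's status instead of being applied to the pods.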

You can learn more about Vertical Pod Autoscaler in the Kubernetes documentation.

Use cases

  • Horizontal application scaling in a cluster
  • Vertical application scaling in a cluster
  • Deploying and load testing a gRPC service with scaling
  • Creating an ACME resolver webhook for responses to DNS01 checks

Yandex project
© 2025 Yandex.Cloud LLC