Autoscaling
Autoscaling adjusts the size of a node group, the number of pods, or the amount of resources allocated to each pod based on the resource requests of the pods running on the group's nodes.
In a Managed Service for Kubernetes cluster, the following autoscaling types are available:
- Cluster autoscaling (Cluster Autoscaler). Managed Service for Kubernetes monitors workloads on the nodes and updates the number of nodes within specified limits as required.
- Master autoscaling (Master Autoscaler). Managed Service for Kubernetes monitors workloads on the master and updates its configuration as required.
- Horizontal pod scaling (Horizontal Pod Autoscaler). Kubernetes dynamically changes the number of pods running on each node in the group.
- Vertical pod scaling (Vertical Pod Autoscaler). When workloads increase, Kubernetes allocates additional resources to each pod within the set limits.
Warning
Starting June 18, 2026, master autoscaling is enabled on all clusters in the RAPID release channel where the master is deployed in a highly available configuration.
You can employ several types of autoscaling in the same cluster. However, using Horizontal Pod Autoscaler and Vertical Pod Autoscaler together is not recommended.
Cluster autoscaling
Cluster Autoscaler automatically modifies the number of nodes in a group depending on your workloads.
Warning
You can only place an autoscaling node group in one availability zone.
When creating a node group, select an autoscaling type and set the minimum, maximum, and initial number of nodes in the group. Kubernetes will periodically check the pod status and workloads on the nodes, adjusting the group size as required:
- If pods cannot be assigned due to a lack of vCPUs or RAM on the existing nodes, the number of nodes in the group will gradually increase to the specified maximum.
- If the workload on the nodes is low and all pods can be assigned to fewer nodes in the group, the number of nodes in the group will gradually decrease to the specified minimum. If pods on a node cannot be evicted within the specified period of time (7 minutes), the node is stopped forcibly. The timeout cannot be changed.
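The scaling behavior described above can be illustrated with a rough sketch. It is not how Cluster Autoscaler is implemented: the real scheduler solves a bin-packing problem and accounts for system reservations, while this example only sums pod requests and clamps the result to the group limits. The names PodRequest, nodes_needed, and target_group_size are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PodRequest:
    vcpu: float    # vCPUs requested by the pod
    ram_gb: float  # RAM requested by the pod, in GB


def nodes_needed(pods, node_vcpu, node_ram_gb):
    """Estimate a lower bound on the nodes required to fit all pod requests.

    Assumption: requests are summed and divided by per-node capacity,
    which ignores fragmentation and system reservations.
    """
    total_vcpu = sum(p.vcpu for p in pods)
    total_ram = sum(p.ram_gb for p in pods)
    return max(math.ceil(total_vcpu / node_vcpu),
               math.ceil(total_ram / node_ram_gb))


def target_group_size(pods, node_vcpu, node_ram_gb, min_nodes, max_nodes):
    """Clamp the estimated node count to the group's minimum and maximum."""
    needed = nodes_needed(pods, node_vcpu, node_ram_gb)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Five pods, each requesting 2 vCPUs and 4 GB, on nodes with 4 vCPUs and 16 GB.
    pods = [PodRequest(2, 4) for _ in range(5)]
    print(target_group_size(pods, 4, 16, min_nodes=1, max_nodes=10))
```

In this sketch the group never grows past max_nodes even if pods remain unassigned, and never shrinks below min_nodes even if it is idle.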
Note
When calculating the current limits and quotas
Cluster Autoscaler activation is only available when creating a node group. Cluster Autoscaler is managed on the Managed Service for Kubernetes side.
For more information, see these Kubernetes guides:
See also Questions and answers about node group autoscaling in Managed Service for Kubernetes.
Master autoscaling
Warning
Starting June 18, 2026, master pricing will change: you will be charged for the number of vCPUs and the amount of RAM. To calculate the required master resources for your cluster, use the Recommended master configurations table.
Master Autoscaler automatically adjusts the master configuration to match the current workload. This approach keeps the cluster stable without manual configuration tuning.
To scale, Master Autoscaler periodically collects master utilization metrics: vCPU and RAM usage. Based on these metrics, it makes one of the following decisions:
- Increase resources if the master is close to overload.
- Decrease resources if the master is consistently underutilized.
- Keep resources unchanged if utilization is within normal range.
To avoid reacting to short-term load spikes, decisions are based on aggregated metrics collected over several minutes. Scaling is triggered only when a critical value persists for a certain period of time.
Master Autoscaler does not reduce resources below the master configuration selected when creating or updating the cluster. That configuration is used as the lower scaling boundary.
After a decision is made, the autoscaler selects the nearest suitable configuration so that utilization remains within normal range after the change.
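The decision logic described above can be sketched as follows. This is only an illustration under stated assumptions: the configuration ladder CONFIGS, the thresholds HIGH and LOW, and the decide function are hypothetical and are not taken from the service. The sketch averages the samples collected over a window, moves one step up or down the ladder, and never goes below the configuration selected by the user.

```python
from statistics import mean

# Hypothetical configuration ladder: (vCPUs, RAM in GB), smallest first.
CONFIGS = [(2, 8), (4, 16), (8, 32), (16, 64)]

# Assumed utilization thresholds; the actual values are not documented here.
HIGH, LOW = 0.80, 0.30


def decide(samples, current, floor):
    """Return the index in CONFIGS of the configuration to use next.

    samples: (cpu_utilization, ram_utilization) fractions collected over a window
    current: index of the configuration currently in use
    floor:   index of the configuration selected by the user (lower boundary)
    """
    cpu = mean(s[0] for s in samples)
    ram = mean(s[1] for s in samples)
    peak = max(cpu, ram)  # the more loaded resource drives the decision

    if peak > HIGH and current < len(CONFIGS) - 1:
        return current + 1  # close to overload: increase resources
    if peak < LOW and current > floor:
        return current - 1  # consistently underutilized: decrease resources
    return current          # within normal range, or already at a boundary


if __name__ == "__main__":
    window = [(0.90, 0.50), (0.95, 0.60)]
    print(CONFIGS[decide(window, current=1, floor=1)])
```

Averaging over the window is one simple way to ignore short spikes; a single high sample among many low ones does not move the mean above the threshold.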
Even when using Master Autoscaler, select a master configuration that matches the actual cluster workload. Use the recommended configurations as a reference: they depend on the number of nodes, the maximum number of pods, and the CNI in use. An oversized configuration prevents the autoscaler from releasing unused resources, because it never scales below the selected configuration. An undersized configuration may cause frequent scaling.
Master autoscaling operations are displayed in the cluster operations section. While scaling is in progress, other cluster operations cannot be started. You must wait for it to complete.
Horizontal pod autoscaling
When using horizontal pod scaling, Kubernetes changes the number of pods depending on vCPU workload.
When creating a Horizontal Pod Autoscaler, specify the following parameters:
- vCPU load average percentage for each pod.
- Minimum and maximum number of pod replicas.
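As an illustration of these parameters, the sketch below builds a Horizontal Pod Autoscaler manifest for the autoscaling/v2 API as a Python dictionary and estimates the replica count using the formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The helper names hpa_manifest and desired_replicas and the Deployment name web are hypothetical, and the estimate ignores the tolerance and stabilization behavior of the real controller.

```python
import json
import math


def hpa_manifest(name, target_kind, target_name, cpu_percent,
                 min_replicas, max_replicas):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Target average CPU utilization across pods, in percent.
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }


def desired_replicas(current, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas):
    """Estimate the replica count, clamped to the configured bounds."""
    raw = math.ceil(current * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # "web" is a hypothetical Deployment name.
    manifest = hpa_manifest("web-hpa", "Deployment", "web", 60, 2, 10)
    print(json.dumps(manifest, indent=2))
    print(desired_replicas(4, 90, 60, 2, 10))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.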
Horizontal pod autoscaling is available for the following controllers:
For more on Horizontal Pod Autoscaler, see this Kubernetes guide.
Vertical pod autoscaling
Kubernetes uses the limits parameters to restrict resources allocated for each application. A pod exceeding the vCPU limit will trigger CPU throttling. A pod exceeding the RAM limit will be stopped.
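A minimal sketch of this behavior is shown below. The resources dictionary mirrors the requests and limits fields of a container specification, while the function limit_outcome is hypothetical and only restates the rules above; it does not model how the kubelet or the container runtime enforces them.

```python
# Example container resources section; the values are illustrative only.
resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}


def limit_outcome(cpu_demand_m, ram_used_mib, cpu_limit_m, ram_limit_mib):
    """Return what happens to a container for the given demand and limits.

    cpu_demand_m / cpu_limit_m: CPU in millicores (500 means 0.5 vCPU)
    ram_used_mib / ram_limit_mib: memory in MiB
    """
    if ram_used_mib > ram_limit_mib:
        return "stopped"    # exceeding the RAM limit stops the container
    if cpu_demand_m > cpu_limit_m:
        return "throttled"  # exceeding the vCPU limit triggers CPU throttling
    return "running"


if __name__ == "__main__":
    print(limit_outcome(700, 300, 500, 512))
```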
If required, Vertical Pod Autoscaler allocates additional vCPU and RAM resources to pods.
When creating a Vertical Pod Autoscaler, set the autoscaling mode in the specification:
- updateMode: "Off" for Vertical Pod Autoscaler to provide recommendations on managing pod resources without modifying them.
- updateMode: "Initial" for Vertical Pod Autoscaler to apply recommendations only when creating pods.
- updateMode: "Recreate" for Vertical Pod Autoscaler to recreate pods with updated resource values in case of a serious discrepancy between the current requests and recommendations.
- updateMode: "InPlaceOrRecreate" for Vertical Pod Autoscaler to attempt updating requests and resource limits first, without restarting the pod. If such an update is not possible, the pod will be recreated as in the Recreate mode. For more information, see Resize CPU and Memory Resources assigned to Containers.
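To show where the mode is set, the sketch below builds a Vertical Pod Autoscaler manifest for the autoscaling.k8s.io/v1 API as a Python dictionary. The helper vpa_manifest and the Deployment name web are hypothetical, and the validation covers only the modes listed above.

```python
import json

# Update modes described in this section.
UPDATE_MODES = ("Off", "Initial", "Recreate", "InPlaceOrRecreate")


def vpa_manifest(name, target_kind, target_name, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest with the given update mode."""
    if update_mode not in UPDATE_MODES:
        raise ValueError(f"unsupported updateMode: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose pods receive recommendations.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # "web" is a hypothetical Deployment name.
    print(json.dumps(vpa_manifest("web-vpa", "Deployment", "web"), indent=2))
```

Starting with the "Off" mode, as in the default here, is a cautious choice: it lets you review recommendations before allowing Vertical Pod Autoscaler to change or recreate pods.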
For more on Vertical Pod Autoscaler, see this Kubernetes guide.