Nodes in a Managed Service for Kubernetes group not scaling down
Issue description
Nodes in a node group of your Managed Service for Kubernetes cluster do not scale down.
Solution
Managed Service for Kubernetes uses Cluster Autoscaler to autoscale node groups. It works as follows: you specify the minimum and maximum size of the node group, and Cluster Autoscaler regularly checks the state of pods and nodes.
If the load on the nodes is low and all pods can fit on fewer nodes in the group, the number of nodes in the group gradually decreases to the specified minimum.
Cluster Autoscaler periodically checks the load on the nodes and, if the pods of an underloaded node can be safely rescheduled to other nodes without overloading them, drains that node and shuts it down.
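To see roughly the same picture Cluster Autoscaler reasons about, you can inspect node utilization with kubectl. This is a minimal sketch: `kubectl top` assumes a metrics source (such as metrics-server) is available in the cluster, and `<node_name>` is a placeholder.

```bash
# Show current CPU/memory usage per node (requires a metrics source such as metrics-server).
kubectl top nodes

# Inspect a single node: the "Allocated resources" section shows how much
# CPU and memory the pods on this node request.
kubectl describe node <node_name>
```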
For a node to be drained, the following conditions must be met:
- The node load is below 50%. To check the load level, use the `yc managed-kubernetes cluster list-nodes $CLUSTER_ID` command, where `$CLUSTER_ID` is the Managed Service for Kubernetes cluster ID.
- The pods on the node do not use local storage.
- There are no `affinity`, `anti-affinity`, `nodeSelector`, or `topologySpreadConstraints` rules preventing pod relocation.
- The pods are managed by a controller, such as a Deployment or StatefulSet.
- The PodDisruptionBudget will remain within its limit after the node is deleted (see the example manifests after this list).
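As an illustration of the last two conditions, here is a hypothetical PodDisruptionBudget that still leaves room for eviction, plus a Deployment whose pods are explicitly marked as evictable via the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation. All names, labels, and the image are placeholders.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # Eviction stays allowed while at least one replica is up.
  selector:
    matchLabels:
      app: my-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # Tells Cluster Autoscaler it may evict this pod during scale-down.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: app
          image: registry.example.com/my-app:1.0
```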
You can manually find the node in question and check its pods, including those in the kube-system namespace, and delete them manually if required.
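For example, you can list the pods scheduled on a specific node and, if safe, delete the one that blocks draining; `<node_name>`, `<pod_name>`, and `<namespace>` are placeholders:

```bash
# List all pods running on the node, including kube-system ones.
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node_name> -o wide

# If a pod is blocking the scale-down and it is safe to remove, delete it;
# its controller (Deployment, StatefulSet, etc.) will recreate it on another node.
kubectl delete pod <pod_name> -n <namespace>
```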
You can also set up the descheduler to automatically evict pods from underutilized nodes so that Cluster Autoscaler can remove those nodes.
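For reference, a minimal descheduler policy that evicts pods from underutilized nodes could look like the sketch below. The exact schema depends on the descheduler version you deploy (this uses the `descheduler/v1alpha1` API), and the threshold values are only illustrative.

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        # Nodes below these thresholds are considered underutilized...
        thresholds:
          cpu: 20
          memory: 20
          pods: 20
        # ...and their pods may be moved to nodes that stay below these targets.
        targetThresholds:
          cpu: 50
          memory: 50
          pods: 50
```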
We recommend that you enable master logging in your log group:
yc k8s cluster update <cluster_id> --master-logging enabled=true,log-group-id=<log_group_id>,cluster-autoscaler-enabled=true,kube-apiserver-enabled=true
The logs will help you identify the cause of the failed downscale.
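Once master logging is enabled, you can read the log group with the CLI and look for cluster-autoscaler messages that explain why a node was kept. `<log_group_id>` is a placeholder; check `yc logging read --help` for the filtering flags available in your CLI version.

```bash
# Read recent entries from the log group that receives master logs.
yc logging read --group-id=<log_group_id>
```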
If the issue persists
If the above actions did not help, create a support ticket and provide the following information:
- Managed Service for Kubernetes cluster ID.
- Approximate date and time of Cluster Autoscaler errors.
- YAML specification of the pod controller, such as Deployment or StatefulSet.
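To export the controller specification for the ticket, you can dump it with kubectl; the names and namespace below are placeholders.

```bash
# Save the controller spec to a file and attach it to the support ticket.
kubectl get deployment <deployment_name> -n <namespace> -o yaml > deployment.yaml
# For a StatefulSet:
kubectl get statefulset <statefulset_name> -n <namespace> -o yaml > statefulset.yaml
```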