Questions and answers about Managed Service for Kubernetes
General questions
What services are available in Managed Service for Kubernetes clusters by default?
The following services are available by default:
- Metrics Server for data aggregation on resource usage in a Kubernetes cluster.
- Kubernetes plugin for CoreDNS for name resolution in a cluster.
- DaemonSet supporting CSI plugins to work with persistent volumes (PersistentVolume).
Which version of the Kubernetes CLI (kubectl) must be installed for comprehensive work with a cluster?
We recommend using the latest official version of kubectl.
Can Yandex Cloud restore the health of the cluster if I configure it incorrectly?
The master is managed by Yandex Cloud, so you cannot damage it. If you have issues with Kubernetes cluster components, contact technical support.
Who will be monitoring the health of the cluster?
Yandex Cloud. A cluster is monitored for corrupted file system, kernel deadlock, internet connection loss and Kubernetes component issues. We're also developing a self-healing mechanism for faulty components.
How quickly does Yandex Cloud address vulnerabilities discovered in the security system? What do I do if an attacker has taken advantage of a vulnerability and my data is damaged?
Yandex Cloud services, images and master configuration initially undergo various security tests and checks for standard compliance.
Users can choose the update frequency depending on their tasks and cluster configuration. It is important to consider attack targets and vulnerabilities in applications deployed in a Kubernetes cluster. Application security can be affected by such factors as network security policies between applications, vulnerabilities inside Docker containers, and an incorrect launch mode of containers in a cluster.
Can I connect to a cluster node via OS Login, similar to a Yandex Cloud VM?
Yes, you can. To do this, follow the guide.
Data storage
What are the features of disk storage when a database (for example, MySQL® or PostgreSQL) is located in a Kubernetes cluster?
For a database located in a Kubernetes cluster, use StatefulSet.
How do I connect to managed Yandex Cloud databases?
To connect to a Yandex Cloud managed database located in the same network, specify its hostname and FQDN.
To connect a database certificate to a pod, use secret or configmap objects.
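As a hedged illustration of the secret-based option, the sketch below wraps `kubectl create secret generic` in a small helper. The function name, the `root.crt` key, and the `default` namespace fallback are assumptions made for this example, not names the service requires.

```shell
# Hypothetical helper: store a database CA certificate in a generic secret
# so that pods can mount it as a file.
# Usage: create_db_cert_secret <secret_name> <path_to_certificate> [namespace]
create_db_cert_secret() {
  # The key name root.crt is an assumption; use the file name your client expects.
  kubectl create secret generic "${1:?secret name is required}" \
    --from-file=root.crt="${2:?certificate path is required}" \
    --namespace="${3:-default}"
}
```

A pod can then mount the secret as a volume and point the database client at the mounted file.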
What's the right way to add a persistent volume to a container?
You can select the connection mode for Compute Cloud disks depending on your needs:
- If you want Kubernetes to automatically provision a PersistentVolume object and configure a new disk, create a pod with a dynamically provisioned volume.
- To use existing Compute Cloud volumes, create a pod with a statically provisioned volume.
For more information, see Working with persistent volumes.
What types of volumes does Managed Service for Kubernetes support?
Managed Service for Kubernetes supports temporary volumes and persistent volumes. For more information, see Volumes.
Automatic scaling
Why are there N nodes in my cluster now, but the cluster is not scaling down?
Autoscaling does not stop nodes with pods that cannot be evicted. The scaling barriers include:
- Pods whose eviction is limited with PodDisruptionBudget.
- Pods in the kube-system namespace:
  - That were not created under the DaemonSet controller.
  - That do not have PodDisruptionBudget or whose eviction is limited by PodDisruptionBudget.
- Pods that were not created under a replication controller (ReplicaSet, Deployment, or StatefulSet).
- Pods with local storage.
- Pods that cannot be evicted anywhere due to limitations. For example, due to lack of resources or lack of nodes matching the affinity or anti-affinity selectors.
- Pods with an annotation that disables eviction: "cluster-autoscaler.kubernetes.io/safe-to-evict": "false".
Note
kube-system pods, pods with local-storage, and pods without a replication controller can be evicted. To do this, set the "safe-to-evict": "true" annotation:
kubectl annotate pod <pod_name> cluster-autoscaler.kubernetes.io/safe-to-evict=true
Other possible causes include:
- The node group has already reached its minimum size.
- The node is idle for less than 10 minutes.
- During the last 10 minutes, the node group has been scaled up.
- During the last 3 minutes, there was an unsuccessful attempt to scale down the node group.
- There was an unsuccessful attempt to stop a certain node. In this case, the next attempt occurs in 5 minutes.
- The node has an annotation that prohibits stopping it on scale-down: "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true". You can add or remove an annotation using kubectl.
  Check for annotation on the node:
  kubectl describe node <node_name> | grep scale-down-disabled
  Result:
  Annotations: cluster-autoscaler.kubernetes.io/scale-down-disabled: true
  Set the annotation:
  kubectl annotate node <node_name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true
  Remove the annotation by running the kubectl command with -:
  kubectl annotate node <node_name> cluster-autoscaler.kubernetes.io/scale-down-disabled-
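To check every node at once rather than one by one, a sketch along the following lines may help. The function name is hypothetical, and the example assumes your kubectl version accepts backslash-escaped dots in JSONPath keys.

```shell
# Hypothetical helper: print the names of nodes annotated with
# cluster-autoscaler.kubernetes.io/scale-down-disabled=true.
list_scale_down_disabled_nodes() {
  # Each output line is "<node_name><TAB><annotation_value>"; the value is
  # empty for nodes without the annotation. Dots in the key are escaped.
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.cluster-autoscaler\.kubernetes\.io/scale-down-disabled}{"\n"}{end}' \
    | awk -F '\t' '$2 == "true" { print $1 }'
}
```

Running `list_scale_down_disabled_nodes` prints one node name per line and nothing if no node carries the annotation.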
Why does the node group fail to scale down after the pod deletion?
If the node is underloaded, it is removed in 10 minutes.
Why isn't autoscaling performed even when the number of nodes gets less than the minimum or greater than the maximum?
Autoscaling will not violate the preset limits, but Managed Service for Kubernetes does not explicitly control the limits. Upscaling will only trigger if there are pods in the unschedulable status.
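One way to see whether any pods are waiting to be scheduled is to list pods in the Pending phase. This is only a sketch: the function name is hypothetical, and Pending is broader than unschedulable, so each pod's events still need to be checked.

```shell
# Hypothetical helper: list pods in the Pending phase across all namespaces.
# Pending is a superset of unschedulable pods, so inspect the events of each
# pod (kubectl describe pod) to confirm that scheduling is what failed.
list_pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}
```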
Why do Terminated pods remain in my cluster?
This happens because the Pod garbage collector (PodGC)
To get answers to other questions about autoscaling, see the Kubernetes documentation
Configuring and updating
What do I do if some data gets lost after I update the Kubernetes version?
Your data will not get lost: prior to updating the Kubernetes version, Managed Service for Kubernetes creates a data backup. You can manually configure cluster backup in Yandex Object Storage. We also recommend backing up your database using the application tools.
Can I configure a backup for a Kubernetes cluster?
Data in Managed Service for Kubernetes clusters is securely stored and replicated within the Yandex Cloud infrastructure. However, you can back up data from Managed Service for Kubernetes cluster node groups at any time and store them in Object Storage or other types of storage.
For more information, see Managed Service for Kubernetes cluster backups in Object Storage.
Will resources be idle while the Kubernetes version is updating?
When a master is being updated, Control Plane resources will be idle. For this reason, operations such as creating or deleting a Managed Service for Kubernetes node group will be unavailable. User load on the application will continue to be processed.
If the max_expansion value is greater than zero, new nodes are created when Managed Service for Kubernetes node groups are updated. All the load is transferred to new nodes, and the old node groups are deleted. In this case, idle time will be equal to the pod restart time when it is transferred to a new Managed Service for Kubernetes node group.
Can I update a Managed Service for Kubernetes cluster in one step?
It depends on the source and target versions of your Managed Service for Kubernetes cluster. In a single step, you can only update your cluster to the next minor version after the current one. Updating to newer versions is done in steps, e.g., 1.19 → 1.20 → 1.21. For more information, see Updating a cluster.
If you want to skip interim versions, create a Managed Service for Kubernetes cluster of the appropriate version and transfer the load from the old cluster to the new one.
Is the Container Network Interface plugin updated along with a Managed Service for Kubernetes cluster?
Yes, it is. If you are using Calico and Cilium controllers, they are updated along with your Managed Service for Kubernetes cluster. To update your Managed Service for Kubernetes cluster, do one of the following:
- Create a Managed Service for Kubernetes cluster of the appropriate version and transfer the load from the old cluster to the new one.
- Update your Managed Service for Kubernetes cluster manually.
To make sure the current Managed Service for Kubernetes cluster version is updated on time, set up automatic updates.
Can I send you a YAML configuration file so that you apply it to my cluster?
No. You can use a kubeconfig file to apply a YAML cluster configuration file on your own.
Can you install Web UI Dashboard, Rook, and other tools?
No. You can install all the necessary tools on your own.
What do I do if I cannot attach volumes after updating Kubernetes?
If the following error occurs after you update Kubernetes:
AttachVolume.Attach failed for volume "pvc":
Attach timeout for volume yadp-k8s-volumes/pvc
Update the s3-CSI driver
Resources
What resources are needed to maintain a Kubernetes cluster with a group of, say, three nodes?
Each node needs resources to run the components in charge of running the node as part of the Kubernetes cluster. For more information, see Dynamic resource allocation.
Can I change resources for each node in a Kubernetes cluster?
You can change resources only for a node group. You can create groups with different configurations in a Kubernetes cluster and place them in different availability zones. For more information, see Updating a Managed Service for Kubernetes node group.
Who monitors the scaling of a Kubernetes cluster?
In Managed Service for Kubernetes, you can enable automatic cluster scaling.
Logs
How can I monitor the Managed Service for Kubernetes cluster state?
Get the cluster statistics. You can view the description of the available cluster metrics in the reference.
Can I get logs of my operations with services?
Yes, you can request log records about your resources from Yandex Cloud services. For more information, see Data requests.
Can I save logs myself?
For log collection and storage, use Fluent Bit.
Can I use Yandex Cloud Logging for viewing logs?
Yes, you can. To do this, set up sending logs to Cloud Logging when creating or updating a Managed Service for Kubernetes cluster. The setting is only available in the CLI, Terraform, and API.
Is Horizontal Pod Autoscaler supported?
Yes, Managed Service for Kubernetes supports horizontal pod autoscaling.
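As a minimal sketch, a Horizontal Pod Autoscaler can be created with `kubectl autoscale`. The helper name and the default values below are assumptions for illustration, and the example assumes your kubectl version accepts the `--cpu-percent` flag.

```shell
# Hypothetical helper: create a HorizontalPodAutoscaler for a deployment.
# Usage: autoscale_deployment <deployment> [min_replicas] [max_replicas] [cpu_percent]
autoscale_deployment() {
  # The fallback values (1, 3, 80) are illustrative, not recommendations.
  kubectl autoscale deployment "${1:?deployment name is required}" \
    --min="${2:-1}" --max="${3:-3}" --cpu-percent="${4:-80}"
}
```

For example, `autoscale_deployment web 2 6 70` would keep between 2 and 6 replicas of a deployment named `web`, targeting 70% average CPU utilization.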
Troubleshooting
This section describes typical problems you may encounter while using Managed Service for Kubernetes and gives troubleshooting recommendations.
Error creating a cluster in a different folder's cloud network
Error message:
Permission denied
The error occurs when the resource service account has no required roles in the folder whose cloud network is selected when creating a cluster.
To create a Managed Service for Kubernetes cluster in a cloud network of another folder, assign the resource service account the following roles in this folder:
To use a public IP address, also assign the vpc.publicAdmin role.
A namespace has been deleted but its status is Terminating and its deletion is not completed
This happens when a namespace has stuck resources that cannot be deleted by the namespace controller.
To fix the issue, delete the stuck resources manually.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
- Get a list of resources that remain within the namespace:
  kubectl api-resources --verbs=list --namespaced --output=name \
    | xargs --max-args=1 kubectl get --show-kind \
    --ignore-not-found --namespace=<namespace>
- Delete the resources found:
  kubectl delete <resource_type> <resource_name> --namespace=<namespace>
That being done, if the namespace is still in the Terminating status and cannot be deleted, delete it forcibly using finalizer:
- Enable Kubernetes API proxy to your local computer:
  kubectl proxy
- Delete the namespace:
  kubectl get namespace <namespace> --output=json \
    | jq '.spec = {"finalizers":[]}' > temp.json && \
    curl --insecure --header "Content-Type: application/json" \
    --request PUT --data-binary @temp.json \
    127.0.0.1:8001/api/v1/namespaces/<namespace>/finalize
We do not recommend deleting the namespace with the Terminating status using finalizer right away, as this may cause the stuck resources to remain in your Managed Service for Kubernetes cluster.
I am using Yandex Network Load Balancer alongside an Ingress controller. Why are some of my cluster's nodes UNHEALTHY?
This is normal behavior for a load balancer with External Traffic Policy: Local enabled. Only the Managed Service for Kubernetes nodes whose pods are ready to accept user traffic get the HEALTHY status. The rest of the nodes are labeled as UNHEALTHY.
To find out the policy type of a load balancer created using a LoadBalancer type service, run this command:
kubectl describe svc <LoadBalancer_type_service_name> \
| grep 'External Traffic Policy'
For more information, see Parameters of a LoadBalancer service.
Why is a created PersistentVolumeClaim still pending?
This is normal for a PersistentVolumeClaim (PVC). The created PVC remains in the Pending status until you create a pod that must use it.
To change the PVC status to Bound:
- View details of the PVC:
  kubectl describe pvc <PVC_name> \
    --namespace=<namespace_PVC_resides_in>
  A message saying waiting for first consumer to be created before binding means that the PVC is waiting for a pod to be created.
- Create a pod for this PVC.
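The last step can be sketched as a helper that applies a minimal pod manifest referencing the claim. Everything except `claimName` is an assumption for illustration: the function name, the pod name, the `busybox` placeholder image, and the `/data` mount path.

```shell
# Hypothetical helper: create a minimal pod that mounts an existing PVC,
# giving a claim that waits for its first consumer something to bind for.
# Usage: create_pvc_consumer_pod <pvc_name> [namespace]
create_pvc_consumer_pod() {
  pvc_name="${1:?PVC name is required}"
  # The pod must be created in the same namespace as the PVC.
  kubectl apply --namespace="${2:-default}" -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${pvc_name}-consumer   # assumed pod name
spec:
  containers:
    - name: app
      image: busybox           # placeholder image, replace with your own
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data     # assumed mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${pvc_name}
EOF
}
```

After the pod is scheduled, running `kubectl describe pvc` again should show whether the claim has been bound.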
Why does my Managed Service for Kubernetes cluster fail to run after I change its node configuration?
Make sure the new configuration of Managed Service for Kubernetes nodes is within the quota:
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To run diagnostics for your Managed Service for Kubernetes cluster nodes:
-
Check the health of Managed Service for Kubernetes nodes:
yc managed-kubernetes cluster list-nodes <cluster_ID>
A message saying that the allowed amount of Managed Service for Kubernetes cluster resources has been exceeded is displayed in the first column of the command output. Example:
+--------------------------------+-----------------+------------------+-------------+--------------+
| CLOUD INSTANCE                 | KUBERNETES NODE | RESOURCES        | DISK        | STATUS       |
+--------------------------------+-----------------+------------------+-------------+--------------+
| fhmil14sdienhr5uh89no          |                 | 2 100% core(s),  | 64.0 GB hdd | PROVISIONING |
| CREATING_INSTANCE              |                 | 4.0 GB of memory |             |              |
| [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
| on total size of network-hdd   |                 |                  |             |              |
| disks has exceeded.,           |                 |                  |             |              |
| [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
| on total size of network-hdd   |                 |                  |             |              |
| disks has exceeded.            |                 |                  |             |              |
+--------------------------------+-----------------+------------------+-------------+--------------+
To run your Managed Service for Kubernetes cluster, increase the quotas.
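For clusters with many nodes, the quota messages can be filtered out of the node list. The sketch below assumes that each message wraps over three table rows, as in the example output, and that your `grep` supports the `-A` option; the function name is hypothetical.

```shell
# Hypothetical helper: print only the quota-related rows of the node list.
# Usage: show_quota_errors <cluster_ID>
show_quota_errors() {
  # -F matches the marker literally; -A 2 also prints the two rows that
  # follow each match, where the rest of the wrapped message is expected.
  yc managed-kubernetes cluster list-nodes "${1:?cluster ID is required}" \
    | grep -F -A 2 'RESOURCE_EXHAUSTED'
}
```

The function exits with a non-zero status when no such rows are found.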
An error occurs when renewing an Ingress controller certificate
Error message:
ERROR controller-runtime.manager.controller.ingressgroup Reconciler error
{"name": "some-prod", "namespace": , "error": "rpc error: code = InvalidArgument
desc = Validation error:\nlistener_specs[1].tls.sni_handlers[2].handler.certificate_ids:
Number of elements must be less than or equal to 1"}
The error occurs if different certificates are specified for the same Ingress controller listener.
Solution: Edit and apply the Ingress controller specifications making sure that only one certificate is specified in each listener's description.
Why is DNS name resolution not working in my cluster?
There may be no name resolution for internal and external DNS queries in a Managed Service for Kubernetes cluster for several reasons. To fix the issue:
- Check the version of your Managed Service for Kubernetes cluster and node groups.
- Make sure that CoreDNS is up and running.
- Make sure the Managed Service for Kubernetes cluster has enough CPU resources available.
- Set up autoscaling.
- Set up local DNS caching.
Check the version of your cluster and node groups
-
Get a list of current Kubernetes versions:
yc managed-kubernetes list-versions
-
Find out the Managed Service for Kubernetes cluster version:
yc managed-kubernetes cluster get <cluster_name_or_ID> | grep version:
You can get the Managed Service for Kubernetes cluster ID and name with a list of clusters in the folder.
-
Find out the Managed Service for Kubernetes node group version:
yc managed-kubernetes node-group get <node_group_name_or_ID> | grep version:
You can get the ID and name of the Managed Service for Kubernetes node group with a list of node groups in your cluster.
-
If the versions of your Managed Service for Kubernetes cluster and node groups are not on the list of current Kubernetes versions, upgrade them.
Make sure that CoreDNS is up and running
Get a list of CoreDNS pods and their statuses:
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
Make sure all the pods have the Running status.
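To surface only the problem pods, the same command can be filtered by its STATUS column. This is a sketch with a hypothetical function name; it assumes the default column order of `kubectl get pods` (NAME, READY, STATUS).

```shell
# Hypothetical helper: print CoreDNS pods whose STATUS column is not Running.
# Prints nothing when every pod is Running.
list_unhealthy_coredns_pods() {
  # With --no-headers, the third column of each row is the pod status.
  kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers \
    | awk '$3 != "Running" { print $1, $3 }'
}
```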
Make sure the cluster has enough CPU resources available
- Go to the folder page and select Managed Service for Kubernetes.
- Click the name of the Managed Service for Kubernetes cluster you need and select the Nodes manager tab.
- Go to the Nodes tab and click the name of any Managed Service for Kubernetes node.
- Go to the Monitoring tab.
- Make sure that, in the CPU, [cores] chart, the used CPU values have not reached the total available CPU values. Check this for each Managed Service for Kubernetes cluster node.
Set up autoscaling
Set up automatic DNS scaling by Managed Service for Kubernetes cluster size.
Set up local DNS caching
Set up NodeLocal DNS Cache. To make sure that the settings are optimal, install NodeLocal DNS Cache from Yandex Cloud Marketplace.
When creating a node group via the CLI, a parameter conflict occurs. How do I fix that?
Check whether the --location, --network-interface, and --public-ip parameters are specified in the same command. If you provide these parameters together, the following errors occur:
- For the --location and --public-ip or --location and --network-interface pairs:
  ERROR: rpc error: code = InvalidArgument desc = Validation error: allocation_policy.locations[0].subnet_id: can't use "allocation_policy.locations[0].subnet_id" together with "node_template.network_interface_specs"
- For the --network-interface and --public-ip pair:
  ERROR: flag --public-ip cannot be used together with --network-interface. Use '--network-interface' option 'nat' to get public address
Make sure you only provide one of the three parameters in a command. It is enough to specify the location of a Managed Service for Kubernetes node group either in --location or --network-interface.
To assign public IP addresses to Managed Service for Kubernetes nodes, do one of the following:
- Specify --network-interface ipv4-address=nat or --network-interface ipv6-address=nat.
- Enable access to Managed Service for Kubernetes nodes from the internet after creating a node group.
Error connecting to a cluster using kubectl
Error message:
ERROR: cluster has empty endpoint
The error occurs if you try to connect to a cluster with no public IP address and get kubectl credentials for a public IP address using this command:
yc managed-kubernetes cluster \
get-credentials <cluster_name_or_ID> \
--external
To connect to the cluster's private IP address from a VM located in the same network, get kubectl credentials using this command:
yc managed-kubernetes cluster \
get-credentials <cluster_name_or_ID> \
--internal
If you need to connect to a cluster from the internet, recreate the cluster and assign it a public IP address.
Errors occur when connecting to a node over SSH
Error messages:
Permission denied (publickey,password)
Too many authentication failures
Errors occur when connecting to a Managed Service for Kubernetes node in the following cases:
-
No public SSH key is added to the Managed Service for Kubernetes node group metadata.
Solution: Update the Managed Service for Kubernetes node group keys.
-
An invalid public SSH key is added to the Managed Service for Kubernetes node group metadata.
Solution: Change the format of the public key file to the appropriate one and update the Managed Service for Kubernetes node group keys.
-
No private SSH key is added to an authentication agent (ssh-agent).
Solution: Add a private key by running the following command:
ssh-add <path_to_private_key_file>
How do I grant internet access to Managed Service for Kubernetes cluster nodes?
If Managed Service for Kubernetes cluster nodes have no access to the internet, the following error occurs when trying to connect to the internet:
Failed to pull image "cr.yandex/***": rpc error: code = Unknown desc = Error response from daemon: Get https://cr.yandex/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
There are several ways to grant internet access to Managed Service for Kubernetes cluster nodes:
- Create and configure a NAT gateway or NAT instance. As a result, through static routing, traffic will be routed via the gateway or a separate VM instance with NAT features.
- Assign a public IP address to a Managed Service for Kubernetes node group.
Note
If you assigned public IP addresses to the cluster nodes and then configured the NAT gateway or NAT instance, internet access via the public IP addresses will be disabled. For more information, see the Yandex Virtual Private Cloud documentation.
Why can't I choose Docker as the container runtime environment?
There is no support for Docker as a container runtime environment in clusters with Kubernetes version 1.24 or higher. Only containerd is available.