
Troubleshooting in Managed Service for Kubernetes

Written by
Yandex Cloud
Updated on May 13, 2025

This section describes typical problems you may encounter while using Managed Service for Kubernetes and gives troubleshooting recommendations.

Error creating a cluster in a different folder's cloud network

Error message:

Permission denied

The error occurs when the resource service account has no required roles in the folder whose cloud network is selected when creating a cluster.

To create a Managed Service for Kubernetes cluster in a cloud network of another folder, assign the resource service account the following roles in this folder:

  • vpc.privateAdmin
  • vpc.user

To use a public IP address, also assign the vpc.publicAdmin role.
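As a sketch, the roles can be assigned with the CLI; <folder_ID> and <service_account_ID> are placeholders for your own values:

yc resource-manager folder add-access-binding <folder_ID> \
  --role vpc.privateAdmin \
  --subject serviceAccount:<service_account_ID>

yc resource-manager folder add-access-binding <folder_ID> \
  --role vpc.user \
  --subject serviceAccount:<service_account_ID>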

A namespace has been deleted, but it is stuck in the Terminating status and its deletion does not complete

This happens when a namespace has stuck resources that cannot be deleted by the namespace controller.

To fix the issue, delete the stuck resources manually.

CLI

If you do not have the Yandex Cloud command line interface (CLI) yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

  1. Connect to the Managed Service for Kubernetes cluster.

  2. Get a list of resources that remain within the namespace:

    kubectl api-resources --verbs=list --namespaced --output=name \
      | xargs --max-args=1 kubectl get --show-kind \
      --ignore-not-found --namespace=<namespace>
    
  3. Delete the resources found:

    kubectl delete <resource_type> <resource_name> --namespace=<namespace>
    

If after that the namespace remains in the Terminating status and cannot be deleted, force its deletion by clearing the finalizers:

  1. Enable Kubernetes API proxy to your local computer:

    kubectl proxy
    
  2. Delete the namespace:

    kubectl get namespace <namespace> --output=json \
      | jq '.spec = {"finalizers":[]}' > temp.json && \
    curl --insecure --header "Content-Type: application/json" \
      --request PUT --data-binary @temp.json \
      127.0.0.1:8001/api/v1/namespaces/<namespace>/finalize
    

We do not recommend clearing the finalizers of a Terminating namespace right away, as the stuck resources may then remain in your Managed Service for Kubernetes cluster.

I am using Yandex Network Load Balancer alongside an Ingress controller. Why are some of my cluster's nodes UNHEALTHY?

This is normal behavior for a load balancer with External Traffic Policy: Local enabled. Only the Managed Service for Kubernetes nodes whose pods are ready to accept user traffic get the HEALTHY status. The rest of the nodes are labeled as UNHEALTHY.

To find out the policy type of a load balancer created using a LoadBalancer type service, run this command:

kubectl describe svc <LoadBalancer_type_service_name> \
| grep 'External Traffic Policy'

For more information, see Parameters of a LoadBalancer service.
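For reference, the policy is set via the externalTrafficPolicy field of the service specification. A minimal sketch of a LoadBalancer service with the Local policy (the service name, selector, and ports are illustrative):

kubectl apply --filename=- <<EOF
apiVersion: v1
kind: Service
metadata:
  name: lb-local                   # illustrative name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # health checks pass only on nodes with ready pods
  selector:
    app: my-app                    # illustrative pod label
  ports:
    - port: 80
      targetPort: 8080
EOF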

Why is a created PersistentVolumeClaim still pending?

This is normal for a PersistentVolumeClaim (PVC). The created PVC remains in the Pending status until you create a pod that must use it.

To change the PVC status to Bound:

  1. View details of the PVC:

    kubectl describe pvc <PVC_name> \
      --namespace=<namespace_PVC_resides_in>
    

    A message saying waiting for first consumer to be created before binding means that the PVC is waiting for a pod to be created.

  2. Create a pod for this PVC, as in the sketch below.
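A minimal sketch of a pod that mounts the PVC so it can bind; the pod name and image are illustrative:

kubectl apply --filename=- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer                       # illustrative name
  namespace: <namespace_PVC_resides_in>
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9     # any image works; pause is minimal
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: <PVC_name>
EOF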

Why does my Managed Service for Kubernetes cluster fail to run after I change its node configuration?

Make sure the new configuration of Managed Service for Kubernetes nodes is within the quota:

CLI

If you do not have the Yandex Cloud command line interface (CLI) yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To run diagnostics for your Managed Service for Kubernetes cluster nodes:

  1. Connect to the Managed Service for Kubernetes cluster.

  2. Check the health of Managed Service for Kubernetes nodes:

    yc managed-kubernetes cluster list-nodes <cluster_ID>
    

    A message saying that the allowed amount of Managed Service for Kubernetes cluster resources has been exceeded is displayed in the first column of the command output. For example:

    +--------------------------------+-----------------+------------------+-------------+--------------+
    |         CLOUD INSTANCE         | KUBERNETES NODE |     RESOURCES    |     DISK    |    STATUS    |
    +--------------------------------+-----------------+------------------+-------------+--------------+
    | fhmil14sdienhr5uh89no          |                 | 2 100% core(s),  | 64.0 GB hdd | PROVISIONING |
    | CREATING_INSTANCE              |                 | 4.0 GB of memory |             |              |
    | [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
    | on total size of network-hdd   |                 |                  |             |              |
    | disks has exceeded.,           |                 |                  |             |              |
    | [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
    | on total size of network-hdd   |                 |                  |             |              |
    | disks has exceeded.            |                 |                  |             |              |
    +--------------------------------+-----------------+------------------+-------------+--------------+
    

To run your Managed Service for Kubernetes cluster, increase the quotas.

An error occurs when renewing an Ingress controller certificate

Error message:

ERROR controller-runtime.manager.controller.ingressgroup Reconciler error
{"name": "some-prod", "namespace": , "error": "rpc error: code = InvalidArgument
desc = Validation error:\nlistener_specs[1].tls.sni_handlers[2].handler.certificate_ids:
Number of elements must be less than or equal to 1"}

The error occurs if different certificates are specified for the same Ingress controller listener.

Solution: Edit and apply the Ingress controller specifications, making sure that each listener's description specifies only one certificate, as in the sketch below.
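For illustration, each host under spec.tls should reference exactly one certificate. The secretName pattern below follows the Yandex Application Load Balancer Ingress controller convention for Certificate Manager certificates; treat the names and the host as assumptions and adapt them to your setup:

kubectl apply --filename=- <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: some-prod                    # name from the error message, for illustration
spec:
  tls:
    - hosts:
        - example.com                # illustrative host
      secretName: yc-certmgr-cert-id-<certificate_ID>   # exactly one certificate per listener
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service     # illustrative backend
                port:
                  number: 80
EOF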

Why is DNS name resolution not working in my cluster?

Name resolution for internal and external DNS queries may fail in a Managed Service for Kubernetes cluster for several reasons. To fix the issue:

  1. Check the version of your Managed Service for Kubernetes cluster and node groups.
  2. Make sure that CoreDNS is up and running.
  3. Make sure the Managed Service for Kubernetes cluster has enough CPU resources available.
  4. Set up autoscaling.
  5. Set up local DNS caching.
Check the version of your cluster and node groups
  1. Get a list of current Kubernetes versions:

    yc managed-kubernetes list-versions
    
  2. Find out the Managed Service for Kubernetes cluster version:

    yc managed-kubernetes cluster get <cluster_name_or_ID> | grep version:
    

    You can get the Managed Service for Kubernetes cluster ID and name with a list of clusters in the folder.

  3. Find out the Managed Service for Kubernetes node group version:

    yc managed-kubernetes node-group get <node_group_name_or_ID> | grep version:
    

    You can get the ID and name of the Managed Service for Kubernetes node group with a list of node groups in your cluster.

  4. If the versions of your Managed Service for Kubernetes cluster and node groups are not on the list of current Kubernetes versions, upgrade them, e.g., as sketched below.
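A sketch of upgrading from the CLI; the target version is illustrative, and it is worth checking the exact flags with yc managed-kubernetes cluster update --help first:

# Upgrade the master to a newer Kubernetes version
yc managed-kubernetes cluster update <cluster_name_or_ID> --version 1.29

# Then upgrade each node group to the same version
yc managed-kubernetes node-group update <node_group_name_or_ID> --version 1.29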

Make sure that CoreDNS is up and running

Get a list of CoreDNS pods and their statuses:

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

Make sure all the pods have the Running status.
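If any pod is not Running, its logs and recent warning events usually point to the cause:

# Inspect logs across all CoreDNS pods
kubectl logs --namespace=kube-system --selector=k8s-app=kube-dns --tail=50

# Check recent warning events for scheduling or image pull problems
kubectl get events --namespace=kube-system --field-selector=type=Warning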

Make sure the cluster has enough CPU resources available
  1. Navigate to the folder dashboard and select Managed Service for Kubernetes.
  2. Click the name of the Managed Service for Kubernetes cluster you need and select the Node manager tab.
  3. Go to the Nodes tab and click the name of any Managed Service for Kubernetes node.
  4. Go to the Monitoring tab.
  5. Make sure that on the CPU, [cores] chart the used CPU values have not reached the total available CPU values. Check this for each Managed Service for Kubernetes cluster node. You can also check usage from the command line, as shown below.
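A quick command-line alternative, assuming Metrics Server is available in the cluster:

# Current CPU and memory usage per node
kubectl top nodes

# Usage per pod in kube-system, sorted by CPU
kubectl top pods --namespace=kube-system --sort-by=cpu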
Set up autoscaling

Set up automatic DNS scaling by Managed Service for Kubernetes cluster size.
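As a quick check that autoscaling is in place, you can verify the autoscaler deployment and its parameters. This sketch assumes the upstream cluster-proportional-autoscaler naming (kube-dns-autoscaler in kube-system); the actual names depend on how you deployed it:

# Check that the autoscaler Deployment exists and is available
kubectl get deployment kube-dns-autoscaler --namespace=kube-system

# Inspect its scaling parameters (e.g., coresPerReplica, nodesPerReplica)
kubectl get configmap kube-dns-autoscaler --namespace=kube-system --output=yaml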

Set up local DNS caching

Set up NodeLocal DNS Cache. To make sure that the settings are optimal, install NodeLocal DNS Cache from Yandex Cloud Marketplace.

When creating a node group via the CLI, a parameter conflict occurs. How do I fix that?

Check whether the --location, --network-interface, and --public-ip parameters are specified in the same command. If you provide these parameters together, the following errors occur:

  • For the --location and --public-ip or --location and --network-interface pairs:

    ERROR: rpc error: code = InvalidArgument desc = Validation error:
    allocation_policy.locations[0].subnet_id: can't use "allocation_policy.locations[0].subnet_id" together with "node_template.network_interface_specs"
    
  • For the --network-interface and --public-ip pair:

    ERROR: flag --public-ip cannot be used together with --network-interface. Use '--network-interface' option 'nat' to get public address
    

Make sure you only provide one of the three parameters in a command. It is enough to specify the location of a Managed Service for Kubernetes node group either in --location or --network-interface.
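For example, a minimal sketch of creating a node group where both the location and a public IP are set through --network-interface alone (all names are placeholders):

yc managed-kubernetes node-group create \
  --cluster-name <cluster_name> \
  --name <node_group_name> \
  --fixed-size 1 \
  --network-interface subnets=<subnet_name>,ipv4-address=nat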

To grant internet access to Managed Service for Kubernetes cluster nodes, do one of the following:

  • Assign a public IP address to the cluster nodes, specifying --network-interface ipv4-address=nat or --network-interface ipv6-address=nat.
  • Enable access to Managed Service for Kubernetes nodes from the internet after creating a node group.

Error connecting to a cluster using kubectl

Error message:

ERROR: cluster has empty endpoint

The error occurs if you try to connect to a cluster with no public IP address and get kubectl credentials for a public IP address using this command:

yc managed-kubernetes cluster \
   get-credentials <cluster_name_or_ID> \
   --external

To connect to the cluster's private IP address from a VM located in the same network, get kubectl credentials using this command:

yc managed-kubernetes cluster \
   get-credentials <cluster_name_or_ID> \
   --internal

If you need to connect to a cluster from the internet, recreate the cluster and assign it a public IP address.

Errors occur when connecting to a node over SSH

Error messages:

Permission denied (publickey,password)
Too many authentication failures

Errors occur when connecting to a Managed Service for Kubernetes node in the following cases:

  • No public SSH key is added to the Managed Service for Kubernetes node group metadata.

    Solution: Update the Managed Service for Kubernetes node group keys (see the sketch after this list).

  • An invalid public SSH key is added to the Managed Service for Kubernetes node group metadata.

    Solution: Change the format of the public key file to the appropriate one and update the Managed Service for Kubernetes node group keys.

  • No private SSH key is added to an authentication agent (ssh-agent).

    Solution: Add a private key by running the following command: ssh-add <path_to_private_key_file>.
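For reference, public keys are passed through the node group metadata as lines in the <username>:<public_key> format. A sketch, assuming a keys file in that format; check yc managed-kubernetes node-group add-metadata --help for the exact flags:

# ssh-keys file format: <username>:<public_key_contents>
# e.g.: ubuntu:ssh-ed25519 AAAA... user@host
yc managed-kubernetes node-group add-metadata \
  --name <node_group_name> \
  --metadata-from-file ssh-keys=<path_to_keys_file>

# Then connect with the matching private key
ssh -i <path_to_private_key_file> <username>@<node_public_IP>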

How do I grant internet access to Managed Service for Kubernetes cluster nodes?

If Managed Service for Kubernetes cluster nodes have no access to the internet, the following error occurs when trying to connect to the internet:

Failed to pull image "cr.yandex/***": rpc error: code = Unknown desc = Error response from daemon: Gethttps://cr.yandex/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

There are several ways to grant internet access to Managed Service for Kubernetes cluster nodes:

  • Create and configure a NAT gateway or NAT instance. Traffic will then be routed through the gateway or a dedicated NAT-enabled VM via static routes.
  • Assign a public IP address to a Managed Service for Kubernetes node group.

Note

If you assigned public IP addresses to the cluster nodes and then configured the NAT gateway or NAT instance, internet access via the public IP addresses will be disabled. For more information, see the Yandex Virtual Private Cloud documentation.
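A sketch of the NAT gateway option with the CLI; all names and IDs are placeholders:

# Create the NAT gateway
yc vpc gateway create --name=<gateway_name>

# Create a route table sending all egress traffic through the gateway
yc vpc route-table create \
  --name=<route_table_name> \
  --network-id=<network_ID> \
  --route destination=0.0.0.0/0,gateway-id=<gateway_ID>

# Bind the route table to the subnet hosting the cluster nodes
yc vpc subnet update <subnet_name> --route-table-id=<route_table_ID>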

Why can't I choose Docker as the container runtime environment?

There is no support for Docker as a container runtime environment in clusters with Kubernetes version 1.24 or higher. Only containerd is available.
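You can check which runtime your nodes are using; the CONTAINER-RUNTIME column of this output shows it:

kubectl get nodes --output=wide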

Error when connecting a GitLab repository to Argo CD

Error message:

FATA[0000] rpc error: code = Unknown desc = error testing repository connectivity: authorization failed

This error occurs if access to GitLab over HTTP(S) is disabled.

Solution: Enable access. To do this:

  1. In GitLab, on the left-hand panel, select Admin → Settings → General.
  2. Under Visibility and access controls, find the Enabled Git access protocols setting.
  3. In the list, select the item which allows access over HTTP(S).

For more information, see the GitLab documentation.
