
Troubleshooting in Managed Service for Kubernetes

Written by
Yandex Cloud
Updated at November 27, 2025
  • Error creating a cluster in a different folder's cloud network

  • Namespace fails to delete and remains Terminating

  • I am using Yandex Network Load Balancer together with an ingress controller. Why are some of my cluster's nodes UNHEALTHY?

  • Why does the newly created PersistentVolumeClaim remain Pending?

  • Why does my Managed Service for Kubernetes cluster fail to start after I update its node configuration?

  • After changing the node subnet mask in the cluster settings, the number of pods per node is not as expected

  • Error updating ingress controller certificate

  • Why is DNS resolution not working in my cluster?

  • Creating a node group with the CLI results in a parameter conflict. How do I fix it?

  • Error connecting to a cluster using kubectl

  • Errors connecting to a node over SSH

  • How do I provide internet access to my Managed Service for Kubernetes cluster nodes?

  • Why can't I choose Docker as the container runtime?

  • Error connecting a GitLab repository to Argo CD

  • Traffic loss when deploying app updates in a cluster with Yandex Application Load Balancer

  • System time displayed incorrectly on nodes, as well as in container and Managed Service for Kubernetes cluster pod logs

  • What should I do if I deleted my Yandex Network Load Balancer or its target groups that were automatically created for a LoadBalancer service?

This section describes typical issues you may encounter while using Managed Service for Kubernetes and gives troubleshooting recommendations.

Error creating a cluster in a different folder's cloud network

Error message:

Permission denied

This error occurs when the resource service account lacks the required roles in the folder that contains the cloud network selected when creating the cluster.

To create a Managed Service for Kubernetes cluster in a cloud network of another folder, assign the resource service account the following roles in that folder:

  • vpc.privateAdmin
  • vpc.user

To use a public IP address, also assign the vpc.publicAdmin role.
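
You can assign these roles with the CLI, for example. This is a minimal sketch, assuming you know the ID of the folder that hosts the cloud network and the ID of the resource service account; substitute your own values:

# Grant the required roles to the resource service account in the folder that hosts the network
yc resource-manager folder add-access-binding <folder_ID> \
  --role vpc.privateAdmin \
  --subject serviceAccount:<service_account_ID>
yc resource-manager folder add-access-binding <folder_ID> \
  --role vpc.user \
  --subject serviceAccount:<service_account_ID>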

Namespace fails to delete and remains Terminating

This issue occurs when your namespace contains stuck resources that the namespace controller cannot delete.

To fix it, delete the stuck resources manually.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

  1. Connect to the Managed Service for Kubernetes cluster.

  2. Get the list of resources remaining in the namespace:

    kubectl api-resources --verbs=list --namespaced --output=name \
      | xargs --max-args=1 kubectl get --show-kind \
      --ignore-not-found --namespace=<namespace>
    
  3. Delete the listed resources:

    kubectl delete <resource_type> <resource_name> --namespace=<namespace>
    

If the namespace remains in the Terminating status and cannot be deleted, force its deletion by clearing its finalizers:

  1. Run a local proxy to the Kubernetes API:

    kubectl proxy
    
  2. Delete the namespace:

    kubectl get namespace <namespace> --output=json \
      | jq '.spec = {"finalizers":[]}' > temp.json && \
    curl --insecure --header "Content-Type: application/json" \
      --request PUT --data-binary @temp.json \
      127.0.0.1:8001/api/v1/namespaces/<namespace>/finalize
    

We do not recommend clearing the finalizers of a Terminating namespace right away, as this may leave the stuck resources behind in your Managed Service for Kubernetes cluster.

I am using Yandex Network Load Balancer together with an ingress controller. Why are some of my cluster's nodes UNHEALTHY?

This is normal behavior for a load balancer with External Traffic Policy: Local enabled. Only the Managed Service for Kubernetes nodes whose pods are ready to handle user traffic get the HEALTHY status. All other nodes are labeled as UNHEALTHY.

To check the policy type of a load balancer created using a LoadBalancer service, run this command:

kubectl describe svc <LoadBalancer_service_name> \
| grep 'External Traffic Policy'

For more information, see Parameters of a LoadBalancer service.
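
This status does not require fixing. If you still prefer all nodes to pass the load balancer health checks, you can switch the service to the Cluster policy, at the cost of no longer preserving the client source IP and adding an extra network hop. A hedged sketch using kubectl; <LoadBalancer_service_name> is a placeholder:

kubectl patch svc <LoadBalancer_service_name> \
  --patch '{"spec": {"externalTrafficPolicy": "Cluster"}}'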

Why does the newly created PersistentVolumeClaim remain Pending?

This is normal for a PersistentVolumeClaim (PVC). The newly created PVC remains Pending until you create a pod that will use it.

To change the PVC status to Bound:

  1. View the PVC details:

    kubectl describe pvc <PVC_name> \
      --namespace=<namespace>
    

    Where --namespace is the namespace containing the PVC.

    The waiting for first consumer to be created before binding message means that the PVC is awaiting pod creation.

  2. Create a pod for this PVC.
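
As an illustration, the sketch below creates a minimal pod that mounts the PVC, which lets the volume be provisioned and the claim become Bound. The pod name, image, and mount path are arbitrary placeholders:

kubectl apply --namespace=<namespace> -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
spec:
  containers:
    - name: app
      image: nginx:stable
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: <PVC_name>
EOF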

Why does my Managed Service for Kubernetes cluster fail to start after I update its node configuration?

Make sure the new configuration of Managed Service for Kubernetes nodes is within the quota:

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

To run diagnostics for your Managed Service for Kubernetes cluster nodes:

  1. Connect to the Managed Service for Kubernetes cluster.

  2. Check the state of Managed Service for Kubernetes nodes:

    yc managed-kubernetes cluster list-nodes <cluster_ID>
    

    A message saying that the limit of Managed Service for Kubernetes cluster resources has been exceeded appears in the first column of the command output. Here is an example:

    +--------------------------------+-----------------+------------------+-------------+--------------+
    |         CLOUD INSTANCE         | KUBERNETES NODE |     RESOURCES    |     DISK    |    STATUS    |
    +--------------------------------+-----------------+------------------+-------------+--------------+
    | fhmil14sdienhr5uh89no          |                 | 2 100% core(s),  | 64.0 GB hdd | PROVISIONING |
    | CREATING_INSTANCE              |                 | 4.0 GB of memory |             |              |
    | [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
    | on total size of network-hdd   |                 |                  |             |              |
    | disks has exceeded.,           |                 |                  |             |              |
    | [RESOURCE_EXHAUSTED] The limit |                 |                  |             |              |
    | on total size of network-hdd   |                 |                  |             |              |
    | disks has exceeded.            |                 |                  |             |              |
    +--------------------------------+-----------------+------------------+-------------+--------------+
    

To start your Managed Service for Kubernetes cluster, increase the quotas.

After changing the node subnet mask in the cluster settings, the number of pods per node is not as expected

Solution: Recreate the node group.

Error updating ingress controller certificate

Error message:

ERROR controller-runtime.manager.controller.ingressgroup Reconciler error
{"name": "some-prod", "namespace": , "error": "rpc error: code = InvalidArgument
desc = Validation error:\nlistener_specs[1].tls.sni_handlers[2].handler.certificate_ids:
Number of elements must be less than or equal to 1"}

The error occurs if different certificates are specified for the same ingress controller handler.

Solution: Edit and apply the ingress controller specifications so that each handler has only one certificate.

Why is DNS resolution not working in my cluster?

A Managed Service for Kubernetes cluster may fail to resolve internal and external DNS requests for several reasons. To fix the issue:

  1. Check the version of your Managed Service for Kubernetes cluster and node groups.
  2. Make sure CoreDNS is up and running.
  3. Make sure your Managed Service for Kubernetes cluster has enough CPU resources available.
  4. Set up autoscaling.
  5. Set up local DNS caching.
Check the version of your cluster and node groups
  1. Get the list of current Kubernetes versions:

    yc managed-kubernetes list-versions
    
  2. Get the Managed Service for Kubernetes cluster version:

    yc managed-kubernetes cluster get <cluster_name_or_ID> | grep version:
    

    You can get the Managed Service for Kubernetes cluster ID and name with the list of clusters in the folder.

  3. Get the Managed Service for Kubernetes node group version:

    yc managed-kubernetes node-group get <node_group_name_or_ID> | grep version:
    

    You can get the Managed Service for Kubernetes node group ID and name with the list of node groups in the cluster.

  4. If the versions of your Managed Service for Kubernetes cluster and node groups are not on the list of current Kubernetes versions, upgrade them.

Make sure CoreDNS is up and running

Get the list of CoreDNS pods and their statuses:

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

Make sure all pods have the Running status.
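
If a CoreDNS pod is not Running, its logs usually point to the cause. A quick check using standard kubectl and the default kube-dns label:

kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50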

Make sure your cluster has enough CPU resources available
  1. Navigate to the folder dashboard and select Managed Service for Kubernetes.
  2. Click the name of the Managed Service for Kubernetes cluster you need and select the Node manager tab.
  3. Go to the Nodes tab and click the name of any Managed Service for Kubernetes node.
  4. Navigate to the Monitoring tab.
  5. Make sure that, in the CPU, [cores] chart, the used CPU values have not reached the total available CPU values. Check this for each Managed Service for Kubernetes cluster node.
Set up autoscaling

Set up DNS autoscaling based on the Managed Service for Kubernetes cluster size.
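
A common way to do this is the cluster-proportional-autoscaler, which scales CoreDNS replicas based on a ConfigMap in kube-system. The sketch below is an assumption about a typical setup, not the only valid one; the ConfigMap name must match the one your autoscaler deployment watches:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  # One CoreDNS replica per 256 cores or per 16 nodes, whichever yields more, with at least 2 replicas
  linear: '{"coresPerReplica": 256, "nodesPerReplica": 16, "min": 2, "preventSinglePointFailure": true}'
EOF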

Set up local DNS caching

Set up NodeLocal DNS Cache. For optimal settings, install NodeLocal DNS Cache from Yandex Cloud Marketplace.

Creating a node group with the CLI results in a parameter conflict. How do I fix it?

Check whether you are specifying the --location, --network-interface, and --public-ip parameters in the same command. Providing them together causes the following errors:

  • For the --location and --public-ip or --location and --network-interface pairs:

    ERROR: rpc error: code = InvalidArgument desc = Validation error:
    allocation_policy.locations[0].subnet_id: can't use "allocation_policy.locations[0].subnet_id" together with "node_template.network_interface_specs"
    
  • For the --network-interface and --public-ip pair:

    ERROR: flag --public-ip cannot be used together with --network-interface. Use '--network-interface' option 'nat' to get public address
    

Make sure you only provide one of the three parameters in a command. It is enough to specify the location of a Managed Service for Kubernetes node group either in --location or in --network-interface.

To grant internet access to Managed Service for Kubernetes cluster nodes, do one of the following:

  • Assign a public IP address to the cluster nodes, specifying --network-interface ipv4-address=nat or --network-interface ipv6-address=nat.
  • Enable access to Managed Service for Kubernetes nodes from the internet after creating a node group.
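
For illustration, a hedged sketch of creating a node group that gets public IP addresses via --network-interface only, without --location or --public-ip. The subnet name and sizes are placeholders, and other parameters are omitted for brevity:

yc managed-kubernetes node-group create \
  --cluster-id <cluster_ID> \
  --name <node_group_name> \
  --network-interface subnets=<subnet_name>,ipv4-address=nat \
  --fixed-size 2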

Error connecting to a cluster using kubectl

Error message:

ERROR: cluster has empty endpoint

This error occurs if you try to connect to a cluster with no public IP address and get kubectl credentials for a public IP address using this command:

yc managed-kubernetes cluster \
   get-credentials <cluster_name_or_ID> \
   --external

To connect to the cluster's private IP address from a VM in the same network, get kubectl credentials using this command:

yc managed-kubernetes cluster \
   get-credentials <cluster_name_or_ID> \
   --internal

If you need to connect to a cluster from the internet, recreate the cluster and assign it a public IP address.
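
Whichever command you use to get the credentials, you can verify that kubectl reaches the cluster with a standard check:

kubectl cluster-info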

Errors connecting to a node over SSH

Error messages:

Permission denied (publickey,password)
Too many authentication failures

The following situations cause errors when connecting to a Managed Service for Kubernetes node:

  • No public SSH key is added to the Managed Service for Kubernetes node group metadata.

    Solution: Update the Managed Service for Kubernetes node group keys.

  • An invalid public SSH key is added to the Managed Service for Kubernetes node group metadata.

    Solution: Change the format of the public key file to the appropriate one and update the Managed Service for Kubernetes node group keys.

  • No private SSH key is added to an authentication agent (ssh-agent).

    Solution: Add a private key by running the ssh-add <path_to_private_key_file> command.
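
For the first two cases, here is a hedged CLI sketch of updating the node group keys. It assumes the public keys are collected in a local file in the <user_name>:<public_key> format expected by the VM metadata service:

yc managed-kubernetes node-group add-metadata \
  --name <node_group_name> \
  --metadata-from-file ssh-keys=<path_to_public_keys_file>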

How do I provide internet access to my Managed Service for Kubernetes cluster nodes?

If Managed Service for Kubernetes cluster nodes have no internet access, the following error occurs when trying to connect to the internet:

Failed to pull image "cr.yandex/***": rpc error: code = Unknown desc = Error response from daemon: Get https://cr.yandex/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

You can provide internet access to your Managed Service for Kubernetes cluster nodes in several ways:

  • Set up a NAT gateway or NAT instance. With static routing in place, traffic will go through a gateway or a separate NAT instance.
  • Assign a public IP address to your Managed Service for Kubernetes node group.

Note

If you assigned public IP addresses to the cluster nodes and then configured the NAT gateway or NAT instance, internet access via the public IP addresses will be disabled. For more information, see our Yandex Virtual Private Cloud article.
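
If you choose the NAT gateway option, here is a minimal CLI sketch, with names and IDs as placeholders. It creates a gateway, adds a default route to a route table, and attaches the route table to a node subnet; repeat the last step for each subnet used by the node group:

yc vpc gateway create --name <gateway_name>
yc vpc route-table create \
  --name <route_table_name> \
  --network-id <network_ID> \
  --route destination=0.0.0.0/0,gateway-id=<gateway_ID>
yc vpc subnet update <subnet_name> --route-table-id <route_table_ID>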

Why can't I choose Docker as the container runtime?

Clusters running Kubernetes 1.24 or higher do not support the Docker container runtime; containerd is the only available runtime.

Error connecting a GitLab repository to Argo CD

Error message:

FATA[0000] rpc error: code = Unknown desc = error testing repository connectivity: authorization failed

This error occurs if access to GitLab over HTTP(S) is disabled.

Solution: Enable HTTP(S) access. To do this:

  1. In GitLab, in the left-hand panel, select Admin → Settings → General.
  2. Under Visibility and access controls, find the Enabled Git access protocols setting.
  3. In the list, select the item which allows access over HTTP(S).

For more information, see this GitLab guide.

Traffic loss when deploying app updates in a cluster with Yandex Application Load Balancer

When your app traffic is managed by an Application Load Balancer and the load balancer's ingress controller traffic policy is set to externalTrafficPolicy: Local, the app processes requests on the same node they were delivered to by the load balancer. There is no traffic flow between nodes.

The default health check monitors the status of the node, not the application. Therefore, Application Load Balancer traffic may go to a node where the application is not running. When you deploy a new app version in a cluster, the Application Load Balancer ingress controller requests the load balancer to update the backend group configuration. It takes at least 30 seconds to process the request, and the app may not receive any user traffic during that time.

To prevent this, we recommend setting up backend health checks on your Application Load Balancer. With health checks, the load balancer promptly detects unavailable backends and reroutes traffic to healthy ones. Once the application is updated, traffic is again distributed across all backends.

For more information, see Tips for configuring Yandex Application Load Balancer health checks and Annotations (metadata.annotations).

System time displayed incorrectly on nodes, as well as in container and Managed Service for Kubernetes cluster pod logs

Managed Service for Kubernetes cluster time may not match the time of other resources, such as VMs, if they use different time synchronization sources. For example, a Managed Service for Kubernetes cluster synchronizes with a time server (by default), whereas a VM synchronizes with a private or public NTP server.

Solution: Set up Managed Service for Kubernetes cluster time synchronization with your private NTP server. To do this:

  1. Specify the NTP server addresses in the DHCP settings of the master subnets.

    Management console

    1. Navigate to the folder dashboard and select Managed Service for Kubernetes.
    2. Click the name of the Kubernetes cluster.
    3. Under Master configuration, click the subnet name.
    4. Click Edit in the top-right corner.
    5. In the window that opens, expand the DHCP settings section.
    6. Click Add and specify the IP address of your NTP server.
    7. Click Save changes.

    CLI

    If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

    By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

    1. See the description of the CLI command for updating subnet settings:

      yc vpc subnet update --help
      
    2. Run the subnet command, specifying the NTP server IP address in the --ntp-server parameter:

      yc vpc subnet update <subnet_ID> --ntp-server <server_address>
      

    Tip

    To find out the IDs of the subnets containing the cluster, get detailed information about the cluster.

    Terraform

    1. In the Terraform configuration file, change the cluster subnet description. Add the dhcp_options section (if missing) with the ntp_servers parameter specifying the IP address of your NTP server:

      ...
      resource "yandex_vpc_subnet" "lab-subnet-a" {
        ...
        v4_cidr_blocks = ["<IPv4_address>"]
        network_id     = "<network_ID>"
        ...
        dhcp_options {
          ntp_servers = ["<IPv4_address>"]
          ...
        }
      }
      ...
      

      For more information about the yandex_vpc_subnet settings, see this Terraform provider article.

    2. Apply the changes:

      1. In the terminal, go to the directory where you edited the configuration file.

      2. Make sure the configuration file is correct using this command:

        terraform validate
        

        If the configuration is correct, you will get this message:

        Success! The configuration is valid.
        
      3. Run this command:

        terraform plan
        

        You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.

      4. Apply the changes:

        terraform apply
        
      5. Type yes and press Enter to confirm the changes.

      Terraform will update all required resources. You can check the subnet update using the management console or this CLI command:

      yc vpc subnet get <subnet_name>
      

    API

    Use the update method for the Subnet resource and provide the following in the request:

    • NTP server IP address in the dhcpOptions.ntpServers parameter.
    • dhcpOptions.ntpServers parameter to update in the updateMask parameter.

    Tip

    To find out the IDs of the subnets containing the cluster, get detailed information about the cluster.

    Warning

    For a highly available master hosted across three availability zones, you need to update each of the three subnets.

  2. Enable connections from the cluster to NTP servers.

    Create a rule for outbound traffic in the cluster and node group security group (a CLI sketch of this rule follows the procedure):

    • Port range: 123. If your NTP server uses a port other than 123, specify that port.
    • Protocol: UDP.
    • Destination name: CIDR.
    • CIDR blocks: <NTP_server_IP_address>/32. For a master hosted across three availability zones, specify three sections: <NTP_server_IP_address_in_subnet1>/32, <NTP_server_IP_address_in_subnet2>/32, and <NTP_server_IP_address_in_subnet3>/32.
  3. Update the network settings in the cluster node group using one of the following methods:

    • Connect to each node in the group over SSH or via OS Login and run the sudo dhclient -v -r && sudo dhclient command.
    • Reboot the group nodes at any convenient time.

    Warning

    Updating network settings may cause the services within the cluster to become unavailable for a few minutes.
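
A hedged CLI sketch of the outbound NTP rule from step 2 above, assuming an existing security group and a single NTP server; for a master hosted across three availability zones, repeat the --add-rule flag for each server address:

yc vpc security-group update-rules <security_group_ID> \
  --add-rule "direction=egress,port=123,protocol=udp,v4-cidrs=[<NTP_server_IP_address>/32]"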

What should I do if I deleted my Yandex Network Load Balancer or its target groups that were automatically created for a LoadBalancer service?

You cannot manually restore a Network Load Balancer or its target groups. Recreate your LoadBalancer service. This will automatically create a load balancer and target groups.
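
For reference, a minimal sketch of a LoadBalancer service whose creation makes the Managed Service for Kubernetes controller provision a new network load balancer and target groups. The service name, selector, and ports are placeholders for your own application:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: <service_name>
spec:
  type: LoadBalancer
  selector:
    app: <app_label>
  ports:
    - port: 80
      targetPort: 8080
EOF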
