Troubleshooting DNS name resolving issues in Managed Service for Kubernetes
Issue description
The Managed Service for Kubernetes cluster does not resolve FQDNs for either internal or external resources.
Solution
Check the Kubernetes version running on the master and worker nodes by running these commands:
yc managed-kubernetes cluster get $CLUSTER_ID | grep vers
yc managed-kubernetes node-group get $NODE_GROUP_ID | grep vers
Alert
If your cluster or node group version is outdated and missing from the list of supported versions (yc managed-kubernetes list-versions), update both before proceeding with the diagnostics.
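The exact update procedure depends on your setup; as a sketch, assuming the standard yc CLI flags (check yc managed-kubernetes cluster update --help and yc managed-kubernetes node-group update --help for your CLI release), the update might look like this:
yc managed-kubernetes list-versions # list the Kubernetes versions currently supported by the service
yc managed-kubernetes cluster update $CLUSTER_ID --version 1.28 # update the master first; 1.28 is only an example target version
yc managed-kubernetes node-group update $NODE_GROUP_ID --version 1.28 # then update the node group to the same version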
If the cluster and node group are running a supported Kubernetes version, check whether CoreDNS works properly within the cluster.
To diagnose CoreDNS, you need to analyze the state of the cluster's system DNS pods using the kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide command.
Example of the kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide command output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-85fd96f799-2zzvw 1/1 Running 5 21d 10.96.138.252 cl1*****************-yxeg <none> <none>
coredns-85fd96f799-9lz6b 1/1 Running 3 20d 10.96.140.90 cl1*****************-icos <none> <none>
Check the statuses of the pods in the cluster. If any pod is not in the Running status, check the system logs of all DNS pods in the cluster with the kubectl logs -l k8s-app=kube-dns -n kube-system --all-containers=true command to find the source of the issue.
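To confirm that name resolution actually fails inside the cluster, you can also run a one-off test pod and query both an internal and an external name (the busybox image and the test names below are only examples):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local # internal name
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup example.com # external name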
If the issue with CoreDNS persists, try one of the following solutions:
Typically, a cluster has two CoreDNS pods, unless it is a single-node cluster with one pod. You can increase the number of CoreDNS replicas by updating the CoreDNS deployment autoscaling configuration and specifying the linear parameter:
Example of the kube-dns-autoscaler ConfigMap (kubectl -n kube-system edit cm kube-dns-autoscaler)
apiVersion: v1
data:
  linear: '{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}' # < These are autoscaling settings.
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-dns-autoscaler
You can learn more about the scaling configuration from the Kubernetes developer guides on GitHub.
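As a rough illustration (the exact formula is an assumption; verify it against the autoscaler documentation), the linear mode sets the replica count to max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), and preventSinglePointFailure keeps at least two replicas on multi-node clusters. For example, with 20 nodes and 80 cores and the settings above, that gives max(ceil(80/256), ceil(20/16)) = max(1, 2) = 2 replicas. After editing the ConfigMap, you can watch the CoreDNS deployment to confirm that the replica count changes:
kubectl -n kube-system get deployment coredns --watch # the CoreDNS deployment is typically named coredns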
To reduce the load from DNS requests in a Managed Service for Kubernetes cluster, enable NodeLocal DNS Cache. If a Managed Service for Kubernetes cluster contains more than 50 nodes, use automatic DNS scaling.
When NodeLocal DNS Cache is enabled, a DaemonSet (the node-local-dns pod) runs on each Managed Service for Kubernetes node. User pods then send DNS requests to the agent running on their Managed Service for Kubernetes node.
If the requested record is in the agent's cache, the agent responds directly. Otherwise, the agent opens a TCP connection to the kube-dns ClusterIP. By default, the caching agent forwards cache-miss requests to kube-dns for the cluster.local DNS zone of the Managed Service for Kubernetes cluster.
Install [NodeLocal DNS](https://yandex.cloud/en/marketplace/products/yc/node-local-dns) from Cloud Marketplace as described in this guide, or manually by following this tutorial.
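After installation, you can check that the caching agent runs on every node and answers queries. The DaemonSet name and the 169.254.20.10 link-local address below are common defaults and may differ in your installation:
kubectl -n kube-system get daemonset node-local-dns # one pod per node is expected
kubectl run dns-cache-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local 169.254.20.10 # query the local cache directly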
Tip
You can also troubleshoot DNS issues in your cluster by installing NodeLocal DNS Cache from Yandex Cloud Marketplace following the guides linked above.
If the issue persists
If the above actions did not help, create a support ticket and provide the following information:
- Managed Service for Kubernetes cluster ID.
- Managed Service for Kubernetes cluster event log: kubectl get events output.
- Cluster DNS service log: kubectl logs -l k8s-app=kube-dns -n kube-system --all-containers=true output.
- Examples of DNS resolution errors in the cluster with the date and time of each issue.
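A simple way to capture the first two items is to redirect the same commands to files and attach them to the ticket (the file names are arbitrary):
kubectl get events --all-namespaces --sort-by=.lastTimestamp > cluster-events.txt # cluster event log
kubectl logs -l k8s-app=kube-dns -n kube-system --all-containers=true > coredns.log # CoreDNS logs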