© 2026 Direct Cursus Technology L.L.C.


Recommendations on using Managed Service for Kubernetes

Written by
Yandex Cloud
Updated at February 24, 2026
  • High availability and fault tolerance
  • Load scaling
    • Network load balancer
    • Application load balancer
  • Isolating resources
    • Resource Quota
  • Monitoring and escalation

Use these recommendations for your production applications that require:

  • High availability and fault tolerance.
  • Load scaling.
  • Resource isolation.

Tip

Test the strategies below in a test environment before rolling them out to production.

High availability and fault tolerance

  • Use the REGULAR or STABLE release channel.

    Tip

    Use the RAPID release channel for test environments to test Kubernetes and Managed Service for Kubernetes updates more quickly.

  • Control cluster and node group updates. Either disable auto updates and perform them manually, or set the update time so that your applications are available during active usage hours.

  • Configure podDisruptionBudget policies to minimize service downtime during updates.
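
    For example, a PodDisruptionBudget that keeps at least two replicas of a service available during voluntary disruptions such as node drains during updates (the name and label below are placeholders; match them to your own Deployment):

    ```yaml
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-service-pdb        # placeholder name
    spec:
      minAvailable: 2             # alternatively, use maxUnavailable: 1
      selector:
        matchLabels:
          app: my-service         # placeholder label; must match your pods
    ```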

  • Select the highly available master type running across three availability zones. Kubernetes services will remain available even if an entire availability zone fails. The Managed Service for Kubernetes Service Level Agreement applies only to configurations with a highly available master running across three zones.

  • Allocate sufficient compute resources (CPUs, RAM) to the master and nodes.

  • Minimize or avoid oversubscription (overcommitment) of node resources, especially RAM.

  • Configure correct health checks for load balancers.

  • To make your cluster more robust, create node groups with autoscaling in multiple availability zones.

    Tip

    Managed Service for Kubernetes uses Yandex Compute Cloud VM groups as cluster node groups. See the description of instance groups during a zonal incident and our mitigation guidelines.

  • Deploy your Deployment and StatefulSet services with multiple replicas across different availability zones. Use Pod Topology Spread Constraints and pod anti-affinity strategies to ensure high availability of services and efficient consumption of Kubernetes cluster resources.

    Use the label combinations below for all strategies:

    • topology.kubernetes.io/zone to keep the services available in the event of an availability zone failure.
    • kubernetes.io/hostname to keep the services available in the event of a cluster node failure.

    Warning

    Autoscaling resources in the event of an availability zone failure takes time. Always use these labels to distribute pods across different nodes and availability zones so that your applications work properly.
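
    A minimal sketch combining both strategies for a hypothetical Deployment (the name, label, and image are placeholders):

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          # Spread replicas evenly across availability zones.
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
              labelSelector:
                matchLabels:
                  app: my-service
          # Never place two replicas on the same node.
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - topologyKey: kubernetes.io/hostname
                  labelSelector:
                    matchLabels:
                      app: my-service
          containers:
            - name: my-service
              image: registry.example.com/my-service:1.0  # placeholder image
    ```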

Load scaling

Use these recommendations if the load on your Managed Service for Kubernetes cluster is constantly increasing:

  • To reduce the load on the Kubernetes DNS, use NodeLocal DNS. If the cluster has over 50 nodes, use DNS autoscaling.
  • To reduce cross-node traffic within the cluster, use a network load balancer and the externalTrafficPolicy: Local setting where possible.
  • Consider node storage requirements in advance:
    • Check the disk limits for Yandex Compute Cloud.
    • Load test your disk subsystem in a test environment.
  • To reduce latency at high IOPS, use non-replicated disks.
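
A minimal Service of type LoadBalancer using this setting (the name, label, and ports are placeholders). With externalTrafficPolicy: Local, the load balancer sends traffic only to nodes that actually run the service's pods, avoiding an extra forwarding hop:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service            # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # deliver traffic only to nodes hosting the pods
  selector:
    app: my-service              # placeholder label
  ports:
    - port: 80
      targetPort: 8080           # placeholder container port
```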

Network load balancer

A network load balancer distributes incoming traffic across targets (VMs). A listener with a public IP address enables the load balancer to process internet traffic, while a listener with a private IP address handles internal traffic. The load balancer uses health checks to test the availability of targets.

Yandex Cloud implements the NLB Zone Shift mechanism: you can mark a load balancer with a special flag so that, if a partial failure occurs in an availability zone and goes undetected by health checks, Yandex Cloud support will disable the compromised zone for that load balancer.

To test your application in the event of an availability zone failure, check this scenario.

Learn more about network load balancers.

Application load balancer

An application load balancer is based on the network load balancer, but it can route traffic to any private IP addresses, e.g., IP addresses of resources outside the cloud network. Traffic is routed through intermediate VMs acting as reverse proxies.

In an application load balancer, you can manually disable a partially failed availability zone.

Learn more about application load balancers.

Isolating resources

Use these recommendations for applications that share Kubernetes cluster resources.

Set up limits and requests for all the cluster services:

---
...
containers:
...
  resources:
    limits:
      cpu: 250m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
...

Specify vCPU in millicores (m, thousandths of a vCPU) and RAM in mebibytes (Mi). The service will not be allowed to exceed the vCPU and RAM values specified in limits. Setting requests enables autoscaling of cluster nodes.

To manage pod resources automatically, configure Kubernetes policies:

  • Quality of Service for Pods to create pods of different availability classes.
  • Limit Ranges to set limits at the namespace level.
  • Resource Quotas to limit overall resource consumption in a namespace.
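
A sketch of a LimitRange that sets per-container defaults and caps in a namespace (the names and values are placeholders to adapt to your workloads):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits        # placeholder name
  namespace: my-namespace     # placeholder namespace
spec:
  limits:
    - type: Container
      default:                # applied as limits when a container sets none
        cpu: 250m
        memory: 128Mi
      defaultRequest:         # applied as requests when a container sets none
        cpu: 100m
        memory: 64Mi
      max:                    # upper bound any single container may request
        cpu: "1"
        memory: 1Gi
```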

Resource Quota

Use the ResourceQuota policy to limit the resources that can be used within a single namespace:

---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: my-namespace
spec:
  hard:
    # Computing resources
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    # Number of objects
    pods: "50"
    services: "10"
    secrets: "20"
    configmaps: "20"
    persistentvolumeclaims: "10"
    # Storage resources
    requests.storage: 100Gi

With ResourceQuota, you can set limits on:

  • Computing resources (requests.cpu, requests.memory, limits.cpu, limits.memory): total vCPU and RAM requests and limits for all pods in the namespace.
  • Storage (requests.storage, persistentvolumeclaims): total requested storage size and number of PVCs.
  • Number of objects (pods, services, secrets, configmaps, replicationcontrollers, deployments.apps, statefulsets.apps, jobs.batch, cronjobs.batch): maximum number of objects of each type.
  • Advanced resources (requests.nvidia.com/gpu, limits.nvidia.com/gpu): GPU resources and other extended resources.

Tip

Use ResourceQuota together with LimitRange: ResourceQuota limits the total resource consumption in the namespace, while LimitRange sets default values and limits for individual containers.

Monitoring and escalation

Monitoring and alerts are key tools for ensuring fault tolerance.

  • Set up metric monitoring and create alerts to track the status of your master, nodes, pods, and persistent volumes.
  • Configure escalation policies for alerts.
