Horizontal scaling of an application in a cluster

Written by Yandex Cloud
Updated at November 21, 2025
  • Required paid resources
  • Getting started
  • Scaling based on CPU utilization
  • Scaling based on the number of application requests
    • How it works
    • Installing objects
    • Testing autoscaling
  • Delete the resources you created

Managed Service for Kubernetes supports several types of autoscaling. In this tutorial, you will learn how to configure cluster autoscaling using a combination of Cluster Autoscaler and Horizontal Pod Autoscaler, in two scenarios:

  • Scaling based on CPU utilization.
  • Scaling based on the number of application requests.

If you no longer need the resources you created, delete them.

Warning

While you work through this tutorial, the total number of nodes across the node groups may increase to six. Make sure your folder has enough resources to complete the steps.

Required paid resources

The cost of maintaining this solution includes:

  • Fee for using the master and outgoing traffic in a Managed Service for Kubernetes cluster (see Managed Service for Kubernetes pricing).
  • Fee for using computing resources, OS, and storage in cluster nodes (VMs) (see Compute Cloud pricing).
  • Fee for a public IP address for cluster nodes (see Virtual Private Cloud pricing).
  • Key Management Service fee for the number of active key versions (with Active or Scheduled for destruction status) and completed cryptographic operations (see Key Management Service pricing).

Getting started

  1. If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

    By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

  2. Install Helm.

  3. Create these service accounts for the master and node groups, and assign them the listed roles:

    • sa-k8s-master for cluster management:
      • k8s.clusters.agent: To manage a Kubernetes cluster.
      • load-balancer.admin: To manage a network load balancer.
    • sa-k8s-nodes for node group management:
      • container-registry.images.puller: To pull images from Yandex Container Registry.
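
    If you prefer the CLI, you can sketch the same setup with commands like the following (repeat the role binding for each role listed above, and substitute your own folder name and the service account IDs the first commands return):

    yc iam service-account create --name sa-k8s-master
    yc iam service-account create --name sa-k8s-nodes
    # Bind a role at the folder level (repeat per role).
    yc resource-manager folder add-access-binding <folder_name> \
      --role k8s.clusters.agent \
      --subject serviceAccount:<sa-k8s-master_ID>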
  4. Create a network named k8s-network to host your cluster. Select Create subnets when creating it.
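
    If you are scripting this step, a hedged CLI equivalent (the subnet names and CIDR ranges below are illustrative assumptions):

    yc vpc network create --name k8s-network
    yc vpc subnet create --name k8s-subnet-a \
      --zone ru-central1-a --network-name k8s-network --range 10.1.0.0/16
    yc vpc subnet create --name k8s-subnet-b \
      --zone ru-central1-b --network-name k8s-network --range 10.2.0.0/16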

  5. Create security groups for the Managed Service for Kubernetes cluster and its node groups.

    Warning

    The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.

  6. Create an encryption key:

    • Name: k8s-symetric-key.
    • Encryption algorithm: AES-128.
    • Rotation period, days: 365 days.
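    The same key can be created from the CLI, for example (365 days expressed as a duration; verify the flag set with yc kms symmetric-key create --help):

    yc kms symmetric-key create --name k8s-symetric-key \
      --default-algorithm aes-128 \
      --rotation-period 8760h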
  7. Create a Managed Service for Kubernetes cluster with the following settings:

    • Service account for resources: sa-k8s-master.
    • Service account for nodes: sa-k8s-nodes.
    • Encryption key: k8s-symetric-key.
    • Release channel: RAPID.
    • Public address: Auto.
    • Type of master: Highly available.
    • Cloud network: k8s-network.
    • Security groups: Select the previously created security groups containing the rules for service traffic and Kubernetes API access.
    • Enable tunnel mode: Enabled.
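    For reference, a rough CLI outline of a comparable cluster. Flag names vary across yc versions and a highly available master requires additional --master-location entries, so treat this as a sketch rather than a copy-paste command; the cluster name k8s-demo and the KMS flag are assumptions:

    yc managed-kubernetes cluster create --name k8s-demo \
      --network-name k8s-network \
      --public-ip \
      --release-channel rapid \
      --service-account-name sa-k8s-master \
      --node-service-account-name sa-k8s-nodes \
      --kms-key-name k8s-symetric-key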
  8. Create two node groups with the following settings in the ru-central1-a and ru-central1-b availability zones:

    • Under Scaling:
      • Type: Automatic.
      • Minimum number of nodes: 1.
      • Maximum number of nodes: 3.
      • Initial number of nodes: 1.
    • Under Network settings:
      • Public address: Auto.
      • Security groups: Select the previously created security groups containing the rules for service traffic, internet access to services, and SSH access to nodes.
      • Location: ru-central1-a or ru-central1-b.
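    A hedged CLI equivalent for one of the groups (repeat for the other availability zone; the cluster name continues the assumption above, and the exact flag syntax should be checked with yc managed-kubernetes node-group create --help):

    yc managed-kubernetes node-group create \
      --cluster-name k8s-demo \
      --name node-group-a \
      --auto-scale min=1,max=3,initial=1 \
      --location zone=ru-central1-a \
      --public-ip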
  9. Install kubectl and configure it to work with the new cluster.
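
    One way to do this is through the CLI (the cluster name continues the assumption above):

    yc managed-kubernetes cluster get-credentials k8s-demo --external
    kubectl cluster-info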

Scaling based on CPU utilization

In this section, you will learn how to configure cluster autoscaling based on CPU load.

  1. Create a file named k8s-autoscale-CPU.yaml that contains the configuration for the test application, load balancer, and Horizontal Pod Autoscaler:

    ---
    ### Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: registry.k8s.io/hpa-example
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "500m"
                limits:
                  memory: "500Mi"
                  cpu: "1"
    ---
    ### Service
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
    spec:
      selector:
        app: nginx
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: LoadBalancer
    ---
    ### HPA
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilizationPercentage: 20
    
  2. Create the objects:

    kubectl apply -f k8s-autoscale-CPU.yaml
    
  3. In a separate window, start Kubernetes component load monitoring:

    watch kubectl get pod,svc,hpa,nodes -o wide
    
  4. Run the following command to simulate a workload:

    URL=$(kubectl get service nginx -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      while true; do wget -q -O- http://$URL; done
    

    Tip

    To increase the load and speed up the scenario, run multiple simulations in separate windows.

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

    Within a few minutes, Horizontal Pod Autoscaler will increase the number of pods as CPU usage rises. As soon as the existing cluster resources can no longer satisfy the pods' CPU requests, Cluster Autoscaler will add nodes to the node groups.
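
    To watch the scaling decisions directly, you can also inspect the autoscaler objects, e.g.:

    # Shows current vs. target CPU utilization and recent scaling events.
    kubectl describe hpa nginx
    # New nodes appear here as Cluster Autoscaler adds them.
    kubectl get nodes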

  5. Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial value.

Scaling based on the number of application requests

In this section, you will learn how to configure cluster autoscaling based on the number of application requests (Requests Per Second, RPS).

How it works

  1. The ingress controller exposes the number of application requests as a metric, which Prometheus collects.

  2. Prometheus generates and publishes the nginx_ingress_controller_requests_per_second metric for the number of application requests per second.

    To produce this metric, the values-prom.yaml Prometheus configuration file includes the following recording rule:

    rules:
      groups:
        - name: Ingress
          rules:
            - record: nginx_ingress_controller_requests_per_second
              expr: rate(nginx_ingress_controller_requests[2m])
    
  3. Based on this metric, the autoscaling tools adjust the number of pods and nodes.
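
To make the last step concrete, here is a minimal sketch of a Horizontal Pod Autoscaler that consumes such an ingress metric through the custom metrics API. It illustrates the mechanism only and is not necessarily the exact content of the repository's k8s_autoscale-RPS.yaml; the Ingress name and target value are assumptions:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1
      maxReplicas: 10
      metrics:
        # Object metric: track the request rate recorded for the Ingress.
        - type: Object
          object:
            metric:
              name: nginx_ingress_controller_requests_per_second
            describedObject:
              apiVersion: networking.k8s.io/v1
              kind: Ingress
              name: nginx       # assumed Ingress name
            target:
              type: Value
              value: "10"       # scale out above ~10 RPS (assumed threshold)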

Installing objects

  1. Clone the GitHub repository with the up-to-date configuration files:

    git clone https://github.com/yandex-cloud-examples/yc-mk8s-autoscaling-solution.git && \
    cd yc-mk8s-autoscaling-solution
    
  2. Add the Helm repositories with the ingress controller and Prometheus monitoring system:

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx && \
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
    helm repo update
    
  3. Install the ingress controller:

    helm upgrade \
      --install rps ingress-nginx/ingress-nginx \
      --values values-ingr.yaml
    
  4. Install Prometheus:

    helm upgrade \
      --install prometheus prometheus-community/prometheus \
      --values values-prom.yaml
    
  5. Install the Prometheus adapter that enables the autoscaling tools to retrieve metrics from Prometheus:

    helm upgrade \
      --install prometheus-adapter prometheus-community/prometheus-adapter \
      --values values-prom-ad.yaml
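
    The adapter's job is to translate Prometheus series into the Kubernetes custom metrics API. A rule for this purpose in values-prom-ad.yaml might look roughly like the following (a sketch of the chart's rules.custom format, not necessarily the repository's exact file):

    rules:
      custom:
        # Expose the recorded RPS series via the custom metrics API,
        # attached to the Ingress it was measured on.
        - seriesQuery: 'nginx_ingress_controller_requests_per_second'
          resources:
            overrides:
              namespace: {resource: "namespace"}
              ingress: {resource: "ingress"}
          metricsQuery: 'sum(<<.Series>>) by (<<.GroupBy>>)'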
    
  6. Create a test application, Ingress rule, and Horizontal Pod Autoscaler:

    kubectl apply -f k8s_autoscale-RPS.yaml
    

    Once these objects are created, Prometheus will expose a new metric named nginx_ingress_controller_requests_per_second and will start computing it only after traffic passes through the ingress controller.

  7. Send several test requests to the ingress controller:

    URL=$(kubectl get service rps-ingress-nginx-controller -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      curl --header "Host: nginx.example.com" http://$URL
    

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

  8. Make sure the nginx_ingress_controller_requests_per_second metric is available:

    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq . | \
      grep ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second
    

    Result:

    "name": "ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second",
    

Testing autoscaling

  1. In a separate window, start Kubernetes component load monitoring:

    watch kubectl get pod,svc,hpa,nodes -o wide
    
  2. Run the following command to simulate a workload:

    URL=$(kubectl get service rps-ingress-nginx-controller -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      while true; do curl --header "Host: nginx.example.com" http://$URL; done
    

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

    Within a few minutes, Horizontal Pod Autoscaler will increase the number of pods as the number of application requests grows. As soon as the existing cluster resources can no longer satisfy the pods' resource requests, Cluster Autoscaler will add nodes to the node groups.

  3. Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial value.

Delete the resources you created

Delete the resources you no longer need to avoid paying for them:

  1. Delete the Kubernetes cluster.
  2. If you used static public IP addresses to access your cluster or nodes, release and delete them.
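
A hedged CLI sketch for the cleanup (the cluster name continues the assumption from the setup steps):

    yc managed-kubernetes cluster delete k8s-demo
    # List reserved addresses, then delete any you no longer need.
    yc vpc address list
    yc vpc address delete <address_name_or_ID>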
