Horizontal application scaling in a Yandex Managed Service for Kubernetes cluster

Written by
Yandex Cloud
Updated on May 13, 2025
  • Required paid resources
  • Getting started
  • Scaling based on CPU utilization
  • Scaling based on the number of application requests
    • Runtime algorithm
    • Installing objects
    • Testing autoscaling
  • Delete the resources you created

Managed Service for Kubernetes supports several types of autoscaling. In this article, you will learn how to configure cluster autoscaling using a combination of Cluster Autoscaler and Horizontal Pod Autoscaler in two scenarios:

  • Scaling based on CPU utilization.
  • Scaling based on the number of application requests.

If you no longer need the resources you created, delete them.

Warning

While the tutorial scenario is running, the total number of nodes in the groups may increase to six. Make sure you have enough folder resources available to complete the steps in this tutorial.

Required paid resources

The cost of supporting this infrastructure includes:

  • Fee for using the master and outgoing traffic in a Managed Service for Kubernetes cluster (see Managed Service for Kubernetes pricing).
  • Fee for using computing resources, OS, and storage in cluster nodes (VMs) (see Compute Cloud pricing).
  • Fee for the public IP address for the cluster nodes (see Virtual Private Cloud pricing).
  • Key Management Service fee: the number of active key versions (in Active or Scheduled for destruction status) and the number of completed cryptographic operations (see Key Management Service pricing).

Getting started

  1. If you do not have the Yandex Cloud command line interface (CLI) yet, install and initialize it.

    By default, the CLI uses the folder specified when the profile was created. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also specify a different folder for a particular command using the --folder-name or --folder-id parameter.

  2. Install the Helm package manager.

  3. Create service accounts for the master and the node groups, and assign roles to them (a CLI sketch follows the list):

    • sa-k8s-master service account for cluster management:
      • k8s.clusters.agent: To manage a Kubernetes cluster.
      • load-balancer.admin: To manage a network load balancer.
    • sa-k8s-nodes service account for node group management:
      • container-registry.images.puller: To pull images from Yandex Container Registry.
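
    Below is a minimal sketch of the same step in the yc CLI. It assumes the CLI profile already points at the right folder; the role names come from the list above, but verify the exact flags against yc --help before running.

    # Create the two service accounts used in this tutorial.
    yc iam service-account create --name sa-k8s-master
    yc iam service-account create --name sa-k8s-nodes

    # Assign folder-level roles; IDs are looked up from the account names.
    FOLDER_ID=$(yc config get folder-id)
    MASTER_SA_ID=$(yc iam service-account get sa-k8s-master --format json | jq -r '.id')
    NODES_SA_ID=$(yc iam service-account get sa-k8s-nodes --format json | jq -r '.id')
    yc resource-manager folder add-access-binding "$FOLDER_ID" \
      --role k8s.clusters.agent \
      --subject "serviceAccount:$MASTER_SA_ID"
    yc resource-manager folder add-access-binding "$FOLDER_ID" \
      --role load-balancer.admin \
      --subject "serviceAccount:$MASTER_SA_ID"
    yc resource-manager folder add-access-binding "$FOLDER_ID" \
      --role container-registry.images.puller \
      --subject "serviceAccount:$NODES_SA_ID"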
  4. Create a network named k8s-network to host your cluster. When creating your network, select the Create subnets option.

  5. Create security groups for the Managed Service for Kubernetes cluster and its node groups.

    Warning

    The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.

  6. Create an encryption key with the following settings (a CLI sketch follows the list):

    • Name: k8s-symetric-key.
    • Encryption algorithm: AES-128.
    • Rotation period, days: 365.
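
    A minimal CLI sketch of the same step; the rotation period flag takes a duration, so 365 days is written as 8760h here (verify the flag syntax against yc kms symmetric-key create --help).

    # Create a KMS symmetric key matching the settings above.
    yc kms symmetric-key create \
      --name k8s-symetric-key \
      --default-algorithm aes-128 \
      --rotation-period 8760h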
  7. Create a Managed Service for Kubernetes cluster with the following settings:

    • Service account for resources: sa-k8s-master.
    • Service account for nodes: sa-k8s-nodes.
    • Encryption key: k8s-symetric-key.
    • Release channel: RAPID.
    • Public address: Auto.
    • Type of master: Highly available.
    • Cloud network: k8s-network.
    • Security groups: Select the previously created security groups containing the rules for service traffic and Kubernetes API access.
    • Enable tunnel mode: Enabled.
  8. Create two node groups, one in the ru-central1-a availability zone and one in ru-central1-b, with the following settings (a CLI sketch follows the list):

    • Under Scaling:
      • Type: Automatic.
      • Minimum number of nodes: 1.
      • Maximum number of nodes: 3.
      • Initial number of nodes: 1.
    • Under Network settings:
      • Public address: Auto.
      • Security groups: Select the previously created security groups containing the rules for service traffic, connection to the services from the internet, and connection to nodes over SSH.
      • Location: ru-central1-a or ru-central1-b.
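
    For illustration, one of the node groups could be created with a CLI call like the sketch below; the cluster name k8s-cluster, the subnet, and the security group placeholders are assumptions to replace with your own values (check the flags against yc managed-kubernetes node-group create --help).

    # Create an autoscaling node group in ru-central1-a; repeat for ru-central1-b.
    yc managed-kubernetes node-group create \
      --name k8s-node-group-a \
      --cluster-name k8s-cluster \
      --auto-scale min=1,max=3,initial=1 \
      --location zone=ru-central1-a \
      --network-interface subnets=<subnet_name>,ipv4-address=nat,security-group-ids=<security_group_IDs>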
  9. Install kubectl and configure it to work with the new cluster (for example, with the command shown below).
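
    The kubeconfig entry is typically generated with the following CLI command (the cluster name is an assumption):

    # Fetch cluster credentials and add a context to ~/.kube/config.
    yc managed-kubernetes cluster get-credentials k8s-cluster --external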

Scaling based on CPU utilization

In this section, you will learn to configure cluster autoscaling based on CPU load.

  1. Create a file named k8s-autoscale-CPU.yaml containing the settings for a test application, a load balancer, and Horizontal Pod Autoscaler:

    k8s-autoscale-CPU.yaml
    ---
    ### Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          name: nginx
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: registry.k8s.io/hpa-example
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "500m"
                limits:
                  memory: "500Mi"
                  cpu: "1"
    ---
    ### Service
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
    spec:
      selector:
        app: nginx
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: LoadBalancer
    ---
    ### HPA
    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilizationPercentage: 20
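
    The manifest above uses the legacy autoscaling/v1 API, which supports scaling on CPU utilization only. For reference, the same autoscaler can be expressed with the newer autoscaling/v2 API; the sketch below is functionally equivalent and entirely optional:

    # Optional: the same HPA written against the autoscaling/v2 API.
    kubectl apply -f - <<'EOF'
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 20
    EOF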
    
  2. Create the objects:

    kubectl apply -f k8s-autoscale-CPU.yaml
    
  3. In a separate window, start tracking the load on the Kubernetes components:

    watch kubectl get pod,svc,hpa,nodes -o wide
    
  4. Run a process to simulate a workload:

    URL=$(kubectl get service nginx -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      while true; do wget -q -O- http://$URL; done
    

    Tip

    To increase the load and speed up the scenario, run several such processes in separate windows.

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

    Over the next several minutes, Horizontal Pod Autoscaler will increase the number of pods on the nodes in response to the growing CPU usage. As soon as the existing cluster resources become insufficient to satisfy the pods' requests values, Cluster Autoscaler will increase the number of nodes in the groups.

  5. Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial state.
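
While the load rises and falls, you can inspect the autoscaling decisions with standard read-only kubectl commands:

    # Show current CPU utilization vs. target and the replica count.
    kubectl get hpa nginx
    # Show recent scaling events recorded for the HPA.
    kubectl describe hpa nginx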

Scaling based on the number of application requests

In this section, you will learn to configure cluster autoscaling based on the number of application requests (Requests Per Second, RPS).

Runtime algorithm

  1. An Ingress controller transmits information on the number of application requests to the Prometheus monitoring system.

  2. Prometheus generates and publishes the nginx_ingress_controller_requests_per_second metric for the number of application requests per second.

    To create this metric, the following rule was added to the Prometheus configuration file, values-prom.yaml:

    rules:
      groups:
        - name: Ingress
          rules:
            - record: nginx_ingress_controller_requests_per_second
              expr: rate(nginx_ingress_controller_requests[2m])
    
  3. Based on this metric, the autoscaling tools update the number of pods and nodes (an illustrative HPA manifest is sketched below).
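
For illustration, a Horizontal Pod Autoscaler that consumes such a custom metric through the custom.metrics API typically looks like the sketch below. The manifest actually used in this tutorial ships in the repository as k8s_autoscale-RPS.yaml; the object names and the target value here are hypothetical.

    # Hypothetical HPA scaling a Deployment on the per-Ingress request rate.
    kubectl apply -f - <<'EOF'
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: app-rps
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: app
      minReplicas: 1
      maxReplicas: 10
      metrics:
        - type: Object
          object:
            describedObject:
              apiVersion: networking.k8s.io/v1
              kind: Ingress
              name: app-ingress
            metric:
              name: nginx_ingress_controller_requests_per_second
            target:
              type: Value
              value: "10"
    EOF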

Installing objects

  1. Clone the GitHub repository containing the up-to-date configuration files:

    git clone https://github.com/yandex-cloud-examples/yc-mk8s-autoscaling-solution.git && \
    cd yc-mk8s-autoscaling-solution
    
  2. Add the Helm repositories with the Ingress controller and the Prometheus monitoring system:

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx && \
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
    helm repo update
    
  3. Install the Ingress controller:

    helm upgrade \
      --install rps ingress-nginx/ingress-nginx \
      --values values-ingr.yaml
    
  4. Install Prometheus:

    helm upgrade \
      --install prometheus prometheus-community/prometheus \
      --values values-prom.yaml
    
  5. Install a Prometheus adapter that will deliver Prometheus metrics to the autoscaling tools:

    helm upgrade \
      --install prometheus-adapter prometheus-community/prometheus-adapter \
      --values values-prom-ad.yaml
    
  6. Create a test application, an Ingress rule, and Horizontal Pod Autoscaler:

    kubectl apply -f k8s_autoscale-RPS.yaml
    

    Once the objects are created, Prometheus will add the new nginx_ingress_controller_requests_per_second metric; however, it only starts computing the metric after traffic passes through the Ingress controller.

  7. Make several test requests to the Ingress controller:

    URL=$(kubectl get service rps-ingress-nginx-controller -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      curl --header "Host: nginx.example.com" http://$URL
    

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

  8. Make sure that the nginx_ingress_controller_requests_per_second metric is available:

    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq . | \
      grep ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second
    

    Result:

    "name": "ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second",
    

Testing autoscaling

  1. In a separate window, start tracking the load on the Kubernetes components:

    watch kubectl get pod,svc,hpa,nodes -o wide
    
  2. Run a process to simulate the workload:

    URL=$(kubectl get service rps-ingress-nginx-controller -o json \
      | jq -r '.status.loadBalancer.ingress[0].ip') && \
      while true; do curl --header "Host: nginx.example.com" http://$URL; done
    

    Note

    If the resource is unavailable at the specified URL, make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.

    Over the next several minutes, Horizontal Pod Autoscaler will increase the number of pods on the nodes in response to the growing number of application requests. As soon as the existing cluster resources become insufficient to satisfy the pods' requests values, Cluster Autoscaler will increase the number of nodes in the groups.

  3. Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial state.

Delete the resources you created

Delete the resources you no longer need to avoid paying for them:

  1. Delete the Kubernetes cluster.
  2. If static public IP addresses were used for cluster and node access, release and delete them.
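
The cluster can also be deleted from the CLI with a call like this sketch (the cluster name is an assumption; deleting the cluster deletes its node groups as well):

    # Delete the Managed Service for Kubernetes cluster.
    yc managed-kubernetes cluster delete k8s-cluster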
