Horizontal scaling of an application in a cluster
Managed Service for Kubernetes supports several types of autoscaling. In this tutorial, you will learn to configure cluster autoscaling using a combination of Cluster Autoscaler and Horizontal Pod Autoscaler.
If you no longer need the resources you created, delete them.
Warning
While you work through this tutorial, the total number of nodes in the groups may increase to six. Make sure you have enough folder resources to follow the steps provided in this tutorial.
Required paid resources
The support cost for this solution includes:
- Fee for using the master and outgoing traffic in a Managed Service for Kubernetes cluster (see Managed Service for Kubernetes pricing).
- Fee for using computing resources, OS, and storage in cluster nodes (VMs) (see Compute Cloud pricing).
- Fee for a public IP address for cluster nodes (see Virtual Private Cloud pricing).
- Key Management Service fee for the number of active key versions (with `Active` or `Scheduled for destruction` status) and completed cryptographic operations (see Key Management Service pricing).
Getting started
- If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

  By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
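  For reference, you can check which folder the CLI is currently using and switch it with the standard `yc` configuration commands (`<folder_ID>` is a placeholder):

  ```bash
  # Show the current CLI configuration, including the default folder
  yc config list

  # Point the CLI at a different folder
  yc config set folder-id <folder_ID>
  ```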
- Create these service accounts for the master and node groups, and assign them the listed roles:

  - `sa-k8s-master` for cluster management:
    - `k8s.clusters.agent`: To manage a Kubernetes cluster.
    - `load-balancer.admin`: To manage a network load balancer.
  - `sa-k8s-nodes` for node group management:
    - `container-registry.images.puller`: To pull images from Yandex Container Registry.
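  If you prefer the CLI, a minimal sketch of creating the accounts and granting roles at the folder level (`<folder_ID>` and `<service_account_ID>` are placeholders; repeat the binding for each role listed above):

  ```bash
  # Create the two service accounts
  yc iam service-account create --name sa-k8s-master
  yc iam service-account create --name sa-k8s-nodes

  # Grant a role on the folder to a service account
  yc resource-manager folder add-access-binding <folder_ID> \
    --role k8s.clusters.agent \
    --subject serviceAccount:<service_account_ID>
  ```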
- Create a network named `k8s-network` to host your cluster. Select Create subnets when creating it.
- Create security groups for the Managed Service for Kubernetes cluster and its node groups.
Warning
The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.
- Create a Key Management Service symmetric encryption key with the following settings:

  - Name: `k8s-symetric-key`.
  - Encryption algorithm: `AES-128`.
  - Rotation period, days: `365`.
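  The same key can be created from the CLI; a sketch assuming the flag names below match your CLI version (verify with `yc kms symmetric-key create --help`), with the 365-day rotation period expressed in hours:

  ```bash
  yc kms symmetric-key create \
    --name k8s-symetric-key \
    --default-algorithm aes-128 \
    --rotation-period 8760h
  ```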
- Create a Managed Service for Kubernetes cluster with the following settings:

  - Service account for resources: `sa-k8s-master`.
  - Service account for nodes: `sa-k8s-nodes`.
  - Encryption key: `k8s-symetric-key`.
  - Release channel: `RAPID`.
  - Public address: `Auto`.
  - Type of master: `Highly available`.
  - Cloud network: `k8s-network`.
  - Security groups: Select the previously created security groups containing the rules for service traffic and Kubernetes API access.
  - Enable tunnel mode: `Enabled`.
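  A partial CLI sketch, under the assumption that these flag names match your CLI version (verify against `yc managed-kubernetes cluster create --help`; the highly available master type, encryption key, and tunnel mode may require additional flags or the management console; `k8s-autoscale` is a hypothetical cluster name):

  ```bash
  yc managed-kubernetes cluster create \
    --name k8s-autoscale \
    --network-name k8s-network \
    --public-ip \
    --release-channel rapid \
    --service-account-name sa-k8s-master \
    --node-service-account-name sa-k8s-nodes
  ```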
- Create two node groups with the following settings in the `ru-central1-a` and `ru-central1-b` availability zones:

  - Under Scaling:
    - Type: `Automatic`.
    - Minimum number of nodes: `1`.
    - Maximum number of nodes: `3`.
    - Initial number of nodes: `1`.
  - Under Network settings:
    - Public address: `Auto`.
    - Security groups: Select the previously created security groups containing the rules for service traffic, internet access to services, and SSH access to nodes.
    - Location: `ru-central1-a` or `ru-central1-b`.
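  In the CLI, these autoscaling limits map onto the `--auto-scale` flag of `yc managed-kubernetes node-group create`; a sketch for one zone, assuming the hypothetical cluster name `k8s-autoscale` (repeat with the other zone for the second group):

  ```bash
  yc managed-kubernetes node-group create \
    --cluster-name k8s-autoscale \
    --name k8s-nodes-a \
    --location zone=ru-central1-a \
    --public-ip \
    --auto-scale min=1,max=3,initial=1
  ```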
- Install kubectl and configure it to work with the new cluster.
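  The kubeconfig for the new cluster can be fetched with the documented `yc` helper (again using the hypothetical cluster name `k8s-autoscale`):

  ```bash
  # Adds credentials for the cluster's public endpoint to ~/.kube/config
  yc managed-kubernetes cluster get-credentials k8s-autoscale --external
  ```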
Scaling based on CPU utilization
In this section, you will learn how to configure cluster autoscaling based on CPU load.
- Create a file named `k8s-autoscale-CPU.yaml` that contains the configuration for the test application, load balancer, and Horizontal Pod Autoscaler:

  ```yaml
  ---
  ### Deployment
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: nginx
    labels:
      app: nginx
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: nginx
    template:
      metadata:
        name: nginx
        labels:
          app: nginx
      spec:
        containers:
          - name: nginx
            image: registry.k8s.io/hpa-example
            resources:
              requests:
                memory: "256Mi"
                cpu: "500m"
              limits:
                memory: "500Mi"
                cpu: "1"
  ---
  ### Service
  apiVersion: v1
  kind: Service
  metadata:
    name: nginx
  spec:
    selector:
      app: nginx
    ports:
      - protocol: TCP
        port: 80
        targetPort: 80
    type: LoadBalancer
  ---
  ### HPA
  apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    name: nginx
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: nginx
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 20
  ```
- Create the objects:

  ```bash
  kubectl apply -f k8s-autoscale-CPU.yaml
  ```
- In a separate window, start Kubernetes component load monitoring:

  ```bash
  watch kubectl get pod,svc,hpa,nodes -o wide
  ```
- Run the following command to simulate a workload:

  ```bash
  URL=$(kubectl get service nginx -o json \
    | jq -r '.status.loadBalancer.ingress[0].ip') && \
  while true; do wget -q -O- http://$URL; done
  ```

  Tip

  To increase the load and speed up the scenario, run multiple simulations in separate windows.
  Note

  Within a few minutes, Horizontal Pod Autoscaler will increase the number of pods on the nodes due to rising CPU usage. As soon as the existing cluster resources become inadequate to satisfy the `requests` value, Cluster Autoscaler will scale up the nodes in the groups.
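  To watch the scaling decisions directly, you can also inspect the HPA with standard kubectl commands:

  ```bash
  # Show current vs. target CPU utilization, replica counts, and scaling events
  kubectl describe hpa nginx
  ```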
- Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial value.
Scaling based on the number of application requests
In this section, you will learn how to configure cluster autoscaling based on the number of application requests (Requests Per Second, RPS).
How it works
- The ingress controller sends information on the number of application requests to Prometheus.
- Prometheus generates and publishes the `nginx_ingress_controller_requests_per_second` metric for the number of application requests per second.

  To create this metric, the `values-prom.yaml` Prometheus configuration file includes the following rule:

  ```yaml
  rules:
    groups:
      - name: Ingress
        rules:
          - record: nginx_ingress_controller_requests_per_second
            expr: rate(nginx_ingress_controller_requests[2m])
  ```
- Based on this metric, the autoscaling tools adjust the number of pods and nodes.
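  For illustration, a minimal sketch of what an HPA consuming this custom metric could look like (the actual manifest ships in `k8s_autoscale-RPS.yaml` in the repository; the object names and target value here are assumptions):

  ```yaml
  # Hypothetical HPA scaling a Deployment on the per-second request rate
  # reported for an Ingress through the Prometheus adapter.
  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: nginx
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: nginx
    minReplicas: 1
    maxReplicas: 10
    metrics:
      - type: Object
        object:
          metric:
            name: nginx_ingress_controller_requests_per_second
          describedObject:
            apiVersion: networking.k8s.io/v1
            kind: Ingress
            name: nginx
          target:
            type: Value
            value: "10"
  ```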
Installing objects
- Clone the GitHub repository with the up-to-date configuration files:

  ```bash
  git clone https://github.com/yandex-cloud-examples/yc-mk8s-autoscaling-solution.git && \
  cd yc-mk8s-autoscaling-solution
  ```
- Add the Helm repositories with the ingress controller and the Prometheus monitoring system:

  ```bash
  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx && \
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
  helm repo update
  ```
- Install the ingress controller:

  ```bash
  helm upgrade \
    --install rps ingress-nginx/ingress-nginx \
    --values values-ingr.yaml
  ```
- Install Prometheus:

  ```bash
  helm upgrade \
    --install prometheus prometheus-community/prometheus \
    --values values-prom.yaml
  ```
- Install the Prometheus adapter that enables the autoscaling tools to retrieve metrics from Prometheus:

  ```bash
  helm upgrade \
    --install prometheus-adapter prometheus-community/prometheus-adapter \
    --values values-prom-ad.yaml
  ```
- Create a test application, an Ingress rule, and a Horizontal Pod Autoscaler:

  ```bash
  kubectl apply -f k8s_autoscale-RPS.yaml
  ```

  Once these objects are created, Prometheus will add a new metric called `nginx_ingress_controller_requests_per_second`. Prometheus will start monitoring that metric only after traffic goes through the ingress controller.
- Send several test requests to the ingress controller:

  ```bash
  URL=$(kubectl get service rps-ingress-nginx-controller -o json \
    | jq -r '.status.loadBalancer.ingress[0].ip') && \
  curl --header "Host: nginx.example.com" http://$URL
  ```
- Make sure the `nginx_ingress_controller_requests_per_second` metric is available:

  ```bash
  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq . | \
    grep ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second
  ```

  Result:

  ```text
  "name": "ingresses.networking.k8s.io/nginx_ingress_controller_requests_per_second",
  ```
Testing autoscaling
- In a separate window, start Kubernetes component load monitoring:

  ```bash
  watch kubectl get pod,svc,hpa,nodes -o wide
  ```
- Run the following command to simulate a workload:

  ```bash
  URL=$(kubectl get service rps-ingress-nginx-controller -o json \
    | jq -r '.status.loadBalancer.ingress[0].ip') && \
  while true; do curl --header "Host: nginx.example.com" http://$URL; done
  ```

  Note

  Within a few minutes, Horizontal Pod Autoscaler will increase the number of pods on the nodes due to the growing number of application requests. As soon as the existing cluster resources become inadequate to satisfy the `requests` value, Cluster Autoscaler will scale up the nodes in the groups.
- Stop simulating the workload. Over the next few minutes, the number of nodes and pods will drop back to the initial value.
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
- Delete the Kubernetes cluster.
- If you used static public IP addresses to access your cluster or nodes, release and delete them.
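A hedged CLI sketch of the cleanup, assuming the hypothetical cluster name `k8s-autoscale` (static addresses are managed through Virtual Private Cloud; `<address_ID>` is a placeholder):

```bash
# Delete the Kubernetes cluster together with its node groups
yc managed-kubernetes cluster delete k8s-autoscale

# List reserved static addresses, then delete the ones you no longer need
yc vpc address list
yc vpc address delete <address_ID>
```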