Installing Kubernetes Node Remediation
Kubernetes Node Remediation includes two controllers:
- Node Healthcheck Controller, which monitors node health and detects failures.
- Self Node Remediation Controller, which moves workloads off unhealthy nodes and restores those nodes.
Getting started
- If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
  By default, the CLI uses the folder specified when the profile was created. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
- Make sure that the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If any rule is missing, add it.
  Warning
  The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.
- Install kubectl and configure it to work with the new cluster.
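The folder and `kubectl` preparation above can be scripted roughly as follows. This is a minimal sketch, assuming the `yc` CLI is already initialized; the folder ID and cluster name are placeholders you pass in, and the function name `setup_cluster_access` is hypothetical. `yc managed-kubernetes cluster get-credentials ... --external` adds the cluster's public-endpoint credentials to your local kubeconfig so that `kubectl` can reach it.

```bash
#!/usr/bin/env bash
# Sketch: point the yc CLI at a folder and fetch kubectl credentials for a cluster.
# setup_cluster_access is a hypothetical helper; the folder ID and cluster name are placeholders.

setup_cluster_access() {
  local folder_id="$1"
  local cluster_name="$2"

  if [ -z "$folder_id" ] || [ -z "$cluster_name" ]; then
    echo "usage: setup_cluster_access <folder_ID> <cluster_name>" >&2
    return 1
  fi

  # Make the folder the default for subsequent yc commands.
  yc config set folder-id "$folder_id" || return 1

  # Add credentials for the cluster's external endpoint to the local kubeconfig.
  yc managed-kubernetes cluster get-credentials "$cluster_name" --external || return 1

  # Print the active context to confirm kubectl now targets this cluster.
  kubectl config current-context
}
```

Usage: `setup_cluster_access <folder_ID> <cluster_name>`. If you access the cluster from inside the cloud network, the `--internal` flag may be more appropriate than `--external`.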
Installation from Yandex Cloud Marketplace
- Navigate to the folder dashboard and select Managed Service for Kubernetes.
- Click the name of your cluster and select the Marketplace tab.
- Under Application available for installation, select Kubernetes Node Remediation and click Go to install.
- Configure the application:
  - Namespace: Create a new namespace, e.g., `remediation-space`. If you leave the default namespace, Kubernetes Node Remediation may work incorrectly.
  - Application name: Specify the application name.
- Click Install.
- Wait for the application to change its status to Deployed.
- Create the NodeHealthCheck resource.
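Besides watching the status in the management console, you can check from the command line that the application's pods have started. This is a minimal sketch, assuming `kubectl` is configured for the cluster and the application was installed into `remediation-space` (substitute your namespace); the helper name and the 300-second timeout are arbitrary choices, not values required by the application.

```bash
#!/usr/bin/env bash
# Sketch: wait until every pod in the application's namespace reports Ready.
# wait_for_remediation_pods is a hypothetical helper; namespace and timeout are illustrative defaults.

wait_for_remediation_pods() {
  local namespace="${1:-remediation-space}"
  local timeout="${2:-300s}"

  # List what was deployed into the namespace.
  kubectl get pods --namespace "$namespace" || return 1

  # Block until all pods are Ready; kubectl exits non-zero if the timeout expires.
  kubectl wait pod --all \
    --for=condition=Ready \
    --namespace "$namespace" \
    --timeout="$timeout"
}
```

Usage: `wait_for_remediation_pods remediation-space`.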
Installation using a Helm chart
- Install Helm v3.8.0 or higher.
- To install a Helm chart with Kubernetes Node Remediation, run this command:
  ```bash
  helm pull oci://cr.yandex/yc-marketplace/yandex-cloud/medik8s/kubernetes-node-remediation/chart/kubernetes-node-remediation \
    --version 1.0.1 \
    --untar && \
  helm install \
    --namespace <namespace> \
    --create-namespace \
    kubernetes-node-remediation ./kubernetes-node-remediation/
  ```
  If you specify the `default` namespace, Kubernetes Node Remediation may work incorrectly. We recommend specifying a value different from all the existing namespaces, e.g., `remediation-space`.
  Note
  If you are using a Helm version below 3.8.0, add the `export HELM_EXPERIMENTAL_OCI=1 && \` string at the beginning of the command to enable Open Container Initiative (OCI) support in the Helm client.
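The version check, namespace recommendation, and install command above can be combined into one script. This is a sketch under stated assumptions: the helper names `helm_needs_oci_flag` and `install_node_remediation` are hypothetical, version comparison relies on `sort -V` (GNU coreutils or a compatible implementation), and `remediation-space` is only an example default. It reads the client version with `helm version --template '{{.Version}}'` and sets `HELM_EXPERIMENTAL_OCI=1` only for Helm older than 3.8.0.

```bash
#!/usr/bin/env bash
# Sketch: install the chart, enabling OCI support only for Helm older than 3.8.0.
# Assumes a sort implementation that supports -V; helper names are hypothetical.

CHART_REF="oci://cr.yandex/yc-marketplace/yandex-cloud/medik8s/kubernetes-node-remediation/chart/kubernetes-node-remediation"
CHART_VERSION="1.0.1"

# Prints "yes" if the given Helm version (e.g. v3.7.2) is older than 3.8.0, otherwise "no".
helm_needs_oci_flag() {
  local version="${1#v}"
  version="${version%%+*}"   # drop build metadata such as "+g3a31588"
  if [ "$version" = "3.8.0" ]; then
    echo "no"
    return
  fi
  local lowest
  lowest=$(printf '%s\n%s\n' "$version" "3.8.0" | sort -V | head -n 1)
  if [ "$lowest" = "$version" ]; then echo "yes"; else echo "no"; fi
}

install_node_remediation() {
  local namespace="${1:-remediation-space}"

  # The application may work incorrectly in the default namespace.
  if [ "$namespace" = "default" ]; then
    echo "choose a dedicated namespace instead of 'default'" >&2
    return 1
  fi

  # Older Helm clients need OCI support switched on explicitly.
  if [ "$(helm_needs_oci_flag "$(helm version --template '{{.Version}}')")" = "yes" ]; then
    export HELM_EXPERIMENTAL_OCI=1
  fi

  helm pull "$CHART_REF" --version "$CHART_VERSION" --untar &&
    helm install \
      --namespace "$namespace" \
      --create-namespace \
      kubernetes-node-remediation ./kubernetes-node-remediation/
}
```

Usage: `install_node_remediation remediation-space`. Refusing `default` up front turns the documented warning into a hard check instead of relying on the reader to remember it.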
Creating the NodeHealthCheck resource
- Create a file with the NodeHealthCheck description:
  ```yaml
  apiVersion: remediation.medik8s.io/v1alpha1
  kind: NodeHealthCheck
  metadata:
    name: nodehc-sample
  spec:
    minHealthy: 51%
    remediationTemplate:
      apiVersion: self-node-remediation.medik8s.io/v1alpha1
      kind: SelfNodeRemediationTemplate
      name: self-node-remediation-automatic-strategy-template
      namespace: <application_namespace>
    selector:
      matchLabels:
        beta.kubernetes.io/os: linux
    unhealthyConditions:
      - duration: 60s
        status: "False"
        type: Ready
      - duration: 60s
        status: Unknown
        type: Ready
  ```
  Where:
  - `spec.minHealthy`: Minimum percentage of nodes that must be healthy for the controller to start remediation. If fewer nodes are healthy, remediation is not started.
  - `spec.remediationTemplate`: Self Node Remediation template used to remediate nodes. In `namespace`, specify the namespace where you installed the application.
  - `spec.selector`: Labels of the nodes the check applies to.
  - `spec.unhealthyConditions`: List of node conditions the controller uses to determine whether a node is unhealthy:
    - `duration`: How long the condition must persist before node remediation starts.
    - `type`: Condition type.
    - `status`: Condition status that marks the node as unhealthy.
  In this example, the NodeHealthCheck controller starts remediating a node if its `Ready` condition has the `False` status (the node is not ready) or the `Unknown` status (the node is not reporting) for 60 seconds.
- Navigate to the directory with the file and run this command:
  ```bash
  kubectl apply -f <file_name>
  ```
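To reason about `minHealthy`, it helps to convert the percentage into a node count. The sketch below assumes the percentage is applied to the number of nodes matched by `selector` and rounded up to a whole node; the rounding rule is an assumption, so check the NodeHealthCheck controller documentation for the exact behavior. It also includes a hypothetical helper, `show_node_health_check`, that uses `kubectl` to print the applied resource and each node's `Ready` condition status.

```bash
#!/usr/bin/env bash
# Sketch: estimate the minHealthy threshold and inspect the applied NodeHealthCheck.
# Rounding the percentage up is an assumption about the controller's behavior.
# All function names here are hypothetical.

# Usage: min_healthy_nodes <percent, e.g. 51%> <number of selected nodes>
min_healthy_nodes() {
  local pct
  pct=$(printf '%s' "$1" | tr -d '%')
  local total="$2"
  # Integer ceiling of pct * total / 100.
  echo $(( (pct * total + 99) / 100 ))
}

# Usage: remediation_allowed <healthy nodes> <selected nodes> <percent>
# Prints "yes" if enough nodes are healthy for remediation to proceed.
remediation_allowed() {
  local healthy="$1" total="$2" pct="$3"
  if [ "$healthy" -ge "$(min_healthy_nodes "$pct" "$total")" ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# Show the NodeHealthCheck resource and each node's Ready condition status.
show_node_health_check() {
  local name="${1:-nodehc-sample}"
  kubectl get nodehealthchecks.remediation.medik8s.io "$name" --output yaml || return 1
  kubectl get nodes --output \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}
```

With the example manifest (`minHealthy: 51%`) and a three-node group, `min_healthy_nodes 51% 3` prints `2`, so under this assumption remediation proceeds only while at least two of the three nodes are healthy.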