Installing Kubernetes Node Remediation
Kubernetes Node Remediation consists of two controllers:
- Node Healthcheck Controller: Tracks failures.
- Self Node Remediation Controller: Transfers the workload from failed nodes and restores them.
Getting started
- If you do not have the Yandex Cloud CLI yet, install and initialize it.
  The folder used by default is the one specified when creating the CLI profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id options.
- Make sure the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If a rule is missing, add it.
  Warning
  The configuration of security groups determines the performance and availability of the cluster and of the services and applications running in it.
- Install kubectl and configure it to work with the new cluster.
Installation from Yandex Cloud Marketplace
- In the management console, select a folder.
- Go to Managed Service for Kubernetes.
- Click the name of your cluster and select the Marketplace tab.
- Under Application available for installation, select Kubernetes Node Remediation and click Go to install.
- Configure the application:
  - Namespace: Create a new namespace, e.g., remediation-space. If you leave the default namespace, Kubernetes Node Remediation may work incorrectly.
  - Application name: Specify the application name.
- Click Install.
- Wait for the application to change its status to Deployed.
- Create the NodeHealthCheck resource.
Installation using a Helm chart
- Install Helm v3.8.0 or higher.
- To install a Helm chart with Kubernetes Node Remediation, run this command:

      helm pull oci://cr.yandex/yc-marketplace/yandex-cloud/medik8s/kubernetes-node-remediation/chart/kubernetes-node-remediation \
        --version 1.0.1 \
        --untar && \
      helm install \
        --namespace <namespace> \
        --create-namespace \
        kubernetes-node-remediation ./kubernetes-node-remediation/

  If you specify the default namespace, Kubernetes Node Remediation may work incorrectly. We recommend specifying a value different from all the existing namespaces, e.g., remediation-space.

  Note
  If you are using a Helm version below 3.8.0, add the export HELM_EXPERIMENTAL_OCI=1 && \ string at the beginning of the command to enable Open Container Initiative (OCI) support in the Helm client.
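The version requirement can be checked before running the install command. The following is a minimal sketch, not part of the official procedure: the version_ge helper is a hypothetical name introduced here, and it assumes a sort utility that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v helm >/dev/null 2>&1; then
  # Print only the client version string, e.g. v3.12.0.
  helm_version="$(helm version --template '{{.Version}}')"
  helm_version="${helm_version#v}"   # drop the leading "v"

  if version_ge "$helm_version" "3.8.0"; then
    echo "Helm $helm_version: no extra OCI setting is needed"
  else
    echo "Helm $helm_version: enabling experimental OCI support"
    export HELM_EXPERIMENTAL_OCI=1
  fi
fi
```

If the script takes the second branch, the variable is already exported in the current shell, so the helm pull command can be run as is.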
Creating the NodeHealthCheck resource
- Create a file named NodeHealthCheck with the resource description:

      apiVersion: remediation.medik8s.io/v1alpha1
      kind: NodeHealthCheck
      metadata:
        name: nodehc-sample
      spec:
        minHealthy: 51%
        remediationTemplate:
          apiVersion: self-node-remediation.medik8s.io/v1alpha1
          kind: SelfNodeRemediationTemplate
          name: self-node-remediation-automatic-strategy-template
          namespace: <application_namespace>
        selector:
          matchLabels:
            beta.kubernetes.io/os: linux
        unhealthyConditions:
          - duration: 60s
            status: "False"
            type: Ready
          - duration: 60s
            status: Unknown
            type: Ready

  Where:
  - spec.minHealthy: Minimum percentage of healthy nodes required to initiate recovery.
  - spec.unhealthyConditions: List of node status conditions the controller uses to determine if the node is unhealthy.
    - duration: Amount of time the condition must persist before the node recovery process begins.
    - type: Condition type.
    - status: Expected status for recognizing a node as unhealthy.

  In the example shown, the NodeHealthCheck controller will initiate recovery if a node's Ready condition keeps the False or Unknown status for 60 seconds.
- Navigate to the directory with the file and run this command:

      kubectl apply -f <file_name>
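To get a feel for what minHealthy: 51% means for a particular cluster, the percentage can be converted into a node count. The sketch below is illustrative only: the min_healthy_nodes function is a hypothetical helper, and it assumes the percentage is rounded up to a whole number of nodes, which this guide does not confirm. The node count is taken from the nodes matching the selector label used in the example manifest.

```shell
# Hypothetical helper: number of healthy nodes needed for a given
# total node count ($1) and minHealthy percentage ($2).
# Assumption: the result is rounded up to a whole node.
min_healthy_nodes() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

min_healthy_nodes 5 51    # 5 nodes at 51% -> 3
min_healthy_nodes 10 51   # 10 nodes at 51% -> 6

if command -v kubectl >/dev/null 2>&1; then
  # Count the nodes matched by the selector from the example manifest.
  total=$(kubectl get nodes -l beta.kubernetes.io/os=linux --no-headers 2>/dev/null | wc -l)
  echo "Matched nodes: $total, healthy nodes needed: $(min_healthy_nodes "$total" 51)"
fi
```

Under this assumption, a five-node cluster needs at least three healthy nodes before recovery of a failed node is initiated.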