Kubernetes Node Remediation

Name: Kubernetes Node Remediation
Brand: Yandex Cloud

Updated February 12, 2026

Kubernetes Node Remediation is a tool comprising several Kubernetes operators that provide automatic recovery of failed Managed Service for Kubernetes cluster nodes and high availability for stateful workloads.

The solution features two controllers:

Node Healthcheck Controller, which tracks failures.
Self Node Remediation Controller, which moves the workload away from unhealthy nodes and restores them.

Deployment instructions

Install kubectl and configure it to work with your cluster.
Create a node group for Kubernetes Node Remediation.
Configure the application:
- Namespace: Create a new namespace, e.g., remediation-space. If you keep the default namespace, Kubernetes Node Remediation may not work correctly.
- Application name: Specify the application name.
Click Install.
Wait for the application to change its status to Deployed.
Create the NodeHealthCheck resource:
1. Create a file with the NodeHealthCheck description:
```
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehc-sample
spec:
  minHealthy: 51%
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: <application_namespace>
  selector:
    matchLabels:
      beta.kubernetes.io/os: linux
  unhealthyConditions:
  - duration: 60s
    status: "False"
    type: Ready
  - duration: 60s
    status: Unknown
    type: Ready
```
  Where:
  - spec.minHealthy: Minimum percentage of nodes matched by spec.selector that must be healthy for recovery to be initiated.
  - spec.unhealthyConditions: List of node status conditions the controller uses to determine if the node is unhealthy.
    - duration: Time for a condition to persist before node recovery starts.
    - type: Condition type.
    - status: Expected status for recognizing a node as unhealthy.
  In this example, the Node Healthcheck Controller initiates recovery once a node's Ready condition has had the status False (the node is not ready) or Unknown (the node is not reporting its state) for 60 seconds.
  
  Learn more about resource fields.
2. Navigate to the directory with the file and run this command:
```
kubectl apply -f <file_name>
```
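
To illustrate how the spec.minHealthy and spec.unhealthyConditions fields described above interact, here is a minimal Python sketch of the decision. It is a simplified model, not the controller's actual code: the names `NodeCondition`, `is_unhealthy`, `min_healthy_count`, and `remediation_allowed` are hypothetical, and the sketch assumes that a percentage is rounded up to a whole number of nodes and that a condition matches once it has persisted for at least the configured duration.

```python
from dataclasses import dataclass


@dataclass
class UnhealthyCondition:
    """One entry of spec.unhealthyConditions."""
    type: str
    status: str
    duration_s: int


@dataclass
class NodeCondition:
    """Hypothetical view of a node condition and how long it has held its status."""
    type: str
    status: str
    seconds_in_status: int


# Mirrors spec.unhealthyConditions from the example manifest.
UNHEALTHY_CONDITIONS = [
    UnhealthyCondition("Ready", "False", 60),
    UnhealthyCondition("Ready", "Unknown", 60),
]


def is_unhealthy(node_conditions, rules=UNHEALTHY_CONDITIONS):
    """A node is unhealthy if any condition matches a rule for long enough."""
    return any(
        c.type == r.type
        and c.status == r.status
        and c.seconds_in_status >= r.duration_s  # assumption: inclusive bound
        for c in node_conditions
        for r in rules
    )


def min_healthy_count(min_healthy, total_nodes):
    """Convert spec.minHealthy ("51%" or an integer) to a node count."""
    if isinstance(min_healthy, str) and min_healthy.endswith("%"):
        percent = int(min_healthy[:-1])
        # Assumption: round up, so 51% of 3 nodes means 2 nodes.
        return -(-percent * total_nodes // 100)
    return int(min_healthy)


def remediation_allowed(nodes, min_healthy="51%"):
    """Allow recovery only while enough of the selected nodes are healthy."""
    healthy = sum(1 for conds in nodes.values() if not is_unhealthy(conds))
    return healthy >= min_healthy_count(min_healthy, len(nodes))


# Three selected nodes, one of which has been NotReady for 90 seconds.
cluster = {
    "node-a": [NodeCondition("Ready", "True", 3600)],
    "node-b": [NodeCondition("Ready", "True", 3600)],
    "node-c": [NodeCondition("Ready", "False", 90)],
}
print(remediation_allowed(cluster))
```

Under these assumptions, recovery is allowed in the sample cluster because two of the three nodes are still healthy. If a second node also failed, the healthy share would drop below 51%, and the model would block recovery, which is the safeguard spec.minHealthy provides against remediating nodes during a cluster-wide outage.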

Use cases

Node failure detection.
Recovery of affected workload by the scheduler.
Remediation of nodes after failures.

Links

Kubernetes Node Remediation documentation
Node Healthcheck Controller repository
Self Node Remediation Controller repository
Managed Service for Kubernetes documentation

Technical support

Yandex Cloud technical support is available 24/7. The types of requests you can submit and the relevant response times depend on your pricing plan. You can switch to the paid support plan in the management console. You can learn more about the technical support terms and conditions here.

Product composition

Helm chart	Version	Pull-command	Documentation
yandex-cloud/medik8s/kubernetes-node-remediation/chart/kubernetes-node-remediation	1.0.1		Open

Docker image	Version	Pull-command
yandex-cloud/medik8s/kubernetes-node-remediation/self-node-remediation-operator1750332903325272502991409697328138047197170598960	v0.10.0-y
yandex-cloud/medik8s/kubernetes-node-remediation/node-healthcheck-operator1750332903325272502991409697328138047197170598960	v0.7.0
yandex-cloud/medik8s/kubernetes-node-remediation/kube-rbac-proxy1750332903325272502991409697328138047197170598960	v0.19.0

Terms

By using this product you agree to the Yandex Cloud Marketplace Terms of Service