

Migrating Kubernetes resources to a different availability zone

Written by Yandex Cloud. Updated on May 5, 2025.
  • Getting started
  • Migrate the node group and the pod workloads to a different availability zone
    • Getting started
    • Migrating a stateless workload
    • Migrating a stateful workload
    • Gradually migrating a stateless and stateful workload

In a Managed Service for Kubernetes cluster, you can migrate a node group, together with the workload running in its pods, from one availability zone to another.

Getting started

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when you created the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also specify a different folder for any specific command with the --folder-name or --folder-id parameter.
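
For example, to review the current profile settings and switch the default folder (the folder ID is a placeholder):

  # Show the active CLI profile configuration, including the default folder.
  yc config list

  # Make another folder the default one.
  yc config set folder-id <folder_ID>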

If you have already installed the CLI, update it to the latest version:

yc components update

Migrate the node group and the pod workloads to a different availability zone

Prepare a node group and proceed to migration using one of the following methods:

  • Migrating the node group directly to the new availability zone. The procedure depends on the type of workload in the pods:

    • Stateless workload: Whether the applications in the pods keep running during migration depends on how the workload is distributed across the cluster nodes. If the pods reside both in the node group you are migrating and in groups whose availability zone stays the same, the applications continue to run. If the pods reside only in the group you are migrating, both the pods and the applications in them will have to be stopped briefly. You can check the current distribution with the command sketch after this list.

      Examples of stateless workloads include web servers, the Yandex Application Load Balancer Ingress controller, and REST API applications.

    • Stateful workload: The pods and applications will have to be stopped briefly, regardless of how the workload is distributed among the cluster nodes.

      Examples of stateful workloads include databases and storage systems.

  • Gradually migrating a stateless and stateful workload to the new node group. This method involves creating a new node group in the new availability zone and gradually decommissioning the old nodes, which lets you monitor the workload transfer.
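
To see the current distribution, you can list the pods together with the nodes they run on and check which node group and availability zone each node belongs to. A minimal sketch, assuming the default node labels set by Managed Service for Kubernetes:

  # Which node runs each pod.
  kubectl get pods --output wide

  # Which node group and availability zone each node belongs to.
  kubectl get nodes --label-columns yandex.cloud/node-group-id,topology.kubernetes.io/zone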

Getting started

  1. Check whether the nodeSelector, affinity, or topology spread constraints strategies are used to assign the pods to the group's nodes. For more information on these strategies, see the Kubernetes documentation and High availability and fault tolerance. To check how a pod is assigned to its node and remove that binding:

    Management console
    1. In the management console, select the folder with your Managed Service for Kubernetes cluster.

    2. From the list of services, select Managed Service for Kubernetes.

    3. Go to the cluster page and find the Workload section.

    4. On the Pods tab, open the pod's page.

    5. Navigate to the YAML tab.

    6. Check if the pod manifest contains the following parameters and Kubernetes labels in them:

      • Parameters:

        • spec.nodeSelector
        • spec.affinity
        • spec.topologySpreadConstraints
      • Kubernetes labels set in the parameters:

        • failure-domain.beta.kubernetes.io/zone: <availability_zone>
        • topology.kubernetes.io/zone: <availability_zone>

      If the configuration contains at least one of these parameters and that parameter uses at least one of the listed Kubernetes labels, it will prevent node group and workload migration. An example of such a zone-binding fragment is shown at the end of this section.

    7. Check if the pod manifest has any dependencies on:

      • Availability zone you are migrating resources from.
      • Specific nodes within the group.
    8. If you find any of these settings, assignments, or dependencies, remove them from the pod configuration:

      1. Copy the YAML configuration from the management console.

      2. Create a local YAML file and paste the configuration into it.

      3. Remove any availability zone assignments from the configuration. For example, if the spec.affinity parameter includes the failure-domain.beta.kubernetes.io/zone Kubernetes label, remove it.

      4. Apply the new configuration:

        kubectl apply -f <yaml_file_path>
        
      5. Make sure the pod status changed to Running:

        kubectl get pods
        
    9. Check and update the configuration of each pod by repeating these steps.

  2. Transfer persistent data, such as databases, message queues, monitoring and log servers, to the new availability zone.
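
For reference, here is what a zone binding from step 1 might look like in a pod manifest. The values are hypothetical; this is the kind of fragment you would remove before migrating:

  # Hypothetical fragment of a pod spec that pins the pod to one availability zone.
  spec:
     nodeSelector:
        topology.kubernetes.io/zone: ru-central1-a
     affinity:
        nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                 - key: failure-domain.beta.kubernetes.io/zone
                   operator: In
                   values:
                      - ru-central1-a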

Migrating a stateless workload

  1. Create a subnet in the new availability zone and migrate the node group:

    CLI
    Terraform
    1. Create a subnet:

      yc vpc subnet create \
         --name <subnet_name> \
         --zone <availability_zone> \
         --network-id <network_ID> \
         --range <subnet_CIDR>
      

      Where:

      • --name: Subnet name.
      • --zone: Availability zone (ru-central1-a, ru-central1-b, or ru-central1-d).
      • --network-id: ID of the network the new subnet belongs to.
      • --range: List of IPv4 addresses for outgoing and incoming traffic, e.g., 10.0.0.0/22 or 192.168.0.0/16. Make sure the addresses are unique within the network. The minimum subnet size is /28, the maximum subnet size is /16. Only IPv4 is supported.
    2. Move the node group to the new availability zone. The example below shows a command for moving a group residing in a single zone:

      yc managed-kubernetes node-group update \
         --id <node_group_ID> \
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         --network-interface subnets=<subnet_ID>,`
              `ipv4-address=nat,`
              `security-group-ids=[<security_group_IDs>]
      

      Where:

      • id: ID of the node group to move to a different availability zone.
      • zone: Availability zone you want to move your node group to (ru-central1-a, ru-central1-b, or ru-central1-d).
      • subnet-id and subnets: ID of the new subnet you created earlier.
      • ipv4-address: Method of assigning an IPv4 address. The nat value allows assigning public and internal IP addresses to nodes.
      • security-group-ids: List of security group IDs.

      Warning

      If you want to keep the values of other network parameters for the node group, specify them in the network-interface parameter as well. Otherwise, the group may be recreated with default values. For more information, see Updating a Managed Service for Kubernetes node group.

      It is important to provide the ipv4-address and security-group-ids parameters in the command: this assigns public IP addresses to the node group and preserves the security groups attached to it.

      This command recreates the group's nodes in the specified availability zone and subnet. The recreation respects the group's deployment settings: the maximum number of nodes by which the group size can temporarily exceed or fall below the original node count.

      How to move a node group residing in different availability zones

      In this case, use the following command:

      yc managed-kubernetes node-group update \
         --id <node_group_ID> \
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         ...
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         --network-interface subnets=[<subnet_IDs>],`
              `ipv4-address=nat,`
              `security-group-ids=[<security_group_IDs>]
      

      Where:

      • id: ID of the node group to move to a different availability zone.
      • zone: Availability zone (ru-central1-a, ru-central1-b, or ru-central1-d). Specify the location parameters for each availability zone that will host the node group.
      • subnet-id and subnets: IDs of the subnets for the specified availability zones.
      • ipv4-address: Method of assigning an IPv4 address. The nat value allows assigning public and internal IP addresses to nodes.
      • security-group-ids: List of security group IDs.

    Alert

    To make sure your node group is not recreated (unless it is your intention to do so), analyze the output of the terraform plan and terraform apply commands before applying the configuration.

    You can migrate a node group to a different availability zone without recreating it only if the configuration file contains the allocation_policy section.

    1. Make the following changes to the configuration file:

      • Add a new subnet manifest (yandex_vpc_subnet resource) in the availability zone to which you want to move the node group.
      • Change the node group location parameters (yandex_kubernetes_node_group resource):
        • allocation_policy.location.subnet_id: Remove this parameter if it is in the manifest.
        • allocation_policy.location.zone: Specify the availability zone you want to move the node group to.
        • instance_template.network_interface.subnet_ids: Specify the new subnet ID. Add this parameter if it is not in the manifest.
      resource "yandex_vpc_subnet" "my-new-subnet" {
         name           = "<subnet_name>"
         zone           = "<availability_zone>"
         network_id     = "<network_ID>"
         v4_cidr_blocks = ["<subnet_CIDR>"]
      }
      ...
      resource "yandex_kubernetes_node_group" "k8s-node-group" {
         allocation_policy {
            location {
               zone = "<availability_zone>"
            }
         }
         ...
         instance_template {
            network_interface {
               subnet_ids = [yandex_vpc_subnet.my-new-subnet.id]
               ...
            }
            ...
         }
         ...
      }
      

      Where:

      • name: Subnet name.
      • zone: Availability zone you want to move your node group to (ru-central1-a, ru-central1-b, or ru-central1-d).
      • network_id: ID of the network the new subnet belongs to.
      • v4_cidr_blocks: List of IPv4 addresses for outgoing and incoming traffic, e.g., 10.0.0.0/22 or 192.168.0.0/16. Make sure the addresses are unique within the network. The minimum subnet size is /28, the maximum subnet size is /16. Only IPv4 is supported.
      • subnet_ids: ID of the new subnet.
    2. Check that the configuration file is correct.

      1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.

      2. Run this command:

        terraform validate
        

        Terraform will show any errors found in your configuration files.

    3. Confirm updating the resources.

      1. Run this command to view the planned changes:

        terraform plan
        

        If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

      2. If everything looks correct, apply the changes:

        1. Run this command:

          terraform apply
          
        2. Confirm updating the resources.

        3. Wait for the operation to complete.

    The group nodes will be recreated in the specified availability zone and subnet. The recreation respects the group's deployment settings: the maximum number of nodes by which the group size can temporarily exceed or fall below the original node count.

  2. Make sure the pods are running in the migrated node group:

    kubectl get po --output wide
    

    The output of this command displays the nodes on which your pods are currently running.
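
To further confirm that the group ended up in the target availability zone, you can query the node group on the Yandex Cloud side and check the zone label on the cluster nodes. A quick sketch; the node group ID is a placeholder:

  # Node group details, including its current location.
  yc managed-kubernetes node-group get --id <node_group_ID>

  # Availability zone reported for each cluster node.
  kubectl get nodes --label-columns topology.kubernetes.io/zone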

Migrating a stateful workload

The migration is based on scaling the StatefulSet controller. To migrate a stateful workload:

  1. Get a list of StatefulSet controllers to find the name of the one you need:

    kubectl get statefulsets
    
  2. Get the number of pods managed by the controller:

    kubectl get statefulsets <controller_name> \
       -n default -o=jsonpath='{.status.replicas}'
    

    Save the obtained value. You will need it to scale the StatefulSet controller once the migration of your stateful workload is complete.

  3. Reduce the number of pods to zero:

    kubectl scale statefulset <controller_name> --replicas=0
    

    This stops the pods that use the disks. In addition, running this command preserves the PersistentVolumeClaim (PVC) Kubernetes API object.

  4. For the PersistentVolume object (PV) associated with the PersistentVolumeClaim, change the persistentVolumeReclaimPolicy parameter from Delete to Retain to prevent accidental data loss.

    1. Get the name of the PersistentVolume object:

      kubectl get pv
      
    2. Edit the PersistentVolume object:

      kubectl edit pv <PV_name>
      
  5. Check if the PersistentVolume object manifest contains the spec.nodeAffinity parameter:

    kubectl get pv <PV_name> --output='yaml'
    

    If the manifest contains the spec.nodeAffinity parameter with an availability zone specified in it, save this parameter. You will need to specify it in a new PersistentVolume object.

  6. Create a snapshot representing a point-in-time copy of the PersistentVolume disk. For more information about snapshots, see the Kubernetes documentation.

    1. Get the name of the PersistentVolumeClaim object:

      kubectl get pvc
      
    2. Create a snapshot.yaml file with the snapshot manifest and specify the PersistentVolumeClaim name in it:

      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshot
      metadata:
         name: new-snapshot-test-<number>
      spec:
         volumeSnapshotClassName: yc-csi-snapclass
         source:
            persistentVolumeClaimName: <PVC_name>
      

      If you are creating several snapshots for different PersistentVolumeClaim objects, specify the <number> (consecutive number) to make sure each snapshot gets a unique metadata.name value.

    3. Create a snapshot:

      kubectl apply -f snapshot.yaml
      
    4. Check that the snapshot has been created:

      kubectl get volumesnapshots.snapshot.storage.k8s.io
      
    5. Make sure the VolumeSnapshotContent Kubernetes API object has been created:

      kubectl get volumesnapshotcontents.snapshot.storage.k8s.io
      
  7. Get the snapshot ID:

    yc compute snapshot list
    
  8. Create a VM disk from the snapshot:

    yc compute disk create \
       --source-snapshot-id <snapshot_ID> \
       --zone <availability_zone>
    

    In the command, specify the availability zone to which the Managed Service for Kubernetes node group is being migrated.

    Save the following parameters from the command's output:

    • Disk ID from the id field.
    • Disk type from the type_id field.
    • Disk size from the size field.
  9. Create a PersistentVolume Kubernetes API object from the new disk:

    1. Create the persistent-volume.yaml file with the PersistentVolume manifest:

      apiVersion: v1
      kind: PersistentVolume
      metadata:
         name: new-pv-test-<number>
      spec:
         capacity:
            storage: <PersistentVolume_size>
         accessModes:
            - ReadWriteOnce
         csi:
            driver: disk-csi-driver.mks.ycloud.io
            fsType: ext4
            volumeHandle: <disk_ID>
         storageClassName: <disk_type>
      

      In the file, specify the parameters of the disk created from the snapshot:

      • spec.capacity.storage: Disk size.

      • spec.csi.volumeHandle: Disk ID.

      • spec.storageClassName: Disk type. Specify the type in accordance with the following table:

        Type of the disk created from the snapshot | Disk type for the YAML file
        ------------------------------------------ | ----------------------------
        network-ssd                                | yc-network-ssd
        network-ssd-nonreplicated                  | yc-network-ssd-nonreplicated
        network-nvme                               | yc-network-nvme
        network-hdd                                | yc-network-hdd

      If you are creating several PersistentVolume objects, specify the <number> (a sequential number) to make sure each object gets a unique metadata.name value.

      If you saved the spec.nodeAffinity parameter earlier, add it to the manifest and specify the availability zone to which the Managed Service for Kubernetes node group is being migrated. If you omit this parameter, the workload may run in a different availability zone where PersistentVolume is not available. This will result in a run error.

      Example of the spec.nodeAffinity parameter:

      spec:
         ...
         nodeAffinity:
            required:
               nodeSelectorTerms:
               - matchExpressions:
                  - key: failure-domain.beta.kubernetes.io/zone
                    operator: In
                    values:
                       - ru-central1-d
      
    2. Create the PersistentVolume object:

      kubectl apply -f persistent-volume.yaml
      
    3. Make sure PersistentVolume was created:

      kubectl get pv
      

      The command output will display the new-pv-test-<number> object.

    4. If you specified the spec.nodeAffinity parameter in the manifest, make sure it was applied to the PersistentVolume object:

      Management console
      1. In the management console, select the folder with your Managed Service for Kubernetes cluster.
      2. From the list of services, select Managed Service for Kubernetes.
      3. Go to the cluster page and find the Storage section.
      4. On the Persistent Volumes tab, find the new-pv-test-<number> object and check the Availability zone field value. It must specify an availability zone. A dash means there is no assignment to an availability zone.
    5. If you omitted the spec.nodeAffinity parameter in the manifest, you can add it by editing the PersistentVolume object:

      kubectl edit pv new-pv-test-<number>
      
  10. Create a PersistentVolumeClaim object from the new PersistentVolume:

    1. Create the persistent-volume-claim.yaml file with the PersistentVolumeClaim manifest:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
         name: <PVC_name>
      spec:
         accessModes:
            - ReadWriteOnce
         resources:
            requests:
               storage: <PV_size>
         storageClassName: <disk_type>
         volumeName: new-pv-test-<number>
      

      In the file, set the following parameters:

      • metadata.name: Name of the PersistentVolumeClaim object you used to create the snapshot. You can get this name by running the kubectl get pvc command.
      • spec.resources.requests.storage: PersistentVolume size, which matches the size of the created disk.
      • spec.storageClassName: PersistentVolume disk type, which matches the disk type of the new PersistentVolume.
      • spec.volumeName: Name of the PersistentVolume object to create the PersistentVolumeClaim from. You can get this name by running the kubectl get pv command.
    2. Delete the original PersistentVolumeClaim so you can replace it:

      kubectl delete pvc <PVC_name>
      
    3. Create the PersistentVolumeClaim object:

      kubectl apply -f persistent-volume-claim.yaml
      
    4. Make sure PersistentVolumeClaim was created:

      kubectl get pvc
      

      The command output will return the PersistentVolumeClaim size you specified in the YAML file.

  11. Create a subnet in the new availability zone and migrate the node group:

    CLI
    Terraform
    1. Create a subnet:

      yc vpc subnet create \
         --name <subnet_name> \
         --zone <availability_zone> \
         --network-id <network_ID> \
         --range <subnet_CIDR>
      

      Where:

      • --name: Subnet name.
      • --zone: Availability zone (ru-central1-a, ru-central1-b, or ru-central1-d).
      • --network-id: ID of the network the new subnet belongs to.
      • --range: List of IPv4 addresses for outgoing and incoming traffic, e.g., 10.0.0.0/22 or 192.168.0.0/16. Make sure the addresses are unique within the network. The minimum subnet size is /28, the maximum subnet size is /16. Only IPv4 is supported.
    2. Move the node group to the new availability zone. The example below shows a command for moving a group residing in a single zone:

      yc managed-kubernetes node-group update \
         --id <node_group_ID> \
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         --network-interface subnets=<subnet_ID>,`
              `ipv4-address=nat,`
              `security-group-ids=[<security_group_IDs>]
      

      Where:

      • id: ID of the node group to move to a different availability zone.
      • zone: Availability zone you want to move your node group to (ru-central1-a, ru-central1-b, or ru-central1-d).
      • subnet-id and subnets: ID of the new subnet you created earlier.
      • ipv4-address: Method of assigning an IPv4 address. The nat value allows assigning public and internal IP addresses to nodes.
      • security-group-ids: List of security group IDs.

      Warning

      If you want to keep the values of other network parameters for the node group, specify them in the network-interface parameter as well. Otherwise, the group may be recreated with default values. For more information, see Updating a Managed Service for Kubernetes node group.

      It is important to provide the ipv4-address and security-group-ids parameters in the command: this assigns public IP addresses to the node group and preserves the security groups attached to it.

      This command recreates the group's nodes in the specified availability zone and subnet. The recreation respects the group's deployment settings: the maximum number of nodes by which the group size can temporarily exceed or fall below the original node count.

      How to move a node group residing in different availability zones

      In this case, use the following command:

      yc managed-kubernetes node-group update \
         --id <node_group_ID> \
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         ...
         --location zone=<availability_zone>,subnet-id=<subnet_ID> \
         --network-interface subnets=[<subnet_IDs>],`
              `ipv4-address=nat,`
              `security-group-ids=[<security_group_IDs>]
      

      Where:

      • id: ID of the node group to move to a different availability zone.
      • zone: Availability zone (ru-central1-a, ru-central1-b, or ru-central1-d). Specify the location parameters for each availability zone that will host the node group.
      • subnet-id and subnets: IDs of the subnets for the specified availability zones.
      • ipv4-address: Method of assigning an IPv4 address. The nat value allows assigning public and internal IP addresses to nodes.
      • security-group-ids: List of security group IDs.

    Alert

    To make sure your node group is not recreated (unless it is your intention to do so), analyze the output of the terraform plan and terraform apply commands before applying the configuration.

    You can migrate a node group to a different availability zone without recreating it only if the configuration file contains the allocation_policy section.

    1. Make the following changes to the configuration file:

      • Add a new subnet manifest (yandex_vpc_subnet resource) in the availability zone to which you want to move the node group.
      • Change the node group location parameters (yandex_kubernetes_node_group resource):
        • allocation_policy.location.subnet_id: Remove this parameter if it is in the manifest.
        • allocation_policy.location.zone: Specify the availability zone you want to move the node group to.
        • instance_template.network_interface.subnet_ids: Specify the new subnet ID. Add this parameter if it is not in the manifest.
      resource "yandex_vpc_subnet" "my-new-subnet" {
         name           = "<subnet_name>"
         zone           = "<availability_zone>"
         network_id     = "<network_ID>"
         v4_cidr_blocks = ["<subnet_CIDR>"]
      }
      ...
      resource "yandex_kubernetes_node_group" "k8s-node-group" {
         allocation_policy {
            location {
               zone = "<availability_zone>"
            }
         }
         ...
         instance_template {
            network_interface {
               subnet_ids = [yandex_vpc_subnet.my-new-subnet.id]
               ...
            }
            ...
         }
         ...
      }
      

      Where:

      • name: Subnet name.
      • zone: Availability zone you want to move your node group to (ru-central1-a, ru-central1-b, or ru-central1-d).
      • network_id: ID of the network the new subnet belongs to.
      • v4_cidr_blocks: List of IPv4 addresses for outgoing and incoming traffic, e.g., 10.0.0.0/22 or 192.168.0.0/16. Make sure the addresses are unique within the network. The minimum subnet size is /28, the maximum subnet size is /16. Only IPv4 is supported.
      • subnet_ids: ID of the new subnet.
    2. Check that the configuration file is correct.

      1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.

      2. Run this command:

        terraform validate
        

        Terraform will show any errors found in your configuration files.

    3. Confirm updating the resources.

      1. Run this command to view the planned changes:

        terraform plan
        

        If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

      2. If everything looks correct, apply the changes:

        1. Run this command:

          terraform apply
          
        2. Confirm updating the resources.

        3. Wait for the operation to complete.

    The group nodes will be recreated in the specified availability zone and subnet. The recreation respects the group's deployment settings: the maximum number of nodes by which the group size can temporarily exceed or fall below the original node count.

  12. Restore the original number of pods managed by the StatefulSet controller:

    kubectl scale statefulset <controller_name> --replicas=<number_of_pods>
    

    The pods will be launched in the migrated node group.

    In the command, specify the following parameters:

    • Name of the StatefulSet controller. You can get it by running the kubectl get statefulsets command.
    • Number of pods prior to scaling down the controller.
  13. Make sure the pods are running in the migrated node group:

    kubectl get po --output wide
    

    The output of this command displays the nodes on which your pods are currently running.

  14. Delete the unused PersistentVolume object, i.e., the one with the Released status.

    1. Get the name of the PersistentVolume object:

      kubectl get pv
      
    2. Delete the PersistentVolume object:

      kubectl delete pv <PV_name>
      

Gradually migrating a stateless and stateful workload

See below how to gradually migrate a workload from the old node group to the new one. For instructions on migrating the PersistentVolume and PersistentVolumeClaim objects, see Migrating a stateful workload.

  1. Create a new Managed Service for Kubernetes node group in the new availability zone.

  2. Disable running new pods in the old node group:

    kubectl cordon -l yandex.cloud/node-group-id=<old_node_group_ID>
    
  3. For each node in the old node group, run the following command:

    kubectl drain <node_name> --ignore-daemonsets --delete-emptydir-data
    

    The pods will gradually move to the new node group. A shell loop that drains all nodes of the old group in sequence is sketched after this list.

  4. Make sure the pods are running in the new node group:

    kubectl get po --output wide
    
  5. Delete the old node group.
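
If the old group has many nodes, the drain step can be scripted. A minimal shell sketch, assuming the same node group label used in the cordon command above:

  # Drain every node of the old group one by one; the pods reschedule onto the new group.
  for node in $(kubectl get nodes -l yandex.cloud/node-group-id=<old_node_group_ID> -o jsonpath='{.items[*].metadata.name}'); do
     kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done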
