Temporarily disabling availability zones for Yandex Compute Cloud instance groups
In Compute Cloud, you can temporarily disable one or more availability zones for an instance group. This helps keep your services available during events that affect an availability zone, such as testing, maintenance, outages, or incidents.
For example, disabling a zone for an instance group prevents gray failures in that zone, i.e., situations where health checks and monitoring tools show no failures, yet a portion of the actual traffic fails to reach your instances there.
It is also good practice to temporarily disable a zone during a zonal incident so that you can gradually reintroduce instances from the affected zone in a controlled manner. See Instance group in a temporarily disabled zone during an incident for details.
Warning
You cannot temporarily disable zones for instance groups that a Managed Service for Kubernetes cluster created as its node groups.
You can re-enable a group’s availability zone at any time or set a timeout for automatic reactivation.
For more information, see Disabling and enabling availability zones for a Yandex Compute Cloud instance group.
When an availability zone is temporarily disabled, the instance group operates as follows:
- Instances in the disabled zone will not be updated during group updates.
- Instances in the disabled zone will not autoheal.
- When the group size is increased manually or automatically, new instances will be created only in enabled zones.
- Instances in the disabled zone can be manually stopped or deleted.
- Operations which may create, start, or update instances in the disabled zone will be completed only after the zone is re-enabled.
Updating an instance in the group
In a disabled zone, instances will not be updated until the zone is re-enabled. In other zones, instance updates will run normally. The disabled zone’s instances will get the target updates after the availability zone returns to its normal state.
Warning
To avoid losing all instances in the group during an update, instances from the disabled zone will count toward the max_unavailable limit set in the deployment policy. Therefore, to update group instances when a zone is disabled, increase the max_expansion value.
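The arithmetic behind this warning can be sketched as follows. This is an illustrative model only: the function name is hypothetical, and it assumes that each instance in a disabled zone consumes exactly one unit of max_unavailable.

```python
def remaining_unavailable_budget(max_unavailable: int, disabled_zone_instances: int) -> int:
    """Return how many more instances an update may take offline.

    Assumption (not confirmed by the documentation): every instance in a
    disabled zone uses up one unit of the max_unavailable limit.
    """
    if max_unavailable < 0 or disabled_zone_instances < 0:
        raise ValueError("both values must be non-negative")
    # The budget never goes below zero, even if the disabled zone
    # holds more instances than max_unavailable allows.
    return max(0, max_unavailable - disabled_zone_instances)
```

Under this model, a group with max_unavailable set to 1 and three instances in the disabled zone has no budget left for taking instances offline, so the update can only progress by creating additional instances, which is what max_expansion permits.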
Autohealing an instance in the group
In a disabled zone, autohealing will no longer work, but other zones will remain unaffected:
- Instances deemed unhealthy based on their status in Compute Cloud will be recovered as normal without counting against any quota.
- Instances deemed unhealthy based on application health checks will be recovered within the max_unavailable quota. In the deployment policy, this quota defines the maximum allowed number of unavailable instances during group updates. If autohealing is on, instances residing in the disabled zone are excluded from the max_unavailable quota, and in the other zones, the recovery will run normally.
Increasing instance group size
Autoscaling instance groups
Regardless of the autoscaling type, the number of instances in a disabled zone will not change. When a zone is disabled, new instances are created only in the remaining enabled zones, up to the max_size limit of the scaling policy.
In groups with zonal autoscaling, instances do not have to be evenly distributed across zones. The remaining enabled zones get new instances based on the load.
In groups with regional autoscaling, instances are always evenly distributed across zones. Disabling a zone may cause an imbalance in this distribution.
After the zone is re-enabled, instances are automatically redistributed across all zones based on the group's autoscaling type.
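The scale-out behavior described above can be modeled with a short sketch. It is not the service's actual algorithm: the function name and zone names are hypothetical, max_size is assumed to cap the group total, and new instances go to the least populated enabled zone as a stand-in for load-based placement.

```python
def plan_scale_out(current: dict, disabled: set, requested: int, max_size: int) -> dict:
    """Return the planned instance count per zone after a scale-out.

    current   -- mapping of zone name to current instance count
    disabled  -- zones that are temporarily disabled
    requested -- number of new instances the autoscaler wants
    max_size  -- assumed upper limit on the total group size
    """
    planned = dict(current)
    enabled = sorted(zone for zone in planned if zone not in disabled)
    if not enabled:
        return planned  # nowhere to place new instances
    # Instances in the disabled zone still count toward the group total.
    room = max(0, max_size - sum(planned.values()))
    for _ in range(min(requested, room)):
        # Pick the enabled zone with the fewest instances; ties go to
        # the alphabetically first zone because the list is sorted.
        target = min(enabled, key=lambda zone: planned[zone])
        planned[target] += 1
    return planned
```

For example, with two instances in each of three zones, one zone disabled, and a request for three new instances, the disabled zone keeps its two instances while the other two zones absorb the growth.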
Warning
If you use a network or L7 load balancer with an autoscaling instance group, disable the zone for the instance group before you disable it in the load balancer. Otherwise, the instance group will keep creating instances in a zone that receives no traffic.
Likewise, enable the zone for the instance group before you enable it in the load balancer, so that the group can distribute its instances across the zones first.
Fixed-size instance groups
In fixed-size groups, instances are always distributed evenly across availability zones. This behavior persists even while a zone is disabled.
Enabled zones will immediately get new instances; the disabled zone will get new instances after reactivation.
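A minimal sketch of this behavior for a fixed-size group is shown below. The function name is hypothetical, and the way a remainder is assigned to zones (alphabetically first zones get the extra instance) is an assumption made for illustration.

```python
def plan_fixed_size_growth(current: dict, disabled: set, new_size: int):
    """Split the instances to create into immediate and deferred parts.

    Returns (create_now, deferred): two mappings of zone name to the
    number of instances to create. Instances for a disabled zone are
    deferred until the zone is re-enabled.
    """
    zones = sorted(current)
    if not zones:
        return {}, {}
    # The even split ignores which zones are disabled.
    base, extra = divmod(new_size, len(zones))
    create_now, deferred = {}, {}
    for index, zone in enumerate(zones):
        target = base + (1 if index < extra else 0)
        missing = max(0, target - current[zone])
        if missing == 0:
            continue
        if zone in disabled:
            deferred[zone] = missing
        else:
            create_now[zone] = missing
    return create_now, deferred
```

Growing a three-zone group from three to six instances with one zone disabled yields one new instance in each enabled zone right away and one deferred instance for the disabled zone.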
Manually deleting and stopping instances
To maintain service health, you can delete instances in the disabled zone using the DeleteInstances REST API method for the InstanceGroup resource or the InstanceGroupService/DeleteInstances gRPC API call.
Also, you can stop instances in the disabled zone using the StopInstances REST API method for the InstanceGroup resource or the InstanceGroupService/StopInstances gRPC API call.
When the zone is re-enabled, the instance group will restart the stopped instances and create any missing ones. However, if you set createAnother in the DeleteInstances method, the replacement instances will not be created until the zone is re-enabled.
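As an illustration, a DeleteInstances REST request could be assembled roughly as follows. The base URL, the `:deleteInstances` path suffix, the managedInstanceIds field name, and the helper name are assumptions that should be checked against the Compute Cloud API reference; only createAnother is named in this article. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Assumed base URL of the instance group REST API; verify before use.
API_BASE = "https://compute.api.cloud.yandex.net/compute/v1/instanceGroups"


def build_delete_instances_request(group_id, instance_ids, iam_token, create_another=False):
    """Build (but do not send) a hypothetical DeleteInstances request."""
    body = {
        # Assumed field name for the IDs of the instances to delete.
        "managedInstanceIds": list(instance_ids),
        # Flag mentioned in this article.
        "createAnother": create_another,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{group_id}:deleteInstances",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, you would pass the returned object to urllib.request.urlopen with a valid IAM token. A StopInstances request would presumably follow the same pattern with a different path suffix.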
Completing instance group operations
Operations which may create, start, or update instances in the disabled zone will be completed only after the zone is re-enabled. These operations include creating, updating, and starting an instance group, as well as staged recreation and restart of instances within a group.
Operations which stop or delete instances in the disabled zone will run as normal. These operations include stopping or deleting an instance group as well as stopping and deleting instances within a group.
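The rule in this section can be summarized in a small sketch. The enum and function names are hypothetical and only model the classification described above; they are not part of any Compute Cloud API.

```python
from enum import Enum


class Effect(Enum):
    """What an operation may do to instances in a zone."""
    CREATE = "create"
    START = "start"
    UPDATE = "update"
    STOP = "stop"
    DELETE = "delete"


# Effects that have to wait until the zone is re-enabled.
DEFERRED_EFFECTS = frozenset({Effect.CREATE, Effect.START, Effect.UPDATE})


def completes_while_zone_disabled(effects) -> bool:
    """Return True if none of the operation's effects is deferred."""
    return not (set(effects) & DEFERRED_EFFECTS)
```

For instance, stopping instances has only the STOP effect and completes right away, while a staged recreation involves both DELETE and CREATE and therefore completes only after the zone is re-enabled.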
Instance group in a temporarily disabled zone during an incident
During a zonal incident, instance group behavior changes automatically: the restrictions on what you can do with instances in the affected zone are stricter than those that apply when you disable a zone manually. As a result, disabling the zone yourself while the incident is ongoing has no additional effect on the instance group.
Tip
During an incident, you can disable the affected zone in the instance group. This way, the restrictions on instance creation, startup, and update will continue to apply after the incident is over, so you can gradually reintroduce instances from the affected zone in a controlled manner.