© 2026 Direct Cursus Technology L.L.C.

Fault tolerance testing in the Yandex Cloud infrastructure based on Yandex Application Load Balancer

Written by
Yandex Cloud
Updated on February 24, 2026
  • Goals of testing
  • Pre-test preparation
    • Test environment
    • Testing recommendations
    • Testing tools
  • Testing methodology
    • Preparation steps
    • Initiating the test
    • State assessment
    • Completing the test
  • Conclusion

This guide covers the practical side of the fault tolerance testing routine outlined in Recommendations on fault tolerance in Yandex Cloud, as applied to a Yandex Cloud infrastructure built on the L7 Application Load Balancer. It assumes that your infrastructure follows the principles described in that article.

Goals of testing

This guide describes a methodology for availability zone failure exercises that allows you to:

  • Study the system's behavior during failure.
  • Evaluate the system’s fault tolerance when one of the availability zones fails.
  • Identify hidden dependencies and vulnerabilities.
  • Collect information on the symptoms of an outage.
  • Check the system's ability to recover quickly.

The failure scenarios studied here are limited to a complete failure of one availability zone. Partial failures are out of scope because they are too diverse.

Pre-test preparation

Test environment

  1. Alignment with production environment:

    Warning

    We do not recommend testing in your production environment; run the exercise in a test environment first.

    • We recommend making the configuration of your test environment as close as possible to that of the production environment.
    • The test load should resemble the production workload. You can simulate the production load with load testing tools, e.g., Yandex Load Testing.
    • We recommend using Infrastructure as Code to automate the setup of test environments.
  2. Follow these best practices to optimize costs when deploying resources in the test environment:

    • Use NRD disks instead of SSD-IO.
    • Use preemptible VMs.
    • Create your resources dynamically only for the duration of the test.
    • Free up resources automatically after the tests are over.
    • Use components without an SLA to reduce costs.

Testing recommendations

  1. Use a monitoring system to assess test results.
  2. Save your test results for retrospective analysis.
  3. Perform testing on a regular basis.
  4. Use Yandex Cloud CLI 0.154.0 or higher for testing.
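
The version requirement above can be checked automatically before a test run. The following is a minimal sketch, not an official check: it assumes you pass it the text printed by the CLI's version command, and the sample string in the usage note is an assumed output format. The helper names `parse_cli_version` and `cli_is_recent_enough` are hypothetical.

```python
import re
from typing import Tuple

# Minimum CLI version named in this guide.
MIN_CLI_VERSION = (0, 154, 0)


def parse_cli_version(version_output: str) -> Tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple found in the given text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def cli_is_recent_enough(version_output: str) -> bool:
    """Return True if the reported version is 0.154.0 or higher."""
    # Tuples compare numerically, so 0.99.9 correctly sorts below 0.154.0,
    # which a plain string comparison would get wrong.
    return parse_cli_version(version_output) >= MIN_CLI_VERSION
```

For example, `cli_is_recent_enough("Yandex Cloud CLI 0.154.0 linux/amd64")` is true, while `cli_is_recent_enough("0.99.9")` is false.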

Testing tools

The fault tolerance tests described in this guide rely on Application Load Balancer tools that disable traffic balancing to a particular availability zone.

We recommend using VPC security groups as an additional isolation tool for the disabled zone.

Important: when using VPC security groups, keep the following in mind:

  • Security groups support allow rules only. To be able to block traffic, keep the rules that allow traffic between zones as a separate set; you block traffic by deleting these rules.
  • Deleting allow rules from a security group blocks new network connections but does not terminate existing ones.
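
The allow-only behavior described above can be illustrated with a small model. This is a conceptual sketch, not the VPC API: the per-zone CIDR ranges are made-up examples, `is_new_connection_allowed` is a hypothetical helper, and the model covers only new connections (existing ones are not terminated, as noted above).

```python
from ipaddress import ip_address, ip_network
from typing import Iterable


def is_new_connection_allowed(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """A new connection is admitted only if at least one allow rule matches."""
    address = ip_address(source_ip)
    return any(address in ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical allow rules, kept as one separate rule per zone so that a
# single zone can be isolated by deleting just its rule.
rules_by_zone = {
    "ru-central1-a": "10.1.0.0/24",
    "ru-central1-b": "10.2.0.0/24",
    "ru-central1-d": "10.3.0.0/24",
}

# "Blocking" ru-central1-b means deleting its allow rule.
rules_without_b = [
    cidr for zone, cidr in rules_by_zone.items() if zone != "ru-central1-b"
]
```

With all rules in place, a new connection from `10.2.0.7` is allowed; with `rules_without_b`, nothing matches it, so it is rejected.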

Testing methodology

Preparation steps

  1. If required, prepare the environment for testing.

  2. Select the availability zone to disable, that is, the zone to shift traffic away from (for example, ru-central1-b).

  3. Determine the test duration. You can disable a load balancer zone either indefinitely or for a specified period from 1 minute to 72 hours (for example, 30 minutes).

  4. Get the list of load balancers that will participate in the testing:

    yc alb load-balancer list
    

Initiating the test

For each load balancer in the list, stop traffic delivery to the selected availability zone using the disable-zones command.

To disable traffic balancing in the ru-central1-b availability zone for a specific load balancer for 30 minutes, run this command:

yc alb load-balancer disable-zones <load_balancer_name_or_ID> \
  --zones=ru-central1-b \
  --duration 30m

The command output will look approximately like this (note allocation_policy.locations):

...
allocation_policy:
  locations:
    - zone_id: ru-central1-a
      subnet_id: e9bnvnn56fs4********
    - zone_id: ru-central1-b
      subnet_id: e2lqsms4cdl3********
      zonal_shift_active: true
      zonal_traffic_disabled: true
    - zone_id: ru-central1-d
      subnet_id: fl8dmq91iruu********
...
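
The zone state in this output can also be read programmatically. The sketch below assumes you request the same load balancer data in JSON (for example, with the CLI's `--format json` option) and that the JSON uses the same field names as the output above; that mapping is an assumption, and `disabled_zones` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def disabled_zones(load_balancer: Dict[str, Any]) -> List[str]:
    """Return the zone IDs whose traffic balancing is currently disabled."""
    locations = load_balancer.get("allocation_policy", {}).get("locations", [])
    return [
        location["zone_id"]
        for location in locations
        # The flag is absent for zones that still receive traffic.
        if location.get("zonal_traffic_disabled", False)
    ]


# Sample data mirroring the output above; subnet IDs are placeholders.
sample = json.loads("""
{
  "allocation_policy": {
    "locations": [
      {"zone_id": "ru-central1-a", "subnet_id": "subnet-a"},
      {"zone_id": "ru-central1-b", "subnet_id": "subnet-b",
       "zonal_shift_active": true, "zonal_traffic_disabled": true},
      {"zone_id": "ru-central1-d", "subnet_id": "subnet-d"}
    ]
  }
}
""")
```

For the sample above, `disabled_zones(sample)` returns a list containing only `ru-central1-b`.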

To disable several availability zones at once, list them separated by commas.

If you run the command again, the blocking period will be reset to 30 minutes from the current time.

If you do not specify the --duration parameter in the command, traffic balancing to the selected zones will be blocked indefinitely.

Warning

The disable-zones command only disables traffic balancing to the selected availability zone and only for the specified load balancer. This command does not impact network traffic within the zone or between the availability zones in any other cloud services. If you need to block traffic on such a broad scale, you can use VPC security groups on the corresponding cloud resource network interfaces.
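
Because disable-zones applies to one load balancer at a time, the command has to be repeated for every load balancer taking part in the test. The following sketch builds those command lines without running them. It uses only the flags shown in this guide; the helper names and the load balancer names are hypothetical, and expressing long durations in minutes (such as `4320m`) is an assumption about the accepted duration format.

```python
from typing import Iterable, List, Optional, Sequence

MIN_MINUTES = 1
MAX_MINUTES = 72 * 60  # 72 hours, the upper limit stated in this guide


def disable_zones_command(load_balancer: str, zones: Sequence[str],
                          duration_minutes: Optional[int] = None) -> List[str]:
    """Build the argument list for one disable-zones call."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    command = ["yc", "alb", "load-balancer", "disable-zones", load_balancer,
               "--zones=" + ",".join(zones)]
    # Without --duration the zone stays disabled indefinitely.
    if duration_minutes is not None:
        if not MIN_MINUTES <= duration_minutes <= MAX_MINUTES:
            raise ValueError("duration must be between 1 minute and 72 hours")
        command += ["--duration", f"{duration_minutes}m"]
    return command


def disable_zones_everywhere(load_balancers: Iterable[str], zones: Sequence[str],
                             duration_minutes: Optional[int] = None) -> List[List[str]]:
    """Build one command per load balancer participating in the test."""
    return [disable_zones_command(lb, zones, duration_minutes)
            for lb in load_balancers]
```

Each returned list can be passed to `subprocess.run(command, check=True)` once you have reviewed it. Building the commands separately from running them makes it easy to log exactly what was executed for the retrospective analysis.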

State assessment

  1. To check the zone blocking state of an individual load balancer:

    Management console
    1. In the management console, select the folder with your load balancer.

    2. Go to Application Load Balancer and select the load balancer.

    3. Under Allocation, next to the availability zone, view its status.

      If the zonal shift duration has been set, you will see the end time next to the zone.

  2. Make sure traffic has stopped entering the selected zone. You can do this in the monitoring service by plotting total traffic on your virtual machines' interfaces grouped by availability zone.

    Currently, the monitoring service cannot plot the zone-by-zone traffic distribution with a single query. To build the chart:

    1. Create a chart in the monitoring service.
    2. Create a list of VM IDs for the ru-central1-a zone, e.g., using this command:
      yc compute instance list --jq '[.[] | select(.zone_id=="ru-central1-a") | .id ] | join("|")'
      
      The command output will be a single-line list of VM IDs separated by |. For example: fhm**********uv5|fhm**********aab|fhm**********ui1|....
    3. Add a query to the monitoring chart:
      alias(series_sum("network_received_packets"{folderId = "b1g**********", service = "compute", resource_type = "vm", resource_id = "<delimiter-separated_list_of_VM_IDs_from_previous_step_|>"}), "ru-central1-a")
      
    4. Repeat steps 2 and 3 for zones ru-central1-b and ru-central1-d.
    5. Run the queries.
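
Steps 2 to 4 above can be generated in one pass instead of being repeated by hand for each zone. The sketch below assumes you have the VM list as parsed JSON in which each entry has `id` and `zone_id` fields (the same fields the command in step 2 selects); the helper names and the sample IDs are hypothetical. The query text follows the form shown in step 3.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Same query shape as in step 3; doubled braces are literal braces.
QUERY_TEMPLATE = (
    'alias(series_sum("network_received_packets"{{folderId = "{folder_id}", '
    'service = "compute", resource_type = "vm", resource_id = "{vm_ids}"}}), "{zone}")'
)


def vm_ids_by_zone(instances: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group VM IDs by the availability zone each VM runs in."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for instance in instances:
        grouped[instance["zone_id"]].append(instance["id"])
    return dict(grouped)


def zone_queries(instances: Iterable[Dict[str, Any]], folder_id: str) -> Dict[str, str]:
    """Build one monitoring query per zone, keyed by zone ID."""
    return {
        zone: QUERY_TEMPLATE.format(folder_id=folder_id,
                                    vm_ids="|".join(ids), zone=zone)
        for zone, ids in sorted(vm_ids_by_zone(instances).items())
    }
```

Each value in the returned dictionary is a query you can add to the chart, aliased with its zone name so the lines are easy to tell apart.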

Completing the test

  1. To resume traffic balancing in a previously disabled availability zone, run this enable-zones command:

    yc alb load-balancer enable-zones <load_balancer_name_or_ID> \
      --zones=ru-central1-b
    
  2. Make sure that traffic has started flowing to the selected availability zone.

    Note that there is a time limit on disabling balancing again: after re-enabling balancing in a zone, you must wait two minutes before you can disable it again.

Conclusion

We recommend that you perform fault tolerance testing on a regular basis, document the results, and continuously improve your processes based on the experience you gain.
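
If you automate repeated test runs, the two-minute wait can be computed rather than guessed. This is a minimal sketch under the assumption that your script records the time at which it ran enable-zones; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Wait time stated in this guide before balancing can be disabled again.
REDISABLE_COOLDOWN = timedelta(minutes=2)


def earliest_redisable_time(enabled_at: datetime) -> datetime:
    """Earliest moment at which disable-zones may be run again."""
    return enabled_at + REDISABLE_COOLDOWN


def seconds_until_redisable(enabled_at: datetime, now: datetime) -> float:
    """Seconds left to wait; 0.0 once the cooldown has passed."""
    remaining = earliest_redisable_time(enabled_at) - now
    return max(0.0, remaining.total_seconds())
```

A script can pass the result to `time.sleep()` before starting the next test iteration.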
