Implementing fault-tolerant scenarios for NAT VMs

Written by
Yandex Cloud
Updated on May 7, 2025
  • Get your cloud ready
    • Required paid resources
  • Configure the CLI profile
  • Set up the environment for deploying the resources
  • Deploy your resources
  • Enable the route-switcher module
  • Test the solution for performance and fault tolerance
    • Testing the system performance
    • Testing the system fault tolerance
  • How to delete the resources you created

In Yandex Cloud, you can deploy a cloud infrastructure using network VMs that provide firewall protection, network security, and traffic routing. Static routes direct outgoing traffic from the subnets to the network VMs.

To ensure high availability, you can deploy multiple network VMs in different availability zones and set up auto switching of outgoing subnet traffic from one network VM to another using the route-switcher module.

This tutorial describes a use case in which the route-switcher module provides fault tolerance for a NAT instance: a network VM with preset routing and IP address translation rules. NAT instances provide internet access for VMs and other cloud resources hosted in Yandex Cloud.

In the diagram used in this example, a NAT instance named NAT-A is the main instance handling outgoing traffic to the internet, while NAT-B is the standby one.

Description of the diagram elements:

  • NAT-A, NAT-B: NAT instances that provide internet access to cloud resources by translating the resources' internal IP addresses to the NAT instances' public IPs.
  • VPC demo: Virtual Private Cloud network.
  • private-a: Subnet in the ru-central1-a availability zone hosting resources that require internet access.
  • public-a, public-b: Subnets in the ru-central1-a and ru-central1-b availability zones hosting the NAT instances.
  • public ip a, public ip b: Public IP addresses of the NAT instances.
  • NLB: Internal network load balancer required for the route-switcher module to run; it checks the availability of the NAT instances by performing health checks on TCP port 22.

If NAT-A fails, the route-switcher will switch outgoing traffic over to NAT-B by changing the Next hop value to the NAT-B internal IP address in the subnet route table. After that, internet access will be provided through NAT-B.

As soon as NAT-A recovers, the route-switcher will reroute outgoing traffic through NAT-A by changing the Next hop value to the NAT-A instance internal IP address in the route table.
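
Under the hood, this switchover is a plain static route update. For illustration, here is a rough CLI equivalent of what the module does; the route table name and IP address are placeholders, and the exact flag syntax may vary between CLI versions:

# Repoint the subnets' default route at the standby NAT instance.
# Note: this replaces the route table's existing routes with the one specified.
yc vpc route-table update <route_table_name> \
   --route destination=0.0.0.0/0,next-hop=<NAT-B_internal_IP_address>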

This tutorial will help you create a test infrastructure that shows how the route-switcher module works. The solution has the following basic elements:

  • nat-a: Main NAT instance.
  • nat-b: Standby NAT instance.
  • test-vm: VM within the infrastructure's internal perimeter that accesses the internet through the currently active NAT instance.
  • route-switcher-lb-...: Network load balancer required for the route-switcher module to run and used to check if the NAT instances are available.
  • route-switcher-...: Cloud function that switches outgoing traffic over to the standby NAT instance if the main one is down.

To deploy the test infrastructure and test the route-switcher:

  1. Get your cloud ready.
  2. Set up the environment.
  3. Deploy your resources.
  4. Enable the route-switcher module.
  5. Test the solution for performance and fault tolerance.

If you no longer need the resources you created, delete them.

Get your cloud ready

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The infrastructure support cost includes:

  • Fee for continuously running VMs (see Yandex Compute Cloud pricing).
  • Fee for using Network Load Balancer (see Yandex Network Load Balancer pricing).
  • Fee for IP addresses and outbound traffic (see Yandex Virtual Private Cloud pricing).
  • Fee for using the function (see Yandex Cloud Functions pricing).

Configure the CLI profile

  1. If you do not have the Yandex Cloud command line interface yet, install it and sign in as a user.

  2. Create a service account:

    Management console
    CLI
    API
    1. In the management console, select the folder where you want to create a service account.
    2. In the list of services, select Identity and Access Management.
    3. Click Create service account.
    4. Specify the service account name, e.g., sa-terraform.
    5. Click Create.

    The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

    Run the command below to create a service account, specifying the sa-terraform name:

    yc iam service-account create --name sa-terraform
    

    Where --name sets the service account name.

    Result:

    id: ajehr0to1g8b********
    folder_id: b1gv87ssvu49********
    created_at: "2023-06-20T09:03:11.665153755Z"
    name: sa-terraform
    

    To create a service account, use the ServiceAccountService/Create gRPC API call or the create REST API method for the ServiceAccount resource.

  3. Assign the service account the administrator role for the folder:

    Management console
    CLI
    API
    1. On the management console home page, select a folder.
    2. Navigate to the Access bindings tab.
    3. Find the sa-terraform account in the list and click the options menu icon.
    4. Click Edit roles.
    5. Click Add role in the dialog box that opens and select the admin role.

    Run this command:

    yc resource-manager folder add-access-binding <folder_ID> \
       --role admin \
       --subject serviceAccount:<service_account_ID>
    

    To assign a service account a role for a folder, use the setAccessBindings REST API method for the ServiceAccount resource or the ServiceAccountService/SetAccessBindings gRPC API call.

  4. Set up the CLI profile to run operations on behalf of the service account:

    CLI
    1. Create an authorized key for the service account and save it to a file:

      yc iam key create \
      --service-account-id <service_account_ID> \
      --folder-id <service_account_folder_ID> \
      --output key.json
      

      Where:

      • service-account-id: Service account ID.
      • folder-id: ID of the service account folder.
      • output: Name of the authorized key file.

      Result:

      id: aje8nn871qo4********
      service_account_id: ajehr0to1g8b********
      created_at: "2023-06-20T09:16:43.479156798Z"
      key_algorithm: RSA_2048
      
    2. Create a CLI profile to run operations on behalf of the service account:

      yc config profile create sa-terraform
      

      Result:

      Profile 'sa-terraform' created and activated
      
    3. Configure the profile:

      yc config set service-account-key key.json
      

      Where:

      service-account-key: File with the service account authorized key.

    4. Add your credentials to the environment variables:

      export YC_TOKEN=$(yc iam create-token)
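
    To confirm the CLI now operates on behalf of the service account, you can run a couple of quick checks (a sketch; the output will differ in your environment):

      yc config profile list                           # sa-terraform should be marked ACTIVE
      yc config list                                   # should show the service-account-key setting
      yc iam create-token > /dev/null && echo auth OK  # verifies the key can obtain a token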
      

Set up the environment for deploying the resources

  1. Install Terraform.

  2. Install Git using the following command:

    sudo apt install git
    
  3. Clone the yandex-cloud-examples/yc-route-switcher GitHub repository and go to the script folder:

    git clone https://github.com/yandex-cloud-examples/yc-route-switcher.git
    cd yc-route-switcher/examples
    
  4. Open the terraform.tfvars file, e.g., using the nano editor:

    nano terraform.tfvars
    
  5. Edit the following (a complete example of the finished file is shown after these steps):

    1. String with the folder ID:

      folder_id = "<folder_ID>"
      
    2. String with a list of allowed public IP addresses for test-vm access:

      trusted_ip_for_mgmt = ["<workstation_external_IP_address>/32"]
      

      Where <workstation_external_IP_address> is the public IP address of your workstation.

      To find out the external IP address of your workstation, run:

      curl 2ip.ru
      

      Result:

      192.240.24.87
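
    Put together, a filled-in terraform.tfvars might look like this (placeholder values, following the examples above):

      folder_id           = "b1gv87ssvu49********"
      trusted_ip_for_mgmt = ["192.240.24.87/32"]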
      

Deploy your resources

  1. Initialize Terraform:

    terraform init
    
  2. Check the Terraform file configuration:

    terraform validate
    
  3. Check the list of cloud resources you want to create:

    terraform plan
    
  4. Create resources:

    terraform apply 
    
  5. Wait until the resources are deployed and save the resulting command output:

    Outputs:
    nat-a_public_ip_address = "***.***.129.139"
    nat-b_public_ip_address = "***.***.105.234"
    path_for_private_ssh_key = "./pt_key.pem"
    test_vm_password = <sensitive>
    vm_username = "admin"
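
    You will need these values during testing. If they scroll out of view, you can re-display them at any time from the same folder:

    terraform output                               # all output values
    terraform output -raw nat-a_public_ip_address  # a single value, e.g., for scripts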
    

Enable the route-switcher module

  1. Make sure the NAT instances are running and available within the network:

    Management console
    1. In the management console, select the appropriate folder.
    2. Select Network Load Balancer and go to the route-switcher-lb-... network load balancer page.
    3. Open the target group and make sure the target resources are Healthy.
  2. Open the route-switcher.tf file, e.g., using the nano editor:

    nano route-switcher.tf
    
  3. Change the value of the start_module parameter for the route-switcher module to true.

  4. Run the module with the following command:

    terraform apply 
    

    Within five minutes of deployment, the route-switcher module starts providing fault tolerance for outgoing internet traffic through the NAT instances.
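
    As an optional check that the module's supporting resources are in place, you can list them via the CLI; the names are generated by the Terraform configuration and carry the route-switcher-... prefixes mentioned earlier:

      yc load-balancer network-load-balancer list   # the route-switcher-lb-... balancer
      yc serverless function list                   # the route-switcher-... function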

Test the solution for performance and fault tolerance

Testing the system performance

  1. Connect to the test-vm serial console:

    Management console
    1. In the management console, select the appropriate folder.
    2. Select Compute Cloud.
    3. In the VM list, select test-vm.
    4. Navigate to the Serial console tab.
    5. Wait for the operating system to start up completely.
  2. Enter the admin username and password.
    To find out the password, run the following command from the folder with the Terraform scripts on your workstation:

    terraform output test_vm_password
    
  3. Make sure test-vm is connected to the internet via the public IP address of nat-a. Run the following command in the serial console:

    curl ifconfig.co
    

    Compare the IP address with the nat-a_public_ip_address value from the resulting output.

  4. Start outgoing traffic from the test VM to a resource on the internet using the ping command:

    ping ya.ru
    

    Make sure that packets are returned:

    PING ya.ru (77.88.55.242) 56(84) bytes of data.
    64 bytes from ya.ru (77.88.55.242): icmp_seq=1 ttl=56 time=4.67 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=2 ttl=56 time=3.83 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=3 ttl=56 time=3.80 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=4 ttl=56 time=3.78 ms
    
  5. Make sure the Next hop value in the route table for the demo network matches the internal IP address of nat-a.
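
    If you prefer the CLI to the management console for this check, you can inspect the route table directly (it is created by the Terraform configuration; list the tables first to find its ID):

      yc vpc route-table list                  # find the demo network's route table
      yc vpc route-table get <route_table_ID>  # the static route's next hop should be nat-a's internal IP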

Testing the system fault tolerance

  1. Stop the main NAT instance to emulate a system failure:

    Management console
    CLI
    API
    1. In the management console, select the appropriate folder.
    2. Select Compute Cloud.
    3. Select the nat-a VM from the list, click the options menu icon, and select Stop.
    4. In the window that opens, click Stop.
    1. See the description of the CLI command for stopping a VM:

      yc compute instance stop --help
      
    2. Stop the VM:

      yc compute instance stop nat-a
      

    Use the stop REST API method for the Instance resource or the InstanceService/Stop gRPC API call.

  2. Monitor the loss of packets sent by ping.
    After the main NAT instance is stopped, traffic may be lost for around one minute, after which it should recover.

  3. Make sure internet access is now provided via the public IP address of nat-b. To do this, in the serial console, stop the ping command and run the following one:

    curl ifconfig.co
    

    Compare the IP address with the nat-b_public_ip_address value from the resulting output.

  4. Check that the route-switcher has changed the Next hop value in the route table for the demo network and it now matches the internal IP address of nat-b.

  5. Start outgoing traffic from the test VM again using the ping command.

  6. Start the main NAT instance to emulate system recovery:

    Management console
    CLI
    API
    1. In the management console, select the appropriate folder.
    2. Select Compute Cloud.
    3. Select the nat-a VM from the list, click the options menu icon, and select Start.
    4. In the window that opens, click Start.
    1. See the description of the CLI command for starting a VM:

      yc compute instance start --help
      
    2. Start the VM:

      yc compute instance start nat-a
      

    Use the start REST API method for the Instance resource or the InstanceService/Start gRPC API call.

  7. Monitor the ping utility output. While NAT-A is recovering, there should be no loss of sent packets.

  8. Make sure internet access is provided via the public IP address of nat-a again. To do this, in the serial console, stop the ping command and run the following one:

    curl ifconfig.co
    

    Compare the IP address with the nat-a_public_ip_address value from the resulting output.

  9. Check that the route-switcher has changed the Next hop value in the route table for the demo network and it matches the internal IP address of nat-a again.
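
    During both tests, you can also track the state of the NAT instances from your workstation using the same CLI profile:

      yc compute instance list       # nat-a should be RUNNING again after step 6
      yc compute instance get nat-a  # shows the internal IP to compare against the route table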

How to delete the resources you created

To stop paying for the resources you created, run this command:

terraform destroy

Warning

Terraform will permanently delete all the resources: networks, subnets, VMs, load balancer, etc.
