
Implementing fault-tolerant scenarios for NAT VMs

Written by
Yandex Cloud
Updated at June 9, 2025
  • Get your cloud ready
    • Required paid resources
  • Configure your CLI profile
  • Set up an environment for deploying the resources
  • Deploy your resources
  • Enable the route switcher
  • Test the solution for performance and fault tolerance
    • Testing the system performance
    • Testing the system fault tolerance
  • How to delete the resources you created

In Yandex Cloud, you can deploy a cloud infrastructure using network VMs that provide firewall protection, network security, and traffic routing. With static routing, you can route traffic from subnets to network VMs.

To ensure high availability, you can deploy multiple network VMs in different availability zones and use a route switcher to automatically switch outbound traffic between them.

In our scenario, the route switcher ensures fault tolerance of a NAT instance: a network VM with preset routing and IP address translation rules that provides internet access for Yandex Cloud resources.
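Conceptually, the address translation such a VM performs is equivalent to enabling IP forwarding plus a source NAT rule in Linux. Below is a minimal illustrative sketch only; the NAT instance image ships preconfigured, so you do not run this yourself, and eth0 is an assumed interface name:

  # Illustrative only: the NAT instance image comes with equivalent rules preset.
  sudo sysctl -w net.ipv4.ip_forward=1                        # forward packets between interfaces
  sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE   # rewrite source IPs to the VM's own address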

In the flow chart below, NAT-A is the main egress internet gateway, while NAT-B is a standby one.

Chart description

  • NAT-A, NAT-B: NAT instances that enable internet access for cloud resources by translating the resources' private IP addresses to the NAT instances' public IP addresses.
  • demo: Virtual Private Cloud network.
  • private-a: Subnet in the ru-central1-a availability zone, hosting resources that require internet access.
  • public-a, public-b: Subnets in the ru-central1-a and ru-central1-b availability zones, hosting the NAT instances.
  • public ip a, public ip b: Public IP addresses of the NAT instances.
  • NLB: Internal network load balancer for the route switcher; it checks NAT instance health by probing TCP port 22.

If NAT-A fails, the route switcher will switch outbound traffic to NAT-B by changing the route's Next hop value in the route table to the NAT-B internal IP address. After that, NAT-B will provide internet access.

As soon as NAT-A recovers, the route switcher will change the Next hop value to the NAT-A internal IP address, thus rerouting outbound traffic through NAT-A.
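You can observe this switch from the CLI. A minimal inspection sketch, where the route table name is a placeholder for the one created in your folder:

  yc vpc route-table get <route_table_name> --format yaml

In the output, the static route's next hop points at the currently active NAT instance, for example:

  static_routes:
    - destination_prefix: 0.0.0.0/0
      next_hop_address: <NAT_instance_internal_IP>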

In this tutorial, we will create a test infrastructure showing how a route switcher works. Our example will include the following basic components:

  • nat-a: Main NAT instance.
  • nat-b: Standby NAT instance.
  • test-vm: Internal VM accessing the internet through a NAT instance.
  • route-switcher-lb-...: Network load balancer for the route switcher, running health checks on the NAT instances.
  • route-switcher-...: Cloud function switching outbound traffic to the standby NAT instance if the main one is down.

To deploy the infrastructure and test your route switcher:

  1. Get your cloud ready.
  2. Set up your environment.
  3. Deploy your resources.
  4. Enable the route switcher.
  5. Test the solution for performance and fault tolerance.

If you no longer need the resources you created, delete them.

Get your cloud ready

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The infrastructure support cost includes:

  • Fee for continuously running VMs (see Yandex Compute Cloud pricing).
  • Fee for using Network Load Balancer (see Yandex Network Load Balancer pricing).
  • Fee for public IP addresses and outbound traffic (see Yandex Virtual Private Cloud pricing).
  • Fee for using the function (see Yandex Cloud Functions pricing).

Configure your CLI profile

  1. If you do not have the Yandex Cloud CLI yet, install it and sign in as a user.

  2. Create a service account:

    Management console
    CLI
    API
    1. In the management console, select the folder where you want to create a service account.
    2. In the list of services, select Identity and Access Management.
    3. Click Create service account.
    4. Enter a name for the service account, e.g., sa-terraform.
    5. Click Create.

    By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

    To create a service account, run the command below, specifying sa-terraform as the service account name:

    yc iam service-account create --name sa-terraform
    

    Where --name is the service account name.

    Result:

    id: ajehr0to1g8b********
    folder_id: b1gv87ssvu49********
    created_at: "2023-06-20T09:03:11.665153755Z"
    name: sa-terraform
    

    To create a service account, use the ServiceAccountService/Create gRPC API call or the create REST API method for the ServiceAccount resource.
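    As a hedged illustration of the REST call (the endpoint follows the public IAM API reference; the folder ID is a placeholder):

    curl -X POST \
      -H "Authorization: Bearer $(yc iam create-token)" \
      -H "Content-Type: application/json" \
      -d '{"folderId": "<folder_ID>", "name": "sa-terraform"}' \
      https://iam.api.cloud.yandex.net/iam/v1/serviceAccounts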

  3. Assign the admin role for the folder to the service account:

    Management console
    CLI
    API
    1. On the management console home page, select a folder.
    2. Navigate to the Access bindings tab.
    3. Find the sa-terraform account in the list and click the options icon in its row.
    4. Click Edit roles.
    5. In the dialog that opens, click Add role and select admin.

    Run this command:

    yc resource-manager folder add-access-binding <folder_ID> \
       --role admin \
       --subject serviceAccount:<service_account_ID>
    

    To assign a role for a folder to a service account, use the setAccessBindings REST API method for the Folder resource or the FolderService/SetAccessBindings gRPC API call.
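    A hedged sketch of that REST call (note that setAccessBindings replaces all existing access bindings on the folder; the IDs are placeholders):

    curl -X POST \
      -H "Authorization: Bearer $(yc iam create-token)" \
      -H "Content-Type: application/json" \
      -d '{"accessBindings": [{"roleId": "admin", "subject": {"id": "<service_account_ID>", "type": "serviceAccount"}}]}' \
      https://resource-manager.api.cloud.yandex.net/resource-manager/v1/folders/<folder_ID>:setAccessBindings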

  4. Set up the CLI profile to run operations under the service account:

    CLI
    1. Create an authorized key for the service account and save it to a file:

      yc iam key create \
        --service-account-id <service_account_ID> \
        --folder-id <ID_of_folder_with_service_account> \
        --output key.json
      

      Where:

      • service-account-id: Service account ID.
      • folder-id: Service account folder ID.
      • output: Authorized key file name.

      Result:

      id: aje8nn871qo4********
      service_account_id: ajehr0to1g8b********
      created_at: "2023-06-20T09:16:43.479156798Z"
      key_algorithm: RSA_2048
      
    2. Create a CLI profile to run operations under the service account:

      yc config profile create sa-terraform
      

      Result:

      Profile 'sa-terraform' created and activated
      
    3. Configure the profile:

      yc config set service-account-key key.json
      

      Where:

      service-account-key: Service account authorized key file.

    4. Add your credentials to the environment variables:

      export YC_TOKEN=$(yc iam create-token)
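      As an optional sanity check, you can verify that the new profile is active and points at the service account key:

      yc config list

      The Terraform provider for Yandex Cloud picks up the YC_TOKEN environment variable set above.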
      

Set up an environment for deploying the resources

  1. Install Terraform.

  2. Install Git using the following command:

    sudo apt install git
    
  3. Clone the yandex-cloud-examples/yc-route-switcher GitHub repository and navigate to the directory containing resources for our example:

    git clone https://github.com/yandex-cloud-examples/yc-route-switcher.git
    cd yc-route-switcher/examples
    
  4. Open the terraform.tfvars file in a text editor, such as nano:

    nano terraform.tfvars
    
  5. Edit the following:

    1. Folder ID line:

      folder_id = "<folder_ID>"
      
    2. Line with a list of public IP addresses allowed to access test-vm:

      trusted_ip_for_mgmt = ["<workstation_external_IP_address>/32"]
      

      Where <workstation_external_IP_address> is your computer's public IP address.

      To get your computer's public IP address, run this command:

      curl 2ip.ru
      

      Result:

      192.240.24.87
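      After editing, terraform.tfvars might look like this (both values below are placeholders built from the examples above; leave any other predefined variables in the file unchanged):

      folder_id           = "b1gv87ssvu49********"
      trusted_ip_for_mgmt = ["192.240.24.87/32"]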
      

Deploy your resources

  1. Initialize Terraform:

    terraform init
    
  2. Check whether the Terraform configuration files are correct:

    terraform validate
    
  3. Check the list of new cloud resources:

    terraform plan
    
  4. Create the resources:

    terraform apply 
    
  5. Wait until the command completes and save its output:

    Outputs:
    nat-a_public_ip_address = "***.***.129.139"
    nat-b_public_ip_address = "***.***.105.234"
    path_for_private_ssh_key = "./pt_key.pem"
    test_vm_password = <sensitive>
    vm_username = "admin"
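    If you need to inspect a NAT instance directly, the load balancer health-checks the instances on TCP port 22, so SSH should be reachable. A hedged connection sketch using the outputs above, assuming vm_username also applies to the NAT instances:

    ssh -i pt_key.pem admin@<nat-a_public_ip_address>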
    

Enable the route switcher

  1. Make sure the NAT instances are running and accessible from the internal network:

    Management console
    1. In the management console, select your infrastructure folder.
    2. Select Network Load Balancer and navigate to the route-switcher-lb-... page.
    3. Expand the target group and check whether its resources are Healthy.
  2. Open the route-switcher.tf file in a text editor, such as nano:

    nano route-switcher.tf
    
  3. Change the start_module value in the route-switcher module to true.

  4. Start the module with this command:

    terraform apply 
    

    Within five minutes, the route switcher will start, providing fault tolerance for outbound NAT traffic.
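    To confirm the module deployed its components, you can list the Cloud Functions in your folder; a route-switcher-... function should appear. This is a quick check, not a required step:

    yc serverless function list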

Test the solution for performance and fault tolerance

Testing the system performance

  1. Connect to the test-vm serial console:

    Management console
    1. In the management console, select your infrastructure folder.
    2. Select Compute Cloud.
    3. In the VM list, select test-vm.
    4. Navigate to the Serial console tab.
    5. Wait for the operating system to boot.
  2. Enter the admin username and password.
    To get the password, run this command from the Terraform directory on your computer:

    terraform output test_vm_password
    
  3. Make sure test-vm uses the nat-a public IP address to access the internet by running this command in the serial console:

    curl ifconfig.co
    

    Compare the IP address you get with nat-a_public_ip_address you saved earlier.

  4. Run a ping to a public host to trigger test VM outbound traffic:

    ping ya.ru
    

    Make sure you get an ICMP response:

    PING ya.ru (77.88.55.242) 56(84) bytes of data.
    64 bytes from ya.ru (77.88.55.242): icmp_seq=1 ttl=56 time=4.67 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=2 ttl=56 time=3.83 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=3 ttl=56 time=3.80 ms
    64 bytes from ya.ru (77.88.55.242): icmp_seq=4 ttl=56 time=3.78 ms
    
  5. Check the route table to make sure the Next hop value for the demo network matches the nat-a internal IP address.

Testing the system fault tolerance

  1. Emulate a system failure by stopping the main NAT instance:

    Management console
    CLI
    API
    1. In the management console, select your infrastructure folder.
    2. Select Compute Cloud.
    3. Select the nat-a VM from the list, click the options icon, and select Stop.
    4. In the window that opens, click Stop.
    1. See the description of the CLI command for stopping a VM:

      yc compute instance stop --help
      
    2. Stop the VM:

      yc compute instance stop nat-a
      

    Use the stop REST API method for the Instance resource or the InstanceService/Stop gRPC API call.

  2. Monitor the loss of ping packets.
    After the main NAT instance is stopped, traffic may be lost for about a minute before it recovers.

  3. Make sure test-vm now uses the nat-b public IP address to access the internet by stopping ping and running this command in the serial console:

    curl ifconfig.co
    

    Compare the IP address you get with nat-b_public_ip_address you saved earlier.

  4. Check the route table to make sure the route switcher changed the Next hop value for the demo network to the nat-b internal IP address.

  5. Run a ping to a public host to trigger test VM outbound traffic:

    ping ya.ru

  6. Emulate the system recovery by starting the main NAT instance:

    Management console
    CLI
    API
    1. In the management console, select your infrastructure folder.
    2. Select Compute Cloud.
    3. Select the nat-a VM from the list, click the options icon, and select Start.
    4. In the window that opens, click Start.
    1. See the description of the CLI command for starting a VM:

      yc compute instance start --help
      
    2. Start the VM:

      yc compute instance start nat-a
      

    Use the start REST API method for the Instance resource or the InstanceService/Start gRPC API call.

  7. Monitor the ping output. When NAT-A recovers, you should not see any packet loss.

  8. Make sure test-vm now uses the nat-a public IP address to access the internet by stopping ping and running this command in the serial console:

    curl ifconfig.co
    

    Compare the IP address you get with nat-a_public_ip_address you saved earlier.

  9. Check the route table to make sure the route switcher changed the Next hop value for the demo network back to the nat-a internal IP address.

How to delete the resources you created

To stop paying for the resources you created, run this command:

terraform destroy

Warning

Terraform will permanently delete all the resources it created: networks, subnets, VMs, the load balancer, and so on.
