OPNsense HA Firewall
OPNsense HA deploys two OPNsense 26.1.9 firewalls as an active-passive pair in different availability zones, with automatic failover and native configuration synchronization. It is purpose-built for Yandex Cloud, where CARP and pfsync are not available, and uses the cloud control plane to move traffic to the standby node when the active node fails.
At boot the primary node carries all traffic. A failover daemon runs on both nodes and derives its active/passive role from the VPC route table, which is the single source of truth, re-evaluating every ~5 seconds. Each cycle, the passive node probes the active over two independent paths: the active’s private WAN address inside the VPC and its public WAN address over the internet. A cycle counts as a failure only when both paths fail. On sustained failure, the passive node rewrites the route table’s default route (0.0.0.0/0) to point at its own WAN private IP, so the route moves only when the active node has actually failed.
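The role and failure rules above can be sketched in a few lines of Python. This is a minimal illustration, not the daemon's code: derive_role and cycle_failed are hypothetical names, and each probe result is reduced to a boolean.

```python
# Hypothetical sketch of the two rules described above; not the product's daemon.

def derive_role(default_next_hop: str, my_wan_ip: str) -> str:
    """The node whose WAN private IP is the 0.0.0.0/0 next-hop is active."""
    return "active" if default_next_hop == my_wan_ip else "passive"


def cycle_failed(private_path_ok: bool, public_path_ok: bool) -> bool:
    """A probe cycle counts as a failure only if BOTH independent paths fail."""
    return not private_path_ok and not public_path_ok


if __name__ == "__main__":
    print(derive_role("10.1.0.10", "10.1.0.10"))  # this node is the next-hop
    print(derive_role("10.1.0.10", "10.2.0.10"))  # the peer is the next-hop
    # VPC partition: private path down, public path up -> not a failure.
    print(cycle_failed(private_path_ok=False, public_path_ok=True))
```

The partition case in the last line is what keeps a VPC split from triggering a takeover: the public path still answers, so the cycle is not a failure.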
Failover is automatic and bidirectional: whichever node is active stays active until it fails, then the other takes over, in either direction. The cluster therefore survives sequential failures and is not left with a single point of failure after the first one. There is no failback to a preferred node; together with a post-switch cooldown, this prevents flapping. Avoiding needless switches matters because there is no pfsync: every switch resets active TCP sessions.
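The no-failback rule and the post-switch cooldown combine into a simple gate. A sketch, assuming a hypothetical TakeoverGate class and a cooldown length you would choose yourself (its real value is not stated here):

```python
from typing import Optional


class TakeoverGate:
    """Hypothetical gate: take over only on failure, never during cooldown.

    There is no preferred node: a recovered node does not reclaim the route
    just because it is back, only because the current active has failed.
    """

    def __init__(self, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s  # illustrative value, not the product's
        self.last_switch: Optional[float] = None

    def record_switch(self, now: float) -> None:
        """Call whenever the route table's next-hop is seen to change."""
        self.last_switch = now

    def may_take_over(self, active_failed: bool, now: float) -> bool:
        if not active_failed:
            return False  # healthy active: never fail back
        if (self.last_switch is not None
                and now - self.last_switch < self.cooldown_s):
            return False  # still cooling down after the previous switch
        return True
```

In a loop, now would typically come from time.monotonic() so wall-clock adjustments cannot shorten or stretch the cooldown.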
The detection is built to fail over for the right reasons and only the right reasons:
- Identity-pinned probe. The probe targets a daemon health endpoint (TCP 8443) served with each node’s own certificate and verifies the peer’s certificate fingerprint (SHA-256). A stranger that lands on a reassigned ephemeral public IP cannot impersonate the peer and so cannot mask a real failure.
- Data-plane-aware readiness. The health endpoint reports healthy only when the node’s own WAN gateway is reachable, so failover triggers when the active can no longer forward upstream, and a node that is powered on but cannot forward is not mistaken for a healthy one.
- Split-brain protection. A node-to-node VPC partition kills the private path but not the public one, so a partitioned-but-alive active stays observable and is not taken over.
- Asymmetric, anti-flap thresholds. A node that is gone fails over fast (~15-20 seconds end to end); a node that is reachable but degraded fails over more slowly (~25-35 seconds) to dampen transient signals.
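One way to get asymmetric thresholds is to keep two streak counters: one for cycles where the peer is unreachable on both paths, and one for any failed cycle. The sketch assumes per-path outcomes of healthy (200), degraded (503), or unreachable, and hypothetical thresholds of 3 and 6 cycles, picked only so that at ~5 seconds per cycle detection lands inside the documented windows; the product's actual counters are not given here.

```python
from enum import Enum


class PathResult(Enum):
    HEALTHY = "healthy"          # pinned TLS OK, endpoint returned 200
    DEGRADED = "degraded"        # pinned TLS OK, endpoint returned 503
    UNREACHABLE = "unreachable"  # timeout, refused, or fingerprint mismatch


# Hypothetical thresholds, counted in ~5 s cycles (detection time only).
GONE_CYCLES = 3      # 3 * 5 s = 15 s; documented end to end: ~15-20 s
FAILED_CYCLES = 6    # 6 * 5 s = 30 s; documented end to end: ~25-35 s


class FailureDetector:
    def __init__(self) -> None:
        self.gone_streak = 0    # consecutive cycles unreachable on both paths
        self.failed_streak = 0  # consecutive cycles with no healthy path

    def observe(self, private: PathResult, public: PathResult) -> bool:
        """Feed one probe cycle; return True when a takeover is warranted."""
        results = {private, public}
        if PathResult.HEALTHY in results:
            # Any healthy path (e.g. during a VPC partition) resets both.
            self.gone_streak = 0
            self.failed_streak = 0
        else:
            self.failed_streak += 1
            if results == {PathResult.UNREACHABLE}:
                self.gone_streak += 1
            else:
                self.gone_streak = 0  # reachable but degraded: slow path
        return (self.gone_streak >= GONE_CYCLES
                or self.failed_streak >= FAILED_CYCLES)
```

Counting every failed cycle in failed_streak means a peer that flips between degraded and unreachable still fails over within the slower threshold instead of resetting forever.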
Configuration is kept in sync from primary to secondary using OPNsense’s built-in XMLRPC mechanism over HTTPS (TCP 443), so firewall rules, NAT, VPN, and other settings made on the primary propagate to the standby automatically.
Key features
- Active-passive OPNsense 26.1.9 pair across two availability zones.
- Automatic, bidirectional failover via VPC route table next-hop update — survives sequential failures of either node.
- Native config synchronization via XMLRPC over HTTPS — configure rules, NAT, and VPN on the primary only; the secondary receives them automatically.
- Symmetric failover daemon on both nodes; the passive node probes the active over two independent paths (private VPC + public) and fails over only when both fail, which keeps a network partition from triggering a false failover (split-brain protection).
- Data-plane-aware detection: the probed health endpoint (TCP 8443) reports healthy only when the node can reach its own WAN gateway, so failover follows loss of upstream forwarding, not loss of power.
- Certificate-pinned probe (SHA-256 fingerprint): a reassigned ephemeral public IP cannot impersonate the peer and mask a real failure.
- Asymmetric thresholds: a gone node fails over fast (~15-20 seconds); a degraded-but-reachable node fails over more slowly (~25-35 seconds) to dampen transients.
- Full OPNsense feature set: VPN (OpenVPN, IPsec), IDS/IPS (Suricata), WireGuard, web filtering and proxy, and a large plugin ecosystem. The base image ships the OPNsense default package set; add-on plugins such as Suricata and WireGuard are installed on demand from the OPNsense catalog over the WAN connection.
- Hybrid outbound NAT with manual rules for RFC1918 sources; one-armed NAT for routed workload traffic on WAN.
- No SDK or service-account key files: the failover daemon uses IAM tokens from the instance metadata service.
- Least-privilege service account (three IAM roles only).
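The certificate-pinned probe can be approximated with the Python standard library. In this sketch, CA validation is disabled because each node serves its own certificate, and trust comes only from comparing the SHA-256 digest of the peer's DER-encoded certificate with a pinned value. The /health path and the probe helper are assumptions for illustration; the endpoint path is not documented here.

```python
import hashlib
import hmac
import http.client
import ssl


def normalize_fp(fp: str) -> str:
    """Accept 'AB:CD:...' or 'abcd...' and return lowercase hex."""
    return fp.replace(":", "").strip().lower()


def fingerprint_matches(der_cert: bytes, expected_fp: str) -> bool:
    actual = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(actual, normalize_fp(expected_fp))


def probe(host: str, expected_fp: str, port: int = 8443,
          path: str = "/health", timeout: float = 2.0) -> str:
    """Hypothetical probe returning 'healthy', 'degraded' or 'unreachable'."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be cleared before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # trust comes from the pin, not a CA
    conn = http.client.HTTPSConnection(host, port, timeout=timeout,
                                       context=ctx)
    try:
        conn.connect()
        der = conn.sock.getpeercert(binary_form=True)
        if der is None or not fingerprint_matches(der, expected_fp):
            return "unreachable"  # an impostor counts as no peer at all
        conn.request("GET", path)
        status = conn.getresponse().status
        return {200: "healthy", 503: "degraded"}.get(status, "unreachable")
    except (OSError, http.client.HTTPException):
        return "unreachable"
    finally:
        conn.close()
```

Treating a fingerprint mismatch as unreachable is what stops a stranger on a recycled public IP from answering 200 and masking a real failure.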
Before you deploy
The product creates and owns only the firewall pair (two VMs, a service account, a security group, and the shared Lockbox config secret). It deliberately does not create the resources below: they are long-lived, shared, or security-sensitive, you control their naming and lifecycle, and the route table must already exist for the firewall to integrate with your network. Create them first and pass them to the deployment form:
- Lockbox secret with the admin password. Both nodes authenticate to each other over OPNsense XMLRPC, and the secondary must read the password at boot. It lives in Lockbox so the password never appears in instance metadata or logs. You create it because you choose the password. See “Creating the admin password secret” below.
- SSH public key for the freebsd user. This provides break-glass access for debugging; the form injects your key into both VMs. SSH setup runs early in bootstrap so you can reach a node even if later steps fail.
- VPC route table. This is the failover mechanism: on takeover the daemon rewrites the table’s 0.0.0.0/0 next-hop to the surviving node’s WAN private IP (your static routes are preserved). It must already exist and be attached to the LAN subnets whose traffic the firewalls route; the product only updates it.
- Two WAN subnets in different availability zones, with NAT enabled. The two nodes run in different zones for zone-level fault tolerance, so each needs its own WAN subnet. NAT provides outbound access and reachability to the Yandex Cloud control plane (Lockbox, VPC APIs) that the daemon calls. Both subnets must belong to the same VPC network as the route table.
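A takeover that preserves your static routes amounts to read, modify, write on the route table. The sketch below assumes the Compute metadata token endpoint (with the Metadata-Flavor: Google header) and a VPC REST resource routeTables with staticRoutes, destinationPrefix, nextHopAddress, gatewayId, and updateMask fields; treat these as assumptions to check against the Yandex Cloud API reference. with_default_next_hop and take_over are hypothetical helpers, not the product's daemon.

```python
import json
import urllib.request

# Assumed endpoints; verify against current Yandex Cloud documentation.
METADATA_TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
                      "instance/service-accounts/default/token")
VPC_ROUTE_TABLE_URL = "https://vpc.api.cloud.yandex.net/vpc/v1/routeTables/{id}"


def iam_token_from_metadata() -> str:
    """Fetch a short-lived IAM token for the VM's service account (no key file)."""
    req = urllib.request.Request(METADATA_TOKEN_URL,
                                 headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp)["access_token"]


def with_default_next_hop(static_routes: list, next_hop_ip: str) -> list:
    """Return a copy of the routes with only 0.0.0.0/0 re-pointed."""
    out, found = [], False
    for route in static_routes:
        route = dict(route)  # never mutate the caller's data
        if route.get("destinationPrefix") == "0.0.0.0/0":
            route.pop("gatewayId", None)  # assumed: next hop is one-of
            route["nextHopAddress"] = next_hop_ip
            found = True
        out.append(route)
    if not found:
        out.append({"destinationPrefix": "0.0.0.0/0",
                    "nextHopAddress": next_hop_ip})
    return out


def take_over(route_table_id: str, my_wan_ip: str) -> None:
    token = iam_token_from_metadata()
    url = VPC_ROUTE_TABLE_URL.format(id=route_table_id)
    auth = {"Authorization": f"Bearer {token}"}
    with urllib.request.urlopen(urllib.request.Request(url, headers=auth),
                                timeout=5) as resp:
        table = json.load(resp)
    body = {"updateMask": "staticRoutes",
            "staticRoutes": with_default_next_hop(
                table.get("staticRoutes", []), my_wan_ip)}
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), method="PATCH",
        headers={**auth, "Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5).close()
```

Only with_default_next_hop holds routing logic, so it is the part worth unit-testing; the PATCH returns a long-running operation that a real daemon would poll until it completes.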
Creating the admin password secret
The Lockbox secret must contain a single text entry whose key is exactly password and whose value is the plaintext administrator password (applied to the OPNsense root account). Create it with the Yandex Cloud CLI:
```shell
yc lockbox secret create \
  --name opnsense-admin-password \
  --payload '[{"key":"password","text_value":"<your-admin-password>"}]'
```
In the console: Lockbox -> Create secret -> add a key/value entry with key password and the password as the value. Copy the resulting secret ID into the admin_password_secret_id parameter. The key name must be password — the bootstrap reads that exact key and fails if it is missing. The product does not enforce a minimum length, so choose a strong password.
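The bootstrap's exact-key rule can be sketched as follows. The Lockbox payload URL and the textValue field name in the JSON response are assumptions about the Lockbox REST API (the CLI's create-side field is text_value); extract_admin_password and read_admin_password are hypothetical helpers.

```python
import json
import urllib.request

# Assumed Lockbox payload endpoint; verify against current documentation.
LOCKBOX_PAYLOAD_URL = ("https://payload.lockbox.api.cloud.yandex.net/"
                       "lockbox/v1/secrets/{id}/payload")


def extract_admin_password(payload: dict) -> str:
    """Return the text value of the entry whose key is exactly 'password'."""
    for entry in payload.get("entries", []):
        if entry.get("key") == "password":  # case-sensitive, exact match
            value = entry.get("textValue")
            if value:
                return value
            break
    raise KeyError("Lockbox secret has no non-empty 'password' text entry")


def read_admin_password(secret_id: str, iam_token: str) -> str:
    req = urllib.request.Request(
        LOCKBOX_PAYLOAD_URL.format(id=secret_id),
        headers={"Authorization": f"Bearer {iam_token}"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return extract_admin_password(json.load(resp))
```

Failing loudly on a misnamed key (for example Password) is preferable to bootstrapping a node with an empty or wrong administrator password.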
Deployment parameters
These are the fields shown in the deployment form:
- Name: base name prefix for all created resources.
- WAN subnet (first availability zone): VPC subnet used as the first node’s WAN network for management and internet access via NAT.
- WAN subnet (second availability zone): VPC subnet used as the second node’s WAN network (in a different availability zone).
- Route table: VPC route table whose 0.0.0.0/0 next-hop the failover daemon switches between the nodes.
- Admin password secret: Lockbox secret holding the OPNsense administrator password (applied to the root account).
- SSH public key (FreeBSD user): public SSH key injected into both VMs for break-glass access.
- Environment: environment type for the deployment (prod or dev).
The security group allows the following ingress:
- TCP 22 from 0.0.0.0/0: SSH, key-only (password authentication is disabled).
- TCP 443 from 0.0.0.0/0: WebUI HTTPS; 443 also carries XMLRPC config sync.
- TCP 8443 from 0.0.0.0/0: the HA daemon health endpoint that the peer probes over the private and public paths. It exposes only a 200/503 health boolean over a certificate-pinned TLS endpoint, no sensitive data.
- ICMP from 0.0.0.0/0.
- All protocols from the RFC1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), so the firewall can route and NAT your LAN workloads.

Restrict access at the security-group or subnet level if required.
Security note (port 8443): port 8443 is internet-reachable, which is required because the cross-internet split-brain probe reaches the peer node over its public IP. The endpoint exposes only a 200/503 health boolean over TLS and reveals nothing else. Operators should apply rate-limiting / SYN protection and restrict access to 8443 at the network edge (trusted CIDRs) where possible.
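To reason about what the default rules admit, and what restricting 8443 changes, here is a plain-Python model of the ingress rules described above. It is not the security group API; Rule, allowed, and restrict_8443 are hypothetical, and 198.51.100.20/32 is an illustrative documentation-range address standing in for the peer's public egress IP.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    protocol: str        # "tcp", "icmp" or "any"
    port: Optional[int]  # None means all ports
    cidr: str


RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# Model of the default ingress described in this section.
DEFAULT_INGRESS = (
    [Rule("tcp", p, "0.0.0.0/0") for p in (22, 443, 8443)]
    + [Rule("icmp", None, "0.0.0.0/0")]
    + [Rule("any", None, c) for c in RFC1918]
)


def allowed(rules, protocol: str, port: Optional[int], src_ip: str) -> bool:
    ip = ipaddress.ip_address(src_ip)
    for r in rules:
        if r.protocol not in ("any", protocol):
            continue
        if r.port is not None and r.port != port:
            continue
        if ip in ipaddress.ip_network(r.cidr):
            return True
    return False


def restrict_8443(rules, trusted_cidrs):
    """Replace the open TCP 8443 rule with one rule per trusted CIDR."""
    kept = [r for r in rules if not (r.protocol == "tcp" and r.port == 8443)]
    return kept + [Rule("tcp", 8443, c) for c in trusted_cidrs]
```

Because the RFC1918 any-protocol rules stay in place, narrowing 8443 this way still leaves the private probe path open between the nodes.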
Accessing the WebUI
After deployment completes (~3-4 minutes):
- Open the Yandex Cloud console, navigate to Compute Cloud.
- Find the primary VM and copy its public IP address.
- Open https://<primary-ip> in a browser and accept the self-signed certificate warning.
- Log in as the root user (the OPNsense administrator account) with the password from the Lockbox secret.
Make all configuration changes on the primary only. Configuration is automatically synchronized to the secondary via XMLRPC.
Routing LAN workloads through the firewall
The firewall pair routes and NATs traffic for your LAN subnets. The deployment itself does NOT attach the route table to any subnet — you must do this so workload traffic actually flows through OPNsense.
Required steps:
- Attach the route table to each LAN subnet. The route_table_id you passed at deploy carries the default route (0.0.0.0/0) that the failover daemon switches between the primary and secondary. A LAN subnet only routes through the firewall once this table is associated with it:

  ```shell
  yc vpc subnet update <LAN_SUBNET_ID> --route-table-id <route_table_id>
  ```

  Without this, LAN VMs bypass the firewall entirely (or have no egress). The default route is set automatically by the failover daemon after boot.
- Use RFC1918 ranges for LAN subnets (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The firewall’s outbound NAT and security group are pre-configured for these ranges; other ranges are neither NAT’d nor allowed.
- Do not assign public IPs to LAN VMs. Their only egress path must be the route table to OPNsense; a public IP on a LAN VM causes asymmetric routing (inbound via its own NAT, outbound via OPNsense) and breaks connections.
- LAN subnets must be in the same VPC network as the firewall WAN subnets.
The HA route table MUST be attached to LAN/workload subnets and NEVER to a WAN subnet: a WAN-attached default route would point the node’s gateway at itself and blackhole the node’s own egress (SSH, Lockbox, YC API).
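A pre-flight check for the LAN rules above can catch the common mistakes before traffic is affected. This sketch assumes you have already collected subnet IDs and CIDRs (for example from yc vpc subnet list); check_lan_plan and the example IDs are hypothetical.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def check_lan_plan(lan_subnets, wan_subnet_ids, route_table_subnet_ids):
    """Return a list of problems; an empty list means the plan looks consistent.

    lan_subnets maps LAN subnet ID -> CIDR; the other two are lists of IDs.
    """
    problems = []
    attached = set(route_table_subnet_ids)
    for subnet_id, cidr in sorted(lan_subnets.items()):
        net = ipaddress.ip_network(cidr)
        # Short-circuit keeps subnet_of() from comparing IPv6 with IPv4.
        if net.version != 4 or not any(net.subnet_of(b) for b in RFC1918):
            problems.append(f"{subnet_id}: {cidr} is outside RFC1918")
        if subnet_id not in attached:
            problems.append(f"{subnet_id}: route table not attached")
    # A WAN-attached default route would blackhole the node's own egress.
    for subnet_id in sorted(set(wan_subnet_ids) & attached):
        problems.append(f"{subnet_id}: route table attached to a WAN subnet")
    return problems
```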
What you get:
- LAN VMs reach the internet through OPNsense, NAT’d with the active node’s WAN public IP. On failover the route table moves to the surviving node and new connections follow it automatically.
- OPNsense filters and NATs the traffic — it is a real firewall in the path, with VPN, IDS/IPS, and the full plugin set available.
For inbound services published to a LAN VM, add a port-forward / 1:1 NAT rule on the firewall’s public WAN IP; the route table alone covers egress and transit only.
For production, assign static (reserved) public IPs to the firewall VMs: ephemeral public IPs change on stop/start, which changes the egress IP seen by LAN workloads.
Use cases
- Creating a VPN connection to provide remote access to resources or interconnect physical and cloud infrastructures.
- Protecting sites and applications.
- Translating addresses.
- Filtering traffic.
- Routing on the internet.
OpenNix provides technical support to OPNsense users in Yandex Cloud. You can contact their technical support by email at support@opennix.ru. Support engineers are available on business days from 9 am to 6 pm GMT+3.
| Resource type | Quantity |
|---|---|
| Access rights for folder | 3 |
| Virtual machines | 2 |
| Lockbox secret | 1 |
| Service account | 1 |
| VPC security group | 1 |