pfSense High Availability cluster
pfSense HA deploys two pfSense 2.7.2 firewalls as an
active-passive pair in different availability zones, with automatic failover and
native configuration synchronization. It is purpose-built for Yandex Cloud,
where CARP and pfsync are not available, and uses the cloud control plane to
move traffic to the standby node when the active node fails.
At boot the primary node carries all traffic. A failover daemon runs on both
nodes and derives its active/passive role from the VPC route table — the single
source of truth — re-evaluating every ~5 seconds. The passive node probes the
active over two independent paths each cycle: the active’s private WAN address
inside the VPC and its public WAN address over the internet. A cycle counts as a
failure only when both paths fail. On sustained failure the passive node rewrites
the route table’s default route to point at itself; the route moves only when the
active node has actually failed.
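The probe rule above can be sketched in a few lines of Python. This is a minimal illustration, not the product's daemon code: the names cycle_failed and FailureCounter and the threshold of three cycles are assumptions made for the example.

```python
from dataclasses import dataclass


def cycle_failed(private_ok: bool, public_ok: bool) -> bool:
    """A probe cycle fails only when BOTH paths to the active node fail."""
    return not private_ok and not public_ok


@dataclass
class FailureCounter:
    threshold: int = 3   # assumed value, not taken from the product
    consecutive: int = 0

    def record(self, private_ok: bool, public_ok: bool) -> bool:
        """Record one cycle; return True once the failure is sustained."""
        if cycle_failed(private_ok, public_ok):
            self.consecutive += 1
        else:
            self.consecutive = 0  # any reachable path resets the streak
        return self.consecutive >= self.threshold
```

With a cycle of about 5 seconds, a threshold of three would correspond to roughly 15 seconds of continuous failure on both paths before the route is moved.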
Failover is automatic and bidirectional: whichever node is active stays
active until it fails, then the other takes over, in either direction, so the
cluster survives sequential failures and is not left as a single point of failure
after the first one. There is no failback to a preferred node; together with a
post-switch cooldown, this prevents flapping. Because there is no pfsync, every
switch resets active TCP sessions, so avoiding needless switches matters.
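A rough sketch of the no-failback and cooldown behavior, assuming the role is read from the default route's next hop. The function names and the 60-second cooldown are hypothetical and chosen only for illustration.

```python
def role_from_route(next_hop: str, my_wan_ip: str) -> str:
    """The route table decides the role: the node holding the default route is active."""
    return "active" if next_hop == my_wan_ip else "passive"


def may_take_over(role: str, peer_failed: bool, now: float,
                  last_switch: float, cooldown_s: float = 60.0) -> bool:
    """Only a passive node takes over, only on peer failure, never inside the cooldown."""
    if role != "passive" or not peer_failed:
        return False  # a healthy active keeps the route, so there is no failback
    return (now - last_switch) >= cooldown_s
```

Because nothing in this logic prefers one node over the other, the same code can run unchanged on both nodes.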
The detection is built to fail over for the right reasons and only the right
reasons:
- Identity-pinned probe. The probe targets a daemon health endpoint
(TCP 8443) served with each node’s own certificate and verifies the peer’s
certificate fingerprint (SHA-256). A stranger that lands on a reassigned
ephemeral public IP cannot impersonate the peer and so cannot mask a real
failure.
- Data-plane-aware readiness. The health endpoint reports healthy only when
the node’s own WAN gateway is reachable, so failover triggers on “the active
can no longer forward upstream” rather than on a powered-but-useless box.
- Split-brain protection. A node-to-node VPC partition kills the private path
but not the public one, so a partitioned-but-alive active stays observable and
is not taken over.
- Asymmetric, anti-flap thresholds. A node that is gone fails over fast
(~15-20 seconds end to end); a node that is reachable but degraded fails over
more slowly to dampen transient signals.
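The asymmetric thresholds can be illustrated as follows. This is a simplified sketch under stated assumptions: the PeerState names, the threshold values (3 and 6 cycles), and the rule that a streak must consist of one unbroken state are all hypothetical, not taken from the product.

```python
from enum import Enum


class PeerState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"  # reachable, but the health endpoint answered 503
    GONE = "gone"          # no answer on either path


def classify(private_status, public_status):
    """Each argument is an HTTP status (200/503), or None if the path got no answer."""
    statuses = [s for s in (private_status, public_status) if s is not None]
    if not statuses:
        return PeerState.GONE
    if 200 in statuses:
        return PeerState.HEALTHY
    return PeerState.DEGRADED


# Assumed thresholds, in probe cycles: a gone node trips sooner than a degraded one.
THRESHOLDS = {PeerState.GONE: 3, PeerState.DEGRADED: 6}


def should_fail_over(history):
    """history is a newest-last list of PeerState values, one per probe cycle."""
    if not history or history[-1] is PeerState.HEALTHY:
        return False
    state = history[-1]
    needed = THRESHOLDS[state]
    recent = history[-needed:]
    return len(recent) == needed and all(s is state for s in recent)
```

A real daemon would likely also handle streaks that mix gone and degraded cycles; the sketch leaves that out to keep the asymmetry itself visible.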
Configuration is kept in sync from primary to secondary using pfSense’s built-in
XMLRPC mechanism over HTTPS, so firewall rules and settings made on the primary
propagate to the standby automatically.
Key features
- Active-passive pfSense 2.7.2 pair across two availability zones.
- Automatic, bidirectional failover via VPC route table next-hop update; the
cluster survives sequential failures of either node.
- Native config synchronization via XMLRPC over HTTPS.
- Symmetric failover daemon on both nodes; the passive node probes the active
over two independent paths (private VPC + public) and fails over only when
both fail, which keeps a network partition from triggering a false failover
(split-brain protection).
- Data-plane-aware detection: the probed health endpoint (TCP 8443) reports
healthy only when the node can reach its own WAN gateway, so failover follows
loss of upstream forwarding, not loss of power.
- Certificate-pinned probe (SHA-256 fingerprint): a reassigned ephemeral public
IP cannot impersonate the peer and mask a real failure.
- Asymmetric thresholds: a gone node fails over fast (~15-20 seconds); a
degraded-but-reachable node fails over more slowly to dampen transients.
- No SDK or service-account key files: the failover daemon uses IAM tokens from
the instance metadata service.
- Least-privilege service account (three IAM roles only).
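For readers unfamiliar with key-less authentication, the sketch below shows how a process on a VM might obtain an IAM token from the instance metadata service using only the Python standard library. The function names are hypothetical; the URL and the Metadata-Flavor header follow the Google-compatible metadata API that Yandex Cloud documents, and the call only succeeds on a VM with a service account attached.

```python
import json
import urllib.request

# Link-local metadata endpoint (Google-compatible API as documented by Yandex Cloud).
TOKEN_URL = ("http://169.254.169.254/computeMetadata/v1/"
             "instance/service-accounts/default/token")


def build_token_request() -> urllib.request.Request:
    """Build (but do not send) the metadata request for an IAM token."""
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def parse_token(body: bytes) -> str:
    """Extract the bearer token from the metadata service's JSON reply."""
    return json.loads(body)["access_token"]


def fetch_iam_token(timeout: float = 2.0) -> str:
    """Fetch a short-lived token; works only from inside the VM."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return parse_token(resp.read())
```

The token is short-lived, so a long-running process would need to fetch a fresh one periodically rather than cache it indefinitely.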
Before you deploy
The product creates and owns only the firewall pair (two VMs, a service account,
a security group, and the shared Lockbox config secret). It deliberately does
not create the resources below: they are long-lived, shared, or
security-sensitive, you control their naming and lifecycle, and the route table
must already exist for the firewall to integrate with your network. Create them
first and pass their IDs to the form:
- Lockbox secret with the admin password: both nodes authenticate to each
other over pfSense XMLRPC using the admin password, and the secondary must
read it at boot. It lives in Lockbox so the password never appears in instance
metadata or logs. You create it because you choose the password. See
“Creating the admin password secret” below.
- SSH public key for the freebsd user: break-glass access for debugging;
the form injects your key into both VMs. SSH setup runs early in bootstrap so
you can reach a node even if later steps fail.
- VPC route table: this is the failover mechanism. On takeover the daemon
rewrites the table’s 0.0.0.0/0 next-hop to the surviving node’s WAN IP (your
static routes are preserved). It must already exist and be attached to the LAN
subnets whose traffic the firewalls route; the product only updates it.
- Two WAN subnets in different availability zones, NAT enabled: the two
nodes run in different zones for zone-level fault tolerance, so each needs its
own WAN subnet. NAT provides outbound access and reachability to the Yandex
Cloud control plane (Lockbox, VPC APIs) that the daemon calls.
Both subnets must belong to the same VPC network — the one shared security
group, the route table whose next-hop is switched on failover, and peer
discovery all operate within a single network. Pick the same network as your
route_table_id. Choosing subnets from two different networks fails at deploy
with “Security group and subnet have different networks”.
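The route table item above says that only the 0.0.0.0/0 next hop changes and that your static routes are preserved. A minimal sketch of that transformation follows. The function name is hypothetical, and the field names destinationPrefix and nextHopAddress are assumed to match the static route shape of the VPC API; appending a default route when none exists is a choice made for this example only.

```python
DEFAULT_PREFIX = "0.0.0.0/0"


def with_default_next_hop(static_routes, new_next_hop):
    """Return a copy of the routes with only the default route's next hop changed."""
    updated, found = [], False
    for route in static_routes:
        route = dict(route)  # copy, so the caller's data is never mutated
        if route.get("destinationPrefix") == DEFAULT_PREFIX:
            route["nextHopAddress"] = new_next_hop
            found = True
        updated.append(route)
    if not found:
        # Assumption for the sketch: add a default route if the table has none.
        updated.append({"destinationPrefix": DEFAULT_PREFIX,
                        "nextHopAddress": new_next_hop})
    return updated
```

Sending the full, updated route list back in one update is what keeps user-defined static routes intact across a takeover.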
Creating the admin password secret
The Lockbox secret must contain a single text entry whose key is exactly
password and whose value is the plaintext admin password. Create it with the
Yandex Cloud CLI:
    yc lockbox secret create \
      --name pfsense-admin-password \
      --payload '[{"key":"password","text_value":"<your-admin-password>"}]'
In the console: Lockbox -> Create secret -> add a key/value entry with key
password and the password as the value. Copy the resulting secret ID into the
admin_password_secret_id parameter. The key name must be password — the
bootstrap reads that exact key and fails if it is missing.
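To make the exact-key requirement concrete, here is a hedged sketch of how a consumer of the secret might extract the password. It is not the product's bootstrap code: the function name is hypothetical, and the JSON shape (an entries list with key and textValue fields) is assumed to match the Lockbox payload API response.

```python
import json


def admin_password_from_payload(payload_json: str) -> str:
    """Return the value stored under the key "password", or raise if it is absent."""
    payload = json.loads(payload_json)
    for entry in payload.get("entries", []):
        if entry.get("key") == "password":
            value = entry.get("textValue")
            if value:
                return value
    raise KeyError('Lockbox secret has no text entry with key "password"')
```

A secret whose entry is named pass, Password, or admin_password would fail this lookup, which is why the key name has to match exactly.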
Deployment parameters
These are the fields shown in the deployment form:
- Name: base name prefix for all created resources.
- WAN subnet (zone ru-central1-a): VPC subnet used as the first node’s WAN
network for management and internet access via NAT.
- WAN subnet (zone ru-central1-b): VPC subnet used as the second node’s WAN
network (different availability zone).
- Route table: VPC route table whose 0.0.0.0/0 next-hop the failover daemon
switches between the nodes.
- Admin password secret: Lockbox secret holding the pfSense admin password.
- SSH public key (FreeBSD user): public SSH key injected into both VMs for
break-glass access.
- Environment: environment type for the deployment (e.g. Development /
Production).
Ingress is open to 0.0.0.0/0 on TCP 22 (SSH, key-only; password
authentication is disabled), TCP 80 and 443 (WebUI; 80 redirects to HTTPS, and
443 also carries XMLRPC config sync), TCP 8443 (the HA daemon health endpoint
that the peer probes over the private and public paths), and ICMP. The health
endpoint exposes only a 200/503 health boolean over a certificate-pinned TLS
endpoint and no sensitive data. Outbound is open. Restrict access at the
security-group or subnet level if required.
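Since TCP 8443 is reachable from anywhere, the certificate pin is what ties a probe response to the real peer. The check can be sketched as below, assuming the pin is stored as a hex SHA-256 digest of the peer's DER-encoded certificate; the function names are hypothetical. In Python's ssl module, the DER bytes of a connected peer are available from SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def normalize(fingerprint: str) -> str:
    """Accept both 'AB:CD:...' and 'abcd...' spellings of a fingerprint."""
    return fingerprint.replace(":", "").strip().lower()


def peer_is_pinned(cert_der: bytes, pinned_fingerprint: str) -> bool:
    """Constant-time comparison of the presented certificate against the pin."""
    return hmac.compare_digest(sha256_fingerprint(cert_der),
                               normalize(pinned_fingerprint))
```

A response from a host that presents any other certificate fails this check and would be treated as no answer, so an unrelated machine on a reassigned IP cannot make the peer look healthy.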
Routing LAN workloads through the firewall
The firewall pair routes and NATs traffic for your LAN subnets. The deployment
itself does NOT attach the route table to any subnet — you must do this so
workload traffic actually flows through pfSense.
Required steps:
- Attach the route table to each LAN subnet. The route_table_id you passed
at deploy carries the default route (0.0.0.0/0) that the failover daemon
switches between the primary and secondary. A LAN subnet only routes through
the firewall once this table is associated with it:
    yc vpc subnet update <LAN_SUBNET_ID> --route-table-id <route_table_id>
Without this, LAN VMs bypass the firewall entirely (or have no egress).
- Use RFC1918 ranges for LAN subnets (10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16). The firewall’s outbound NAT and security group are
pre-configured for these ranges; other ranges are neither NAT’d nor allowed.
- Do not assign public IPs to LAN VMs. Their only egress path must be the
route table to pfSense; a public IP on a LAN VM causes asymmetric routing
(inbound via its own NAT, outbound via pfSense) and breaks connections.
- LAN subnets must be in the same VPC network as the firewall WAN subnets.
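Before attaching the route table, it can help to confirm that a planned LAN subnet falls inside the RFC1918 ranges listed above. The following is a small sketch using Python's standard ipaddress module; the function name is hypothetical.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918_subnet(cidr: str) -> bool:
    """True when the whole subnet lies inside one of the RFC1918 blocks."""
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        return False
    return any(net.subnet_of(block) for block in RFC1918)
```

For example, 172.31.0.0/16 passes, while 172.32.0.0/16 and the shared address space 100.64.0.0/10 do not.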
What you get:
- LAN VMs reach the internet through pfSense, NAT’d with the active node’s WAN
public IP. On failover the route table moves to the surviving node and new
connections follow it automatically.
- pfSense filters and NATs the traffic: it is a real firewall in the path.
For inbound services published to a LAN VM, add a port-forward / 1:1 NAT rule on
the firewall’s public WAN IP; the route table alone covers egress and transit
only.
For production, assign static (reserved) public IPs to the firewall VMs:
ephemeral public IPs change on stop/start, which changes the egress IP seen by
LAN workloads.
Use cases
- Creating VPN connections between physical and cloud resources.
- Protecting sites and applications.
- Translating addresses.
- Filtering traffic.
- Routing on the internet.
- Detecting intrusions (IDS/IPS).
- Traffic monitoring.
- Dynamic routing.
OpenNix provides support to pfSense users in Yandex Cloud. You can contact their support team by email at support@opennix.ru. Support is available on business days from 9 a.m. to 6 p.m., GMT+3.
| Resource type | Quantity |
|---|---|
| VPC security group | 1 |
| Access rights for folder | 3 |
| Lockbox secret | 1 |
| Service account | 1 |
| Virtual machines | 2 |