Keepalived VRRP Failover in Production: Multi-Node Setup

Keepalived is the standard tool for implementing virtual IP failover on Linux. In theory, it is straightforward: the master holds the VIP, it fails, the backup takes over. In practice, production keepalived VRRP failover has a dozen failure modes that only surface under real conditions. This post covers a complete multi-node setup, a rigorous testing methodology, and lessons learned from actual production incidents.

Multi-Node Configuration: Beyond Two Nodes

Most tutorials cover two-node VRRP. Real production setups often have three or more nodes, which changes the priority arithmetic. With three nodes (priorities 100, 90, 80), you need to ensure that: a single node failure causes correct promotion, a two-node failure promotes the remaining node, and network partition scenarios do not produce split-brain.

# Node A (master)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass yourpassword
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_service
    }
    notify /etc/keepalived/notify.sh
}
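The Node A block tracks chk_service without defining it, and each backup needs its own copy of the instance. The sketch below shows one way to fill both gaps. It assumes the tracked service is nginx and that pgrep is installed at /usr/bin/pgrep; both are illustrative choices, not part of the setup above. The weight of -30 is picked so that a failing master at priority 100 drops to 70, below both backups.

```
# Defined on every node, before the vrrp_instance that tracks it.
vrrp_script chk_service {
    # Assumption: nginx is the tracked service; adjust the command to yours.
    script "/usr/bin/pgrep -x nginx"
    # Run the check every 2 seconds.
    interval 2
    # Two consecutive failures mark the script as failed,
    # two consecutive successes mark it as healthy again.
    fall 2
    rise 2
    # Subtracted from the node priority while the check fails:
    # 100 - 30 = 70, which is below 90 and 80.
    weight -30
}

# Node B (first backup). Node C is identical except for "priority 80".
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass yourpassword
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
    track_script {
        chk_service
    }
    notify /etc/keepalived/notify.sh
}
```

Everything except state and priority must match across nodes, in particular virtual_router_id, advert_int, and the authentication block.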

Authentication and Security

VRRP advertisements are IP multicast packets sent to 224.0.0.18 using IP protocol 112; they are not UDP. Without authentication, any host on the segment can send a higher-priority advertisement and steal the VIP. Use auth_type PASS with a shared password at minimum, keeping in mind that the password travels in cleartext and that keepalived uses only its first eight characters. For higher security, use auth_type AH (IPsec Authentication Header with HMAC-MD5). With either type, all nodes must have identical authentication config. In cloud environments where multicast is unavailable, use unicast_peer to list peer IPs explicitly.
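A unicast variant of the Node A instance might look like the sketch below. The node addresses (192.168.1.11 for Node A, 192.168.1.12 and 192.168.1.13 for the peers) are hypothetical; substitute the real addresses of your nodes.

```
# Node A with unicast advertisements instead of multicast.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    # Source address for advertisements (hypothetical address of Node A).
    unicast_src_ip 192.168.1.11
    # Every other node in the instance (hypothetical peer addresses).
    unicast_peer {
        192.168.1.12
        192.168.1.13
    }
    authentication {
        auth_type PASS
        auth_pass yourpassword
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
```

Each node lists all of the others in unicast_peer and never itself, so the peer list differs on every node.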

Testing Failover Before Production

The only way to trust keepalived failover is to test it destructively. Test cases to run: kill keepalived on the master (systemctl stop keepalived), pull the network cable or simulate it with ip link set eth0 down, kill the tracked service without stopping keepalived, reboot the master, and simulate a network partition by using iptables to drop VRRP packets (IP protocol 112) between nodes.

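For each test: measure the time from failure to VIP transfer (use arping or a continuous ping to the VIP), verify that ARP caches on clients and the upstream router now map the VIP to the new master’s MAC address, and confirm the notify script fires correctly. When the master dies without warning, a backup waits for the VRRP master-down interval before taking over: 3 * advert_int plus a priority-dependent skew of (256 - priority) / 256 seconds. With advert_int 1 and a backup at priority 90, that is about 3.65 seconds. Failover triggered by a tracked script additionally depends on the script’s interval and fall settings.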

Split-Brain: The Most Dangerous Failure Mode

Split-brain occurs when two nodes both believe they are master and both hold the VIP. In VRRP, this happens when VRRP advertisements between nodes are lost (network partition) but both nodes can still serve traffic. Both send GARP (Gratuitous ARP) for the VIP, and the ARP caches of neighboring hosts and routers, along with the switch’s MAC address table, oscillate between them.

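Prevention: use vrrp_script to check network reachability to an external gateway, not just local service health. If a node cannot reach the gateway, it should reduce its priority below all other nodes. Use a negative weight large enough to cause failover (see Blog 05 arithmetic). With priorities 100, 90, and 80, for example, the weight must be lower than -20; a weight of -30 takes the master from 100 to 70, below both backups.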
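A gateway reachability check along these lines could be sketched as follows. The gateway address 192.168.1.1 and the ping path are assumptions for illustration; the instance fragment shows only the track_script part, which goes inside the existing vrrp_instance VI_1 block.

```
vrrp_script chk_gateway {
    # Assumption: 192.168.1.1 is the external gateway and ping lives in /usr/bin.
    # One echo request, one-second wait for the reply.
    script "/usr/bin/ping -c 1 -W 1 192.168.1.1"
    interval 2
    # Require 3 consecutive failures so a single lost packet
    # does not trigger a failover.
    fall 3
    rise 2
    # 100 - 30 = 70, below the backups at 90 and 80.
    weight -30
}

# Inside vrrp_instance VI_1, track both checks.
# Weights of failing scripts add up: 100 - 30 - 30 = 40 if both fail.
track_script {
    chk_service
    chk_gateway
}
```

The fall value is a trade-off: a higher value avoids flapping on transient packet loss but delays the priority drop by roughly interval * fall seconds.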

The notify Script: Your Operational Lever

The notify script is called on every state transition (MASTER, BACKUP, FAULT) with four arguments: the object type (INSTANCE or GROUP), the instance name, the new state, and the priority. Use it to: update DNS records via nsupdate or a cloud API, send alerts, configure local firewall rules that should only apply to the master, and write state to a file that your monitoring checks.

A critical gotcha: the notify script runs with keepalived’s environment, typically as root or as the account named by script_user, and that environment may differ from your normal shell environment. Always use full paths. Test the script independently with the same arguments keepalived would pass.
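The script hooks and the account they run under can be pinned down explicitly in the configuration. The sketch below assumes the scripts should run as root; the state-specific script names (on_master.sh, on_backup.sh, on_fault.sh) are hypothetical and only illustrate the alternative to a single generic notify script.

```
global_defs {
    # Run notify and vrrp_script commands as this user (assumption: root).
    script_user root
    # Refuse to run scripts as root if a non-root user could modify them.
    enable_script_security
}

# Inside vrrp_instance VI_1:

# Generic hook, called on every transition with
#   $1 = "INSTANCE" or "GROUP", $2 = name, $3 = new state, $4 = priority
notify "/etc/keepalived/notify.sh"

# Optional state-specific hooks (hypothetical script names).
notify_master "/etc/keepalived/on_master.sh"
notify_backup "/etc/keepalived/on_backup.sh"
notify_fault  "/etc/keepalived/on_fault.sh"
```

To test outside keepalived, the generic script can be invoked by hand with the same four arguments, for example INSTANCE VI_1 MASTER 100.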

Real Failure Lesson: The ARP Cache Staleness Problem

In one production incident, VRRP failover completed correctly (the backup’s keepalived logs showed MASTER state), but traffic continued going to the old master for 20-30 seconds. Root cause: the upstream router’s ARP cache held the VIP’s MAC address with a 30-second timeout. The backup had sent a GARP, but the router’s ARP implementation did not update on unsolicited ARPs.

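Fix: send GARPs repeatedly on takeover. In keepalived’s global_defs, vrrp_garp_master_repeat sets how many GARPs are sent in each burst after the transition to master, and vrrp_garp_master_delay sets the delay in seconds before a second burst. Setting vrrp_garp_master_repeat 5 and vrrp_garp_master_delay 1 sends a burst of 5 GARPs on takeover and a second burst 1 second later. For caches that stay stale longer, vrrp_garp_master_refresh re-sends GARPs periodically while the node remains master.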
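As a sketch, the GARP settings could be placed in global_defs as shown below, with semantics as described in the keepalived.conf man page. The repeat and delay values come from the fix above; the refresh and interval values are illustrative assumptions to tune for your network.

```
global_defs {
    # Number of GARPs sent in each burst after becoming master.
    vrrp_garp_master_repeat 5
    # Seconds to wait before sending a second burst.
    vrrp_garp_master_delay 1
    # Assumption: re-announce the VIP every 10 seconds while master,
    # to cover neighbors that missed or ignored the takeover bursts.
    vrrp_garp_master_refresh 10
    # Number of GARPs sent at each refresh.
    vrrp_garp_master_refresh_repeat 2
    # Assumption: 10 ms spacing between individual GARP packets.
    vrrp_garp_interval 0.01
}
```

The same settings without the vrrp_ prefix (for example garp_master_repeat) can be set per vrrp_instance if only one instance needs them.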

Monitoring Keepalived in Production

Monitor: keepalived process is running (systemd or process check), current VRRP state (parse keepalived’s dbus interface or use SNMP), VIP is reachable from an external probe, script execution results (check keepalived log lines in /var/log/syslog, or in journalctl -u keepalived on systemd hosts), and state transition frequency (frequent transitions indicate instability).
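The dbus and SNMP interfaces mentioned above are off by default and must be enabled in global_defs. A minimal sketch, assuming your keepalived package was built with dbus and SNMP support (not all distribution builds are):

```
global_defs {
    # Expose instance state over D-Bus.
    enable_dbus
    # Expose the keepalived VRRP MIB via the local SNMP agent.
    enable_snmp_vrrp
    # Send SNMP traps on state transitions.
    enable_traps
}
```

If the other global_defs settings in this post are used as well, merge them into a single global_defs block rather than repeating it.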

Conclusion

Keepalived VRRP failover in production is reliable when configured and tested correctly. The most important practices: use vrrp_script for health-based failover with correctly calculated weight thresholds (Blog 05), test all failure modes destructively before going live, configure GARP repeat for environments with sticky ARP caches, and monitor state transitions continuously in production.

