-> This might be hard to do with a load-balancer setup alone, because
-> the balancer would need to know when to fail over (and not just in
-> the case of Zope ceasing to respond);

Please, let's get our terms straight: a "load-balancer" directs web traffic to nodes according to some heuristic. A failover system is one where the service "fails over" to a backup when a problem is detected. It does not balance anything, least of all load.

It's worth noting that every "load-balanced" cluster I've ever seen had some kind of heartbeat test that, when failed, would remove a node from the cluster (so it gets no more traffic). If you only have two nodes, this resembles a failover system, but it is *not* the same: a particular node would be "failed out", but no "fail over" would occur, because there would be no backup system taking "over" the responsibilities of a primary system. Instead, you'd just have a failed node in your cluster.

-> lines of IP takeover software like Heartbeat (linuxha.org) or Failsafe
-> (oss.sgi.com) that makes the "safe" hot-backup Zope box take over the
-> identity of the "corrupted" primary Zope box, assuming that you have a
-> monitoring setup on the backup to audit the integrity of your primary Zope
-> service/data.

I'm highly interested in any real-world, in-production load-balanced or failover systems for Zope (especially ones using Open Source software).

-> I would suggest a combination of Squid in front and Heartbeat or Linux
-> Failsafe on the Zope boxes (either with independent ZODBs or with ZEO) in
-> this case.

Is there a way to use URL-rewriting rules in Apache (with mod_rewrite) to test whether a particular box is alive, and direct traffic there only if it is? Maybe have it check whether a particular file exists (or some such)? (Also note that Apache will do much of what Squid does here using mod_proxy.)

-> You could use ZEO+ExternalMount or ZSyncer to copy content to your second
-> Zope if you had 2 autonomous Zopes.
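On the mod_rewrite question: a RewriteCond can indeed test for the existence of a file on disk, so a monitoring script could touch or remove a flag file as the primary passes or fails its health checks. A minimal sketch, assuming hypothetical hostnames, port, and flag-file path (the [P] flag requires mod_proxy to be loaded):

```apache
# Proxy to the primary Zope only while the "heartbeat" flag file exists;
# an external monitor creates/removes this file based on health checks.
RewriteEngine On
RewriteCond /var/run/zope-primary-alive -f
RewriteRule ^/(.*)$ http://zope-primary:8080/$1 [P,L]
# Flag file missing: fall through to the hot-backup box.
RewriteRule ^/(.*)$ http://zope-backup:8080/$1 [P,L]
```

The [P] flag hands the request to mod_proxy, which matches the note above that Apache can stand in for Squid in this role.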
-> Squid (or any other reverse-proxy or
-> load-balancer) would then obey any IP address takeover from a hot-backup
-> node happening via gratuitous/unsolicited ARP.

IP address takeovers are dangerous. What if the Zope processes die, but the OS is okay? You'll have an IP address conflict unless you can run a script on the primary box that tells it to shut down its network interface. And what if the hardware locks up, loses all its resources, or gets into a loop of some kind? The NIC will still respond to its IP address, but you can't run the script to disable it. Bad situation; pray you have a watchdog card for those Zope processes.

MAC address takeovers are somewhat dangerous, because the switch you are connected to (such as at a data center) may not recognize the MAC address takeover if the NIC on the primary box is still responding (as above).

I prefer solutions that keep all nodes (primary, backup, or any peer nodes) behind a NAT. Each node gets its own 192.168.0.x IP address, and the NAT box does all the failover. You've now moved the IP-takeover problem to the NAT box (with its own backup), but since NAT is in the kernel (under Linux, at least), you'd be hard-pressed to find a NAT box that could respond to an ICMP or serial-port ping but not do NAT. If the kernel is running, it's running, and if it's not, it's not.

--Derek
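Derek's "Zope processes die but the OS is okay" scenario is exactly why a failover monitor should probe the service itself, not just ping the host. A minimal sketch in Python (host, port, and path are hypothetical): an ICMP ping only proves the kernel is up, while an HTTP probe proves the Zope process is actually answering.

```python
import http.client
import socket

def zope_alive(host, port, path="/", timeout=5):
    """Return True if the Zope HTTP service answers a request in time.

    A host whose kernel is up but whose Zope processes have died will
    fail this check, even though it would still answer an ICMP ping.
    """
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
        # Any HTTP status below 500 means the process is at least answering.
        return status < 500
    except (OSError, http.client.HTTPException):
        return False
```

A failover daemon on the backup (or on the NAT box) could call this periodically and trigger the takeover only after several consecutive failures, to avoid flapping on a single slow response.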