This might be hard to do with a load-balancer setup alone, because the balancer would need to know when to fail over (and not just in the case of Zope ceasing to respond). A similar solution might be more along the lines of IP-takeover software like Heartbeat (linuxha.org) or FailSafe (oss.sgi.com), which makes the "safe" hot-backup Zope box take over the identity of the "corrupted" primary Zope box, assuming that you have a monitoring setup on the backup to audit the integrity of your primary Zope service/data. The advantage this has over a load-balancer approach is that you can write your own custom monitors/audit scripts to determine/test the criteria for a failover to a much more fine-grained degree than just refused HTTP connections. Linux FailSafe has service-monitoring setups, while you would use Mon or something similar in conjunction with Heartbeat.

I would suggest a combination of Squid in front and Heartbeat or Linux FailSafe on the Zope boxes (either with independent ZODBs or with ZEO) in this case. You could use ZEO+ExternalMount or ZSyncer to copy content to your second Zope if you had two autonomous Zopes. Squid (or any other reverse proxy or load balancer) would then obey any IP address takeover from a hot-backup node happening via gratuitous/unsolicited ARP.

Sean

-----Original Message-----
From: Anthony Baxter [mailto:anthony@interlink.com.au]
Sent: Monday, January 21, 2002 4:18 PM
To: Terry Hancock
Cc: zope@zope.org
Subject: Re: [Zope] Static fail-over
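For concreteness, the Heartbeat side of such a setup is driven by a tiny resource file; here is a minimal sketch (the node names, service IP, and `zope` init-script name are assumptions for illustration, not from this thread):

```
# /etc/ha.d/haresources -- identical copy on both nodes.
# Format: preferred-node  resource(s)
# If zope1 is declared dead, zope2 acquires 192.168.0.100 (announcing
# it via gratuitous ARP) and starts the "zope" init script.
zope1 192.168.0.100 zope
```

The heartbeat channels themselves (serial and/or UDP) and the node names are configured separately in /etc/ha.d/ha.cf.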
Terry Hancock wrote:
I have a Zope site which I'm doing a lot of development on, and all internal safeguards aside, I feel there's a significant chance of wrecking Zope in the process. Then it might take anywhere from hours to days to get it back up again.
Aside from making sure you have a sane development and deployment process, you could consider using a load balancer, set up to point to the primary (development?) server normally and switching to a backup one when things fail. There are a bunch of hardware and software ones; I've used both. Hardware's better, obviously, but you could consider something like the tool 'balance' from balance.sf.net.

Anthony

_______________________________________________
Zope maillist - Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope-dev )
-> This might be hard to do with a load-balancer setup alone, because
-> the balancer would need to know when to fail over (and not just in
-> the case of Zope ceasing to respond);

Please, let's get our terms straight: a "load balancer" directs web traffic to nodes according to some heuristic. A fail over system is where the service "fails over" to the backup when a problem is detected. It does not balance anything, least of all load.

It's worth noting that every "load-balanced" cluster I've ever seen had some kind of heartbeat test that, when failed, would remove a node from the cluster (so it doesn't get any more traffic). If you only have two nodes, this would be similar to a fail over system, but *not* the same: a particular node would be "failed out", but no "fail over" would occur, because there would be no backup system taking "over" the responsibilities of a primary system. Instead, you'd just have a failed node in your cluster.

-> lines of IP takeover software like Heartbeat (linuxha.org) or Failsafe
-> (oss.sgi.com) that makes the "safe" hot-backup Zope box take over the
-> identity of the "corrupted" primary Zope box, assuming that you have a
-> monitoring setup on the backup to audit the integrity of your primary Zope
-> service/data.

I'm highly interested in any real-world, in-production load balanced or fail over systems for Zope (esp. using Open Source software).

-> I would suggest a combination of Squid in front and Heartbeat or Linux
-> Failsafe on Zope boxes (either independent ZODB or with ZEO) nodes in this
-> case.

Is there a way to use URL Rewriting rules in Apache (with mod_rewrite) to test if a particular box was alive, and only if so, direct traffic there? Maybe have it look if a particular file exists (or some such)?

(Also note that Apache will do much of what Squid will do using mod_proxy.)

-> You could use ZEO+ExternalMount or ZSyncer to copy content to your second
-> Zope if you had two autonomous Zopes.
-> Squid (or any other reverse-proxy or load-balancer) would then obey
-> any IP address takeover from a hot-backup node happening via
-> gratuitous/unsolicited ARP.

IP address takeovers are dangerous. What if the Zope processes die, but the OS is okay? You'll have an IP address conflict unless you can run a script on the primary box that tells it to shut down its network interface. So what if the hardware locks up, loses all resources, or gets into a loop of some kind? The NIC will still respond to its IP address, but you can't run the script to disable it. Bad situation; pray you have a watchdog card for those Zope processes.

MAC address takeovers are somewhat dangerous, because the switch that you are connected into (such as at a data center) may not recognize the MAC address takeover if the NIC on the primary box is still responding (as above).

I prefer solutions that keep all nodes (primary, backup, or any peer nodes) behind a NAT. Each node gets its own 192.168.0.x IP address, and the NAT box does all the failover. You've now moved the IP-takeover problem to the NAT box (with its backup), but since NAT is in the kernel (under Linux, at least) you'd be hard-pressed to find a NAT box that could respond to an ICMP or serial-port ping but not do NAT. If the kernel is running, it's running, and if it's not, it's not.

--Derek
A fail over system is where the service "fails over" to the backup when a problem is detected. It does not balance anything, least of all load.
I think the point Sean was trying to make was that you may need to arrange for failover for the load balancer itself. At least you would if you think it's important to provide redundancy in whatever load balancing setup you implement.
I'm highly interested in any real-world, in-production load balanced or fail over systems for Zope (esp. using Open Source software).
I have moderate experience with LVS + Mon in combination with Squid and Zope. A single Squid handles HTTP requests from clients on the Internet. It talks to a TCP port on the load balancer. LVS, as the load balancer, provides the balancing service between a number of Zopes. Mon provides the capability to remove failed Zopes from the load-balancing rotation by frequently polling a method that returns a known response.

An extension of this configuration which I've not implemented yet (but will need to very soon) will be to put a separate load balancer in front of a number of ICP-connected Squids, each configured with the virtual IP on the LVS box as its http_accel port. I need to do this to be able to scale cache services and provide redundancy in cache services, instead of having a single Squid frontending the whole shooting match.
Is there a way to use URL Rewriting rules in Apache (with mod_rewrite) to test if a particular box was alive, and only if so, direct traffic there? Maybe have it look if a particular file exists (or some such)?
If you find out, please let me know, that sounds very useful!
(Also note that Apache will do much of what Squid will do using mod_proxy.)
mod_proxy isn't very well documented and doesn't do ICP (which is pretty handy for scaling the cache). But it's fine for small setups. I imagine you could even share its cache directory over NFS if you wanted some failover capability without losing all that was cached to disk. - C
Chris McDonough wrote:
Is there a way to use URL Rewriting rules in Apache (with mod_rewrite) to test if a particular box was alive, and only if so, direct traffic there? Maybe have it look if a particular file exists (or some such)?
If you find out, please let me know, that sounds very useful!
Looking at http://httpd.apache.org/docs/misc/rewriteguide.html , it seems you can use Perl scripts to control mod_rewrite behavior. This should give interesting opportunities for this kind of file-checking stuff.

"""
A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems no solution by the use of mod_rewrite...

Solution: Use an external rewrite map, i.e. a program which acts like a rewrite map. It is run once on startup of Apache, then receives the requested URLs on STDIN and has to put the resulting (usually rewritten) URL on STDOUT (same order!).
"""

Philippe
Chris McDonough wrote:
[...]
Is there a way to use URL Rewriting rules in Apache (with mod_rewrite) to test if a particular box was alive, and only if so, direct traffic there? Maybe have it look if a particular file exists (or some such)?
If you find out, please let me know, that sounds very useful!
Perhaps a bit trivial, but maybe this could be a poor man's solution to this problem: Apache lets you customize its error behaviour:

http://httpd.apache.org/docs/mod/core.html#errordocument
http://httpd.apache.org/docs/custom-error.html

Maybe it's possible to just use that to redirect Apache to a URI which delivers a nightly-built static version of the site. Also, one can redirect to a CGI script which sends an email before redirecting to that static version. I just don't know for sure whether the "upstream host not responding" error of mod_proxy can be handled here.

cheers,
oliver
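For what it's worth, a minimal sketch of that idea as an httpd.conf fragment (the backend host, port, and paths are made up, and whether an unreachable backend surfaces as 502 or 503 should be checked against your Apache version):

```apache
# Serve /static/ locally, excluded from proxying; it would hold the
# nightly-built static mirror of the site.
ProxyPass        /static/ !
Alias            /static/  /var/www/static-mirror/

# Everything else goes to the primary Zope.
ProxyPass        /  http://zope-primary:8080/
ProxyPassReverse /  http://zope-primary:8080/

# Errors generated by mod_proxy itself (backend down) are handled by
# local ErrorDocuments, so point them at the static fallback page.
ErrorDocument 502 /static/index.html
ErrorDocument 503 /static/index.html
```

Note the `ProxyPass /static/ !` exclusion must come before the catch-all `ProxyPass /`, or the fallback page itself would be proxied to the dead backend.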
participants (5)
-
Chris McDonough -
Derek Simkowiak -
Oliver Bleutgen -
Philippe Jadin -
sean.upton@uniontrib.com