[Zope] Static fail-over

sean.upton@uniontrib.com
Thu, 24 Jan 2002 14:49:11 -0800


In a load-distributed system (e.g. round robin, or anything that
load-balances without detecting an out-of-service web node), you need to
manage availability yourself, and this sucks for clusters of more than 2
web nodes, because you have to set them up in a ring, using IP takeover.
The problem with this, say in a 3-node setup, is that if n1 fails while
n2 and n3 are still alive, either n2 or n3 will have to take on all of
n1's load - they can't split it.
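
A minimal Python sketch of that ring arrangement (node names and the
takeover rule are hypothetical, just to show why the survivors can't
split a failed node's load):

# Hypothetical sketch: ring IP takeover in a 3-node cluster.  Each
# node's virtual IP fails over to the next *live* node in the ring,
# so exactly one survivor absorbs the whole failed node's load.
RING = ["n1", "n2", "n3"]

def vip_owner(node, alive):
    """Return which live node currently holds `node`'s virtual IP."""
    i = RING.index(node)
    for step in range(len(RING)):
        candidate = RING[(i + step) % len(RING)]
        if candidate in alive:
            return candidate
    return None  # the whole cluster is down

alive = {"n2", "n3"}            # n1 has failed
for n in RING:
    print(n, "->", vip_owner(n, alive))
# n1 -> n2   (n2 now carries its own load *plus* all of n1's)
# n2 -> n2
# n3 -> n3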

In a true load-balanced arrangement, the forward LB/proxy will detect
that a box is out and remove it from the pool.  This is ideal for a lot
of nodes.  Squid+Zope+ICP sounds to me like the best way to do this, of
course in combination with monitoring software like mon.
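
In rough, modern-Python terms (pool addresses, probe URL, and timings
are all made up), that sort of balancer does something like this:

# Hypothetical sketch: a forward balancer that health-checks its pool
# and silently drops dead web nodes from the rotation.
import itertools
import urllib.request

POOL = ["http://n1:8080", "http://n2:8080", "http://n3:8080"]

def is_up(node, timeout=2):
    try:
        urllib.request.urlopen(node + "/", timeout=timeout)
        return True
    except OSError:  # connection refused, timeout, HTTP error...
        return False

counter = itertools.count()

def next_backend():
    """Round-robin over the *live* subset of the pool only."""
    nodes = [n for n in POOL if is_up(n)]
    return nodes[next(counter) % len(nodes)] if nodes else None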

In the case of a 2-node cluster, IP takeover is easier and cheaper; but
it doesn't scale well for HTTP traffic, given that load-balancing is the
better alternative as you add nodes.

The new setup I plan on using for future projects is a pair of
load-balanced proxy servers balanced by an L4 switch with direct packet
return, a feature of my L4 switch that has a speed advantage over a
proxied approach.  The downside is that I have to manage availability of
the proxy boxes themselves; I do this with IP-takeover-based software
like heartbeat, which fills this niche very well.  I also use IP
takeover for my relational database, file serving, and ZEO storage
server setups.  The only tier of this setup that will not be managed by
IP-takeover clustering is the web servers themselves, which will be
managed by the 2 proxy servers acting in the capacity of a load-balancer
(Squid+Zope+ICP patches):

             | ^
             v |
         >[Router]<
        /    |     \
       /     v      \
      / [L4Switch ]  \       Hardware Load-Balancer
     /   |        |   \
DMZ  |   v        v   |
====[Cache1]:::::[Cache2]  -->Squid Proxies
Private  |        |        For Caching, LoadBalancing,
Netwk    |        |        security, and availability
         |        |        of web servers
        _|________|_        -Clustered w/Heartbeat
       /   |   |    \       
    [n1] [n2] [n3] [n*]    -->Web Servers - Availability
      |    |    |    |          managed via Squid/ICP
      +----+----+----+
           |               -->Storage Svrs.(ZSS/RDB/File)
           |               Primary and Hot Backup
           |                  Clustering w/Heartbeat
      [Storage1]:::::::::[Storage2]
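
The ICP piece of that diagram is essentially a cheap UDP
liveness/latency probe.  A toy version in Python (the message format is
invented for illustration; real ICP is a binary protocol, normally on
UDP port 3130):

# Hypothetical sketch: ICP-style peer probing from a proxy.  Query
# every web node over UDP and forward to the first one that answers;
# silence within the timeout means "treat the node as down".
import socket

WEB_NODES = [("n1", 3130), ("n2", 3130), ("n3", 3130)]

def fastest_peer(timeout=0.5):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        for addr in WEB_NODES:
            sock.sendto(b"ICP_QUERY", addr)
        _, responder = sock.recvfrom(512)  # first reply wins
        return responder
    except socket.timeout:
        return None  # nobody answered; serve an error page instead
    finally:
        sock.close()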
                           
I hope this makes sense and/or is helpful in some way... I think it is a
good generic blueprint for a relatively scalable setup for a large site.

Sean

-----Original Message-----
From: Derek Simkowiak [mailto:dereks@realloc.net]
Sent: Thursday, January 24, 2002 2:00 PM
To: sean.upton@uniontrib.com
Cc: zope@zope.org
Subject: RE: [Zope] Static fail-over


-> I guess what I'm saying is there are places where IP-takeover based
-> clustering is appropriate, sometimes even in conjunction with forward
-> traffic direction.

	Sean: Thanks for the I.P. Takeover info!

	While we're on the topic, I have a quick 'opinion' question about
clusters.  This applies directly to a Zope cluster I'll be building soon.

	A fully redundant, yet not load-balanced, H.A. system requires
*almost* all the same hardware and software as a 2-node load-balanced
system.  That is, you need to detect a service failure and, if found, make
sure traffic goes to the backup system, not the primary system.  (You'd
also want to send an alert, etc., and if you only have dual redundancy,
you'll also want to monitor the backup to make sure it'll be there when
the failover is needed.)
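
	That monitoring loop is simple enough to sketch in Python (the
hosts, probe, and takeover/alert hooks below are all invented; in
practice, tools like mon or heartbeat fill this role):

# Hypothetical sketch: watch the primary, fail over to the backup,
# and complain loudly if the backup itself looks unhealthy.
import socket
import subprocess
import time

PRIMARY, BACKUP = "10.0.0.1", "10.0.0.2"

def port_open(host, port=80, timeout=3):
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

while True:
    if not port_open(PRIMARY):
        # take over the service IP (hypothetical local script)
        subprocess.run(["/usr/local/sbin/takeover-vip"])
        subprocess.run(["mail", "-s", "primary down", "ops@example.com"],
                       input=b"failover triggered\n")
        break
    if not port_open(BACKUP):
        subprocess.run(["mail", "-s", "backup down", "ops@example.com"],
                       input=b"no failover target available\n")
    time.sleep(10)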

	In a load-balanced system (with homogeneous nodes), you need to
watch all nodes for failure and, if one is found, fail out that
particular node.  But instead of the backup hardware going "wasted",
just waiting for a failover, you've halved the hardware workload by
distributing the work across both machines.  This may result in a
faster response for end users.

	My question:  Does it ever make sense to set up a redundant system
without load balancing?  After all, plopping in new nodes on an as-needed
basis is a very handy feature.

	The only thing I can think of is this:  Imagine a site that must
serve 1 zillion requests per day (a zillion being a Very Big Number).  If
you use a simple failover system, then you buy two boxes, both capable of
handling 1 zillion requests.  If the site grows more popular so you must
handle 2 zillion requests/day, you just upgrade both servers to handle 2
zillion requests/day.  (This is a thought experiment, ignore the fact that
you should have planned for the growth in the first place :)

	Now imagine those same two (1 zillion/day capable) boxes have been
configured for load balancing.  Immediately, each server is only serving
.5 zillion requests/day.  As the site grows to its new 2 zillion/day
load, both servers being in use means no hardware upgrade is needed.  BUT
--and this is a big but-- you no longer have an H.A. system.  You've lost
your redundancy.  If one of the servers goes down, about half of your
customers will get an HTTP 503 "Service Unavailable" error.  So to keep
full redundancy, you actually need THREE nodes.  In fact, for however many
nodes you want your cluster to be, you need to add one extra "redundant"
node that can handle the traffic of any failed node (just until that
failed node is repaired).

	So I guess I answered my own question: in a two-node load balanced 
system, the second node would really be nothing more than a backup node 
(even though it's handling traffic), and thus you'd need to upgrade your 
hardware (or rather, just add more nodes) as soon as your traffic exceeded 
the limit of 

(nodecount - 1) * traffic_per_node

	And the "traffic_per_node" you'd have to assume would be peak
usage traffic, i.e. 

(  total_peak_traffic / (nodecount - 1)  )
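
	Plugging in toy numbers (entirely hypothetical) makes that "N+1"
sizing rule concrete:

# Toy numbers, purely illustrative: sizing an N+1 cluster.
import math

total_peak = 2.0e9           # peak traffic: 2 "zillion" requests/day
per_node_capacity = 1.0e9    # what one box can handle per day

# Nodes needed to carry peak load even with one node failed:
working = math.ceil(total_peak / per_node_capacity)  # -> 2
nodecount = working + 1                              # -> 3 (the spare)

# Equivalently, a cluster of `nodecount` nodes is safe only while
# peak traffic stays under (nodecount - 1) * per_node_capacity.
assert total_peak <= (nodecount - 1) * per_node_capacity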

	In the real world of public websites, however, I think a
load-balanced system may actually offer extra redundancy.  Because no site
will get its peak load on a 24/7 basis, the load balanced system can use
any extra resources (which are free because the cluster is not at 100%
capacity) to fill in for the failed node -- and this would be *in addition
to* your extra failover node.  In a simple failover system, you don't
get this.

	Any additional comments would be greatly appreciated.


--Derek