In a load-distributed system (i.e. round-robin, or anything that load-balances without detecting an out-of-service web node), you need to manage availability yourself, and this sucks for clusters larger than 2 web nodes, because you have to set them up in a ring using IP takeover. The problem, say in a 3-node setup, is that if n1 fails while n2 and n3 are still alive, one of n2 or n3 has to take on all of n1's load - they can't split it. In a true load-balanced arrangement, the forward LB/proxy detects that a box is out and removes it from the pool, which is ideal for a lot of nodes. Squid+Zope+ICP sounds to me like the best way to do this, combined, of course, with monitoring software like mon. In the case of a 2-node cluster, IP takeover is easier and cheaper; but it doesn't scale well for HTTP traffic, given that load balancing is the better alternative.

The setup I plan to use for future projects is a pair of load-balanced proxy servers balanced by an L4 switch with direct packet return - a feature of my L4 switch that has a speed advantage over a proxied approach. The downside is that I have to manage the availability of the proxy server boxes themselves; I do this with IP-takeover-based software like heartbeat, which fills this niche very well. I also use IP takeover for my relational database, file serving, and ZEO storage server setups. The only tier of this setup not managed by IP-takeover clustering will be the web servers themselves, which will be managed by the 2 proxy servers acting as a load balancer (Squid+Zope+ICP patches).
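For the IP-takeover tiers (the proxy pair, database, file serving, and ZEO storage), classic heartbeat is configured with a one-line /etc/ha.d/haresources entry naming the preferred node, a floating service IP, and the service that should follow it. A minimal sketch, with hypothetical node names and addresses (proxy1, 192.168.1.10) standing in for a real deployment:

```
# /etc/ha.d/haresources (identical file on both cluster members)
# <preferred-node>  <floating-service-IP>  <service-to-start>
# A bare IP is shorthand for the IPaddr resource script; on failover,
# the surviving node takes over the IP and starts the listed service.
proxy1   192.168.1.10   squid
```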
              | ^
              v |
           >[Router]<
            /  |  \
           /   v   \
          / [L4Switch] \        Hardware Load-Balancer
         /    |   |    \
   DMZ   |    v   v    |
  =======[Cache1]:::::[Cache2]  -->Squid Proxies
 Private     |    |             For Caching, LoadBalancing,
  Netwk      |    |             security, and availability
             |    |             of web servers
            _|____|_            -Clustered w/Heartbeat
           / |    | \
        [n1][n2][n3][n*]        -->Web Servers - Availability
          |   |   |  |              managed via Squid/ICP
          +---+---+--+
              |                 -->Storage Svrs.(ZSS/RDB/File)
              |                 Primary and Hot Backup
              |                 Clustering w/Heartbeat
    [Storage1]:::::::::[Storage2]

I hope this makes sense and/or is helpful in some way... I think it is a good generic blueprint for a relatively scalable setup for a large site.

Sean

-----Original Message-----
From: Derek Simkowiak [mailto:dereks@realloc.net]
Sent: Thursday, January 24, 2002 2:00 PM
To: sean.upton@uniontrib.com
Cc: zope@zope.org
Subject: RE: [Zope] Static fail-over

-> I guess what I'm saying is there are places where IP-takeover based
-> clustering is appropriate, sometimes even in conjunction with forward
-> traffic direction.

Sean:
	Thanks for the I.P. Takeover info!

	While we're on the topic, I have a quick 'opinion' question about clusters. It applies directly to a Zope cluster I'll be building soon.

	A fully redundant, yet not load-balanced, H.A. system requires *almost* all the same hardware and software as a 2-node load-balanced system. That is, you need to detect a service failure and, if one is found, make sure traffic goes to the backup system, not the primary. (You'd also want to send an alert, etc.; and if you only have dual redundancy, you'll also want to monitor the backup to make sure it'll be there when the failover is needed.)

	In a load-balanced system (with homogeneous nodes), you need to watch all nodes for failure and, if one is found, fail out that particular node. But instead of the backup hardware going "wasted", just waiting for a failover, you've halved the hardware workload by distributing the work across both machines. This may result in faster responses to end users.
	My question: Does it ever make sense to set up a redundant system without load balancing? After all, plopping in new nodes on an as-needed basis is a very handy feature.

	The only scenario I can think of is this: Imagine a site that must serve 1 zillion requests per day (a zillion being a Very Big Number). If you use a simple failover system, you buy two boxes, both capable of handling 1 zillion requests. If the site grows more popular and you must handle 2 zillion requests/day, you just upgrade both servers to handle 2 zillion requests/day. (This is a thought experiment; ignore the fact that you should have planned for the growth in the first place :)

	Now imagine those same two (1 zillion/day capable) boxes have been configured for load balancing. Immediately, each server is serving only .5 zillion requests/day. As the site grows to its new 2 zillion/day load, both servers being in use means no hardware upgrade is needed. BUT -- and this is a big but -- you no longer have an H.A. system. You've lost your redundancy. If one of the servers goes down, about half of your customers will get an HTTP 503 "Service Unavailable" error. So to keep full redundancy, you actually need THREE nodes. In fact, for however many nodes you want in your cluster, you need to add one extra "redundant" node that can handle the traffic of any failed node (just until that failed node is repaired).

	So I guess I answered my own question: in a two-node load-balanced system, the second node would really be nothing more than a backup node (even though it's handling traffic), and thus you'd need to upgrade your hardware (or rather, just add more nodes) as soon as your traffic exceeded the limit of

    (nodecount - 1) * traffic_per_node

And "traffic_per_node" would have to be assumed to be peak usage traffic, i.e.

    total_peak_traffic / (nodecount - 1)

	In the real world of public websites, however, I think a load-balanced system may actually offer extra redundancy.
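The sizing rule above (peak traffic must fit on nodecount - 1 nodes, i.e. one spare beyond working capacity) can be sketched in a few lines of Python; the function name and the numbers are illustrative, not from the thread:

```python
import math

def nodes_needed(total_peak_traffic, traffic_per_node):
    """Smallest cluster that survives one node failure: enough
    working nodes to carry peak load, plus one redundant node."""
    working = math.ceil(total_peak_traffic / traffic_per_node)
    return working + 1

# A site peaking at 2 "zillion" requests/day, on boxes rated for
# 1 zillion/day each, needs 2 working nodes + 1 spare = 3 total.
print(nodes_needed(2.0, 1.0))  # -> 3
```

With only two 1-zillion boxes serving a 2-zillion peak, losing either node leaves the survivor at double capacity, which is exactly the lost-redundancy scenario described above.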
Because no site will get its peak load on a 24/7 basis, the load balanced system can use any extra resources (which are free because the cluster is not at 100% capacity) to fill in for the failed node -- and this would be *in addition to* your extra failover node. In a simple failover-system, you don't get this. Any additional comments would be greatly appreciated. --Derek