How to make Zope fail nicely under high load?
Hi,

We've run into an interesting problem when load-testing a Zope site behind Apache, a problem we've also encountered in real life. Basically, when the load gets high, Zope builds up a huge backlog of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big? The ideal would be for requests to fail immediately if the backlog is more than a certain number of requests (or, even better, a certain estimated time to process).

Here's what we've tried: naively, we thought we could just set the socket.listen() backlog in Apache and Zope to a lower value, but TCP connect()s apparently don't fail if the server's listen backlog is full; instead the connection attempts are retried, resulting in a client-side-managed "listen backlog" with the same long latency. (If someone knows this stuff, please confirm/deny these allegations against TCP :)

It appears the way to control it would be for Apache or Zope to return "503 Service Unavailable" when the load is too high, but we haven't found a good way to do this; Zope doesn't appear to have any mechanism for it, and Apache's ProxyPass doesn't either. I guess load balancers would, but that's a bit overkill since we run the server on one machine.

Regards,
--
Bjorn Stabell bjorn@exoweb.net Tel +86 (10) 65918490
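The "fail immediately once the backlog exceeds an estimated time to process" idea can be sketched in a few lines. This is an illustrative stand-in, not Zope code: the class name, the fixed average service time, and the thresholds are all made up for the example.

```python
# Sketch of time-based load shedding: reject a request up front when the
# queued work would take too long to drain. All names and numbers here are
# illustrative assumptions, not Zope APIs.
from collections import deque

class LoadShedder:
    def __init__(self, max_wait_seconds=5.0, avg_service_seconds=0.2):
        self.queue = deque()
        self.max_wait = max_wait_seconds
        self.avg_service = avg_service_seconds

    def estimated_wait(self):
        """Rough time-to-drain: queue length times average service time."""
        return len(self.queue) * self.avg_service

    def try_enqueue(self, request):
        """Return True if accepted; False means the caller should send 503."""
        if self.estimated_wait() > self.max_wait:
            return False
        self.queue.append(request)
        return True
```

A real deployment would measure the average service time instead of hard-coding it, but the accept/refuse decision stays this simple.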
Bjorn Stabell wrote:
It appears the way to control it would be for Apache or Zope to return "503 Service Unavailable" when the load is too high, but we haven't found a good way to do this; Zope doesn't appear to have any mechanism for it, and Apache's ProxyPass doesn't either. I guess load balancers would, but that's a bit overkill since we run the server on one machine.
Hi, I'm very interested in your results. We have exactly the same situation, though we have not analysed it further - the only thing that saves us is to restart Zope.

re: load balancing: there is Pound - http://www.apsis.ch/pound/ - that might serve as a load balancer between processes on a single machine. Maybe that will help a bit, though it doesn't really resolve the issue with Zope choking; if it works, it will merely circumvent it.

hth, /dario
--
Dario Lopez-Kästen, IT Systems & Services, Chalmers University of Tech.
Bjorn Stabell wrote:
Basically, when the load gets high, Zope has a huge backload of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
Make the app faster so that doesn't happen.
The ideal would be for requests to fail immediately if the backlog of work is more than a certain number of requests (or even better, estimated time to process).
Kinda...
It appears the way to control it would be for Apache or Zope to return "503 Service Unavailable" when the load is too high, but we haven't found a good way to do this; Zope doesn't appear to have any mechanism for it, and Apache's ProxyPass doesn't either. I guess load balancers would, but that's a bit overkill since we run the server on one machine.
...returning 503 to every new request probably isn't a good idea; 503 to every new session is probably OK. Maybe. It depends on the workload. If you have a user who sends 10 requests and 5 of them fail, they're going to keep hitting reload to try to get them all to work, which just compounds your problems. The idea is that you want to tweak it so some of your users get everything and some get nothing, instead of everybody getting partials.

This means you have to track sessions, though... that's probably easier to do in ZServer than it is in Apache 1.3 just because of the process model, but that doesn't mean it's easy. By default, Apache doesn't track sessions and do that kind of planned failure. There are probably modules that can enable that behavior.

ZServer's multiplexed IO model just hides the accepting socket from the poller if its concurrency level is reached, and yeah, then the incoming connections end up in the backlog. You'd have to change ZServer so it handled those new requests instead: identify whether they are part of a session, queue them up if they are, or spit out a 503 if they aren't.

Frankly, I bet you'd have more fun just making your app faster.
--
Jamie Heilman http://audible.transient.net/~jamie/
"Paranoia is a disease unto itself, and may I add, the person standing next to you may not be who they appear to be, so take precaution." -Sathington Willoughby
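Jamie's "503 to every new session, full service to existing sessions" policy can be sketched as a request filter. This is a hedged illustration in modern WSGI style (which postdates this thread); the cookie name and the `is_overloaded` callable are assumptions, and real session detection in ZServer would look different.

```python
# Sketch (not ZServer code): refuse requests that carry no session cookie
# while the server is overloaded, but let established sessions through.
# The cookie name "zope_session" and the is_overloaded hook are assumptions.
def session_aware_shedder(app, is_overloaded, cookie_name="zope_session"):
    def middleware(environ, start_response):
        has_session = cookie_name in environ.get("HTTP_COOKIE", "")
        if is_overloaded() and not has_session:
            # New visitors get an immediate, cheap refusal.
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", "60")])
            return [b"Server busy, please try again later.\n"]
        # Existing sessions (or an unloaded server) proceed normally.
        return app(environ, start_response)
    return middleware
```

The point of the design is exactly what the post says: a user either gets everything or gets a clean refusal, never a half-rendered page.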
On Wednesday 11 February 2004 09:16, Jamie Heilman wrote:
Bjorn Stabell wrote:
Basically, when the load gets high, Zope has a huge backload of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
Zope's ZServer manages a queue of requests that have been received over HTTP but not yet dispatched to the publisher. This is handled in PubCore/ZRendezvous.py. I suspect this queue will be holding your backlog. You might get some benefit from capping the length of that queue.
The idea is that you want to tweak it so some of your users get everything and some get nothing, instead of everybody getting partials. This means you have to track sessions though...
Checking the size of the ZRendezvous backlog before creating a session might work. -- Toby Dickenson
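Toby's check could be instrumented roughly like this. ZRendezvous internals differ across Zope versions, so `get_backlog_length` and `create_session` here are illustrative stand-ins (passed in as callables), not real Zope APIs, and the cap of 50 is an arbitrary example.

```python
# Hypothetical sketch of "check the backlog before creating a session".
# get_backlog_length and create_session are stand-ins for whatever the
# real ZRendezvous queue inspection and session machinery would be.
MAX_BACKLOG = 50

def maybe_create_session(get_backlog_length, create_session,
                         max_backlog=MAX_BACKLOG):
    """Create a session only while the publisher backlog is short.

    Returns the new session, or None to signal that the caller should
    answer with 503 Service Unavailable instead.
    """
    if get_backlog_length() > max_backlog:
        return None
    return create_session()
```

Existing sessions never hit this path, which is what makes it cheaper than classifying every request.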
Toby Dickenson wrote:
Zope's ZServer manages a queue of requests that have been recieved over http, but not dispatched to the publisher. This is handled in PubCore/ZRendezvous.py. I suspect this queue will be holding your backlog.
You might get some benefit from capping the length of that queue.
That's right! Yeah, true, it probably is holding the backlog, as the default concurrency (on unix) is pretty high. But capping that queue... I dunno, what do you do with the new requests once it's full? You're back to the problem of identifying sessions, and that's a potential mess.
Before creating a session, check the size of the ZRendezvous backlog. might work.
Yeah, that's a good plan in terms of where to instrument it, if you had to. If it were me, I'd sooner throw more hardware at it than use session identifiers, though. -- Jamie Heilman http://audible.transient.net/~jamie/
This very much depends on your application and requirements (and your definition of "nicely" :-)), but I'd argue that it rarely makes sense to handle this at the TCP connection level (just think about browsers opening multiple connections, HTTP/1.0 vs. /1.1 compliant browsers, proxies, etc.).

As an example, one client of ours had the following requirements wrt. this problem, which I think should be fairly common:

* Allow X logged-in users until a certain responsiveness threshold is reached
* If said threshold is reached:
  - inform all users trying to log in that the site is too loaded
  - while allowing already-logged-in users to still use the site with acceptable speed

This means we had to a) measure responsiveness (which we did with a cobbled-together heuristic involving the rate of user logins (something like 5 users per minute) and an external script effectively "HTTP pinging" the site every X minutes) and b) redirect users to a statically hosted page (which can be served cheaply) if said conditions were met.

This kind of thing clearly cannot be done at the TCP level, because TCP connection != user session.

cheers, peter.

Bjorn Stabell wrote:
Hi,
We've run into an interesting problem when load-testing a Zope site behind Apache, a problem that we've also encountered in real life.
Basically, when the load gets high, Zope has a huge backload of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
The ideal would be for requests to fail immediately if the backlog of work is more than a certain number of requests (or even better, estimated time to process).
Here's what we've tried:
Naively, we thought we could just set the socket.listen() backlog in Apache and Zope to a lower value, but in TCP connect()'s apparently don't fail if the server's listen backlog is full; instead requests are retried, resulting in a client side managed "listen backlog", also giving the same long latency. (If someone knows this stuff, please confirm/deny these allegations against TCP :)
It appears the way to control it would for Apache or Zope to return "503 Service Unavailable" when the load is too high, but we haven't found a good way to do this; Zope doesn't appear to have any mechanism for it, and Apache's ProxyPass doesn't either. I guess load balancers would, but that's a bit overkill since we run the server on one machine.
Regards,
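The external "HTTP pinging" script Peter describes above could look something like this. The URL, threshold, and function names are assumptions for illustration; the only real logic is "time one cheap request and flip a flag when it gets slow or fails".

```python
# Sketch of an external responsiveness probe: time one request against the
# site and treat slowness or failure as "too loaded". URL and threshold
# are made-up examples.
import time
import urllib.request

def classify_ping(elapsed_seconds, ok, threshold_seconds=2.0):
    """Combine request success and latency into one responsiveness verdict."""
    return ok and elapsed_seconds < threshold_seconds

def site_is_responsive(url="http://localhost:8080/", threshold_seconds=2.0):
    """HTTP-ping the site once; False means 'redirect users to the static page'."""
    start = time.monotonic()
    try:
        urllib.request.urlopen(url, timeout=threshold_seconds).read()
        ok = True
    except Exception:
        ok = False  # timeouts and errors both count as unresponsive
    return classify_ping(time.monotonic() - start, ok, threshold_seconds)
```

Run from cron every few minutes, the verdict can toggle whatever flag the front end consults before admitting new logins.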
Naturally, it would be really nice to have a solution inside Zope where you can configure things like the maximum number of concurrent sessions, plus a hook for handling the refusal of a new session. This so-called graceful degradation functionality could also be used for:

-) graceful shutdown: right now, a shutdown of a Zope server is brutal. For some applications you want people to be able to finish their business, while at the same time no new users can get into the system.

-) licensing policies: having web services that are used by at most X people at the same time.

These things have to be handled at the Zope level, since neither Apache nor the networking layer has any clue about things like user sessions and so on.

OK, this said: who writes the proposal? ;)

Romain Slootmaekers.
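The session cap, refusal hook, and graceful shutdown described here fit naturally into one small gate object. This is an illustrative sketch, not actual Zope machinery: the class and method names are invented, and a real implementation would need locking and persistence.

```python
# Sketch of a session gate covering max concurrent sessions plus a "drain"
# mode for graceful shutdown. All names here are illustrative assumptions.
class SessionGate:
    def __init__(self, max_sessions=100):
        self.max_sessions = max_sessions
        self.active = set()
        self.draining = False

    def admit(self, session_id):
        """Admit a request; new sessions are refused while draining or full."""
        if session_id in self.active:
            return True                        # existing users keep working
        if self.draining or len(self.active) >= self.max_sessions:
            return False                       # caller should answer 503
        self.active.add(session_id)
        return True

    def end(self, session_id):
        self.active.discard(session_id)

    def begin_shutdown(self):
        """Graceful shutdown: stop admitting new sessions."""
        self.draining = True

    def idle(self):
        """True once every session has finished; then it's safe to stop."""
        return self.draining and not self.active
```

The same `max_sessions` knob covers the licensing use case: set it to the licensed seat count and refuse the (X+1)th concurrent user.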
participants (6)
- Bjorn Stabell
- Dario Lopez-Kästen
- Jamie Heilman
- Peter Sabaini
- Romain Slootmaekers
- Toby Dickenson