Bjorn Stabell wrote:
Basically, when the load gets high, Zope builds up a huge backlog of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
Make the app faster so that doesn't happen.
You'll always reach the bottleneck sooner or later, and in 99% of the cases it'll be Zope. It's not exactly a racing horse. [...]
...return 503 to every new request probably isn't a good idea; 503 to every new session is probably OK. Maybe. It depends on the workload. If you have a user who sends 10 requests and 5 of them fail, they're going to keep hitting reload to try to get them all to work, which just compounds your problems. The idea is that you want to tweak it so some of your users get everything and some get nothing, instead of everybody getting partials. This means you have to track sessions though... that's probably easier to do in ZServer than it is in Apache 1.3, just because of the process model, but that doesn't mean it's easy.
By default, Apache doesn't track sessions or do that kind of planned load shedding. There are probably modules that can enable that behavior.
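The fail-whole-sessions idea could be sketched roughly like this (a hypothetical `SessionShedder` helper, not anything in Zope or Apache): requests carrying a session ID we've already seen always get served, while requests starting a new session get a 503 whenever concurrency is over a threshold. The caller would be responsible for bumping `inflight` up and down around actual request handling.

```python
import time

class SessionShedder:
    """Sketch of session-based load shedding: existing sessions get
    everything, new sessions under load get nothing (a 503), instead of
    everybody getting partial pages."""

    def __init__(self, max_inflight=50, session_ttl=300):
        self.max_inflight = max_inflight  # concurrency threshold
        self.session_ttl = session_ttl    # seconds a session stays "known"
        self.inflight = 0                 # caller increments/decrements this
        self.known = {}                   # session id -> last-seen timestamp

    def _overloaded(self):
        return self.inflight >= self.max_inflight

    def admit(self, session_id):
        """Return True to serve the request, False to send a 503."""
        now = time.time()
        # Expire sessions we haven't seen in a while.
        for sid, seen in list(self.known.items()):
            if now - seen > self.session_ttl:
                del self.known[sid]
        if session_id in self.known:
            self.known[session_id] = now
            return True                   # known session: always serve
        if self._overloaded():
            return False                  # new session under load: shed it
        self.known[session_id] = now      # capacity available: adopt it
        return True
```

The point of the sketch is just the asymmetry: once a session is admitted, its follow-up requests are never shed, so users either get a working site or a clean 503 up front.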
ZServer's multiplexed IO model just hides the accepting socket from the poller when its concurrency limit is reached, and yeah, then the incoming connections end up in the backlog. You'd have to change ZServer so it accepted those new connections instead, identified whether they were part of a session, queued them up if they were, or spat out a 503 if they weren't.
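The hide-the-listener trick looks roughly like this in a generic select loop (a sketch of the technique, not actual ZServer code): the listening socket is only included in the poll set while there is capacity, so excess connections simply queue in the kernel's accept backlog.

```python
import select
import socket

MAX_CONCURRENCY = 8  # illustrative concurrency cap

def poll_set(clients, listener):
    """Sockets to watch for readability. The listener is included only
    while we are under the cap; otherwise new connections sit unaccepted
    in the kernel backlog."""
    readers = list(clients)
    if len(clients) < MAX_CONCURRENCY:
        readers.append(listener)
    return readers

def serve(port=8080):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(128)  # the backlog where unaccepted connections wait
    clients = []
    while True:
        ready, _, _ = select.select(poll_set(clients, listener), [], [])
        for sock in ready:
            if sock is listener:
                conn, _addr = listener.accept()
                clients.append(conn)
            else:
                if not sock.recv(4096):  # peer closed the connection
                    sock.close()
                    clients.remove(sock)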
With KeepAlive on, wouldn't everything in one "session" be sent over one TCP connection? If that's the case, each accept() will basically service a whole session, so you should be able to do the easy thing of refusing a connection right after an accept().
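The "easy thing" would then be something like this sketch: accept the connection, immediately write a minimal 503 with `Connection: close`, and hang up. With keep-alive in play, that turns away the whole session in one step.

```python
import socket

# Minimal 503 refusal. Retry-After hints at when to come back;
# Connection: close tells the client not to reuse the connection.
REFUSAL = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 60\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

def refuse(conn):
    """Send the 503 and close, right after accept()."""
    try:
        conn.sendall(REFUSAL)
    finally:
        conn.close()
```

A proper 503 is friendlier than just closing the socket, since browsers show an error page instead of retrying silently.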
Frankly, I bet you'd have more fun just making your app faster.
We'll do that as well, but this problem is pretty serious: even a brief overload sends latencies climbing to ridiculous levels, users start hitting reload, and the problem compounds. In the end we have to restart the server, same as Dario reported. Bye, -- Bjorn