Bjorn Stabell wrote:
Basically, when the load gets high, Zope builds up a huge backlog of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
Make the app faster so that doesn't happen.
You'll always reach the bottleneck sooner or later, and in 99% of the cases it'll be Zope. It's not exactly a racing horse. [...]
...return 503 to every new request probably isn't a good idea; 503 to every new session is probably OK. Maybe. It depends on the workload. If you have a user who sends 10 requests and 5 of them fail, they're going to keep hitting reload to try to get them all to work, which just compounds your problems. The idea is that you want to tweak it so some of your users get everything and some get nothing, instead of everybody getting partials. This means you have to track sessions though... that's probably easier to do in ZServer than it is in Apache 1.3, just because of the process model, but that doesn't mean it's easy.
By default, Apache doesn't track sessions or do that kind of planned load shedding. There are probably modules that can enable that behavior.
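The fail-whole-sessions idea could be sketched roughly like this (a hypothetical `SessionShedder` helper, not anything in Zope or Apache): requests carrying a session ID we've already seen always get served, while requests starting a new session get a 503 whenever concurrency is over a threshold. The caller would be responsible for bumping `inflight` up and down around actual request handling.

```python
import time

class SessionShedder:
    """Sketch of session-based load shedding: existing sessions get
    everything, new sessions under load get nothing (a 503), instead of
    everybody getting partial pages."""

    def __init__(self, max_inflight=50, session_ttl=300):
        self.max_inflight = max_inflight  # concurrency threshold
        self.session_ttl = session_ttl    # seconds a session stays "known"
        self.inflight = 0                 # caller increments/decrements this
        self.known = {}                   # session id -> last-seen timestamp

    def _overloaded(self):
        return self.inflight >= self.max_inflight

    def admit(self, session_id):
        """Return True to serve the request, False to send a 503."""
        now = time.time()
        # Expire sessions we haven't seen in a while.
        for sid, seen in list(self.known.items()):
            if now - seen > self.session_ttl:
                del self.known[sid]
        if session_id in self.known:
            self.known[session_id] = now
            return True                   # known session: always serve
        if self._overloaded():
            return False                  # new session under load: shed it
        self.known[session_id] = now      # capacity available: adopt it
        return True
```

The point of the sketch is just the asymmetry: once a session is admitted, its follow-up requests are never shed, so users either get a working site or a clean 503 up front.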
ZServer's multiplexed IO model just hides the accepting socket from the poller when its concurrency limit is reached, and yeah, then the incoming connections end up in the backlog. You'd have to change ZServer so it accepted those new connections instead, identified whether they were part of a session, queued them up if they were, or spat out a 503 if they weren't.
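The hide-the-listener trick looks roughly like this in a generic select loop (a sketch of the technique, not actual ZServer code): the listening socket is only included in the poll set while there is capacity, so excess connections simply queue in the kernel's accept backlog.

```python
import select
import socket

MAX_CONCURRENCY = 8  # illustrative concurrency cap

def poll_set(clients, listener):
    """Sockets to watch for readability. The listener is included only
    while we are under the cap; otherwise new connections sit unaccepted
    in the kernel backlog."""
    readers = list(clients)
    if len(clients) < MAX_CONCURRENCY:
        readers.append(listener)
    return readers

def serve(port=8080):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(128)  # the backlog where unaccepted connections wait
    clients = []
    while True:
        ready, _, _ = select.select(poll_set(clients, listener), [], [])
        for sock in ready:
            if sock is listener:
                conn, _addr = listener.accept()
                clients.append(conn)
            else:
                if not sock.recv(4096):  # peer closed the connection
                    sock.close()
                    clients.remove(sock)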
With KeepAlive on, wouldn't everything in one "session" be sent over one TCP connection? If that's the case, each accept() will basically service a whole session, so you should be able to do the easy thing of refusing a connection right after an accept().
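The "easy thing" would then be something like this sketch: accept the connection, immediately write a minimal 503 with `Connection: close`, and hang up. With keep-alive in play, that turns away the whole session in one step.

```python
import socket

# Minimal 503 refusal. Retry-After hints at when to come back;
# Connection: close tells the client not to reuse the connection.
REFUSAL = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 60\r\n"
    b"Connection: close\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

def refuse(conn):
    """Send the 503 and close, right after accept()."""
    try:
        conn.sendall(REFUSAL)
    finally:
        conn.close()
```

A proper 503 is friendlier than just closing the socket, since browsers show an error page instead of retrying silently.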
Frankly, I bet you'd have more fun just making your app faster.
We'll do that as well, but this problem is pretty serious: even a brief overload sends latencies climbing to ridiculous levels, users start hitting reload, and the problem compounds. In the end we have to restart the server, same as Dario reported. Bye, -- Bjorn