RE: [Zope-dev] How to make Zope fail nicely under high load?
Bjorn Stabell wrote:
Basically, when the load gets high, Zope has a huge backload of work (several minutes of requests), making the average latency for each request many minutes. What are effective ways to do this kind of overload management so that the backlog of work doesn't get that big?
Make the app faster so that doesn't happen.
You'll always reach the bottleneck sooner or later, and in 99% of the cases it'll be Zope. It's not exactly a racing horse. [...]
...returning 503 to every new request probably isn't a good idea; 503 to every new session is probably OK. Maybe. It depends on the workload. If you have a user who sends 10 requests and 5 of them fail, they're going to keep hitting reload to try to get them all to work, which just compounds your problems. The idea is that you want to tweak it so some of your users get everything and some get nothing, instead of everybody getting partials. This means you have to track sessions though... that's probably easier to do in ZServer than in Apache 1.3 just because of the process model, but that doesn't mean it's easy.
By default, Apache doesn't track sessions or do that kind of planned failure. There are probably modules that can enable that behavior.
ZServer's multiplexed IO model just hides the accepting socket from the poller when its concurrency limit is reached, and yeah, then the incoming connections end up in the backlog. You'd have to change ZServer so it handled those new requests instead: identify whether they were part of a session, queue them up if they were, or spit out a 503 if they weren't.
With KeepAlive on, wouldn't everything in one "session" be sent over one TCP connection? If that is the case, each accept() will basically service a whole session, and so you should be able to do the easy thing of refusing a connection just after an accept().
Frankly, I bet you'd have more fun just making your app faster.
We'll do that as well, but this problem is pretty serious; we can't overload our server for even just a little while before latencies start building up to ridiculous levels and users start hitting reload, further compounding the problem. In the end we have to restart the server, same as Dario reported. Bye, -- Bjorn
Bjorn Stabell wrote:
You'll always reach the bottleneck sooner or later, and in 99% of the cases it'll be Zope. It's not exactly a racing horse.
Tell me about it, in a former life I was the guy with the pager.
With KeepAlive on, wouldn't everything in one "session" sent over one TCP connection? If that is the case, each accept() will basically service a whole session, and so you should be able to do the easy thing of refusing a connection just after an accept().
No, a session isn't that nice and tidy: clients can and do open multiple simultaneous connections, and connection re-use and pipelining alone don't define a session. If it were that easy, ZServer would do this already, and so, I imagine, would Apache. It's not a pretty problem, which is why I suggest you spend your time looking at what you can do to scale better, or look for an Apache module that implements some sane-sounding session heuristic and give it a whirl; it will increase the resources needed by Apache, but maybe the costs will balance out.
We'll do that as well, but this problem is pretty serious; we can't overload our server for even just a little while before latencies start building up to ridiculous levels and users start hitting reload, further compounding the problem. In the end we have to restart the server, same as Dario reported.
You'll have to move to load spreading, ZEO, etc. if you can't tweak your app any further. I personally feel that the ZEO separation should be compulsory anyway, it just makes more sense, but I digress. Find yourself an Apache module that can spit out 503s, then work on load balancing infrastructure, which is probably the most viable long-term solution (next to simply not using Zope for something it evidently isn't very good at). -- Jamie Heilman http://audible.transient.net/~jamie/
*Scratches head*. Isn't it possible to just timestamp every request in a list of running requests, and at the beginning of each request check how long the currently processing requests have been running, and if that sum is above a specified time, fail? OK, you get the problem that images may not load even if the main page does, but is that really worse for the end user than not getting anything? Hysteresis could also be used, so that once the time limit has been reached all requests fail, until the total time falls under another, smaller limit. http://www.homestarrunner.com/systemisdown.html
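That timestamp-plus-hysteresis idea can be sketched in a few lines of Python (a hypothetical illustration, not actual Zope code; the thresholds and names are made up). The gate sums the ages of all in-flight requests; above a high-water mark it starts failing new requests, and it keeps failing them until the backlog drops below a lower mark.

```python
import time

class LatencyGate:
    """Hypothetical fail-fast gate with hysteresis, for illustration only."""

    def __init__(self, high=60.0, low=20.0):
        self.high = high        # start shedding when summed ages exceed this (seconds)
        self.low = low          # stop shedding once summed ages fall below this
        self.shedding = False
        self.inflight = {}      # request id -> start timestamp

    def _backlog(self, now):
        # Total time the currently running requests have been in flight.
        return sum(now - started for started in self.inflight.values())

    def start(self, req_id, now=None):
        """Return True to process the request, False to fail it immediately."""
        now = time.time() if now is None else now
        backlog = self._backlog(now)
        # Hysteresis: flip state only at the outer thresholds.
        if self.shedding and backlog < self.low:
            self.shedding = False
        elif not self.shedding and backlog > self.high:
            self.shedding = True
        if self.shedding:
            return False
        self.inflight[req_id] = now
        return True

    def finish(self, req_id):
        self.inflight.pop(req_id, None)
```

The two thresholds keep the gate from flapping: a single slow request can trip the high mark, but service only resumes after the backlog has genuinely drained.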
Lennart Regebro wrote:
OK, you get the problem that images may not load even if the main page does, but is that really worse for the end user than not getting anything?
As I've been saying, if you do that, they will reload repeatedly, making the problem worse. If the images are fluff, and the user knows they are fluff, then *maybe* they won't reload, and your pages will simply appear ugly. But then you have to ask yourself, why am I sending fluff images that degrade the overall user experience of my application? Clearly there's more optimization that could be done to your application. -- Jamie Heilman http://audible.transient.net/~jamie/ "You came all this way, without saying squat, and now you're trying to tell me a '56 Chevy can beat a '47 Buick in a dead quarter mile? I liked you better when you weren't saying squat kid." -Buddy
On Wed, 2004-02-11 at 13:09, Jamie Heilman wrote:
Lennart Regebro wrote:
OK, you get the problem that images may not load even if the main page does, but is that really worse for the end user than not getting anything?
As I've been saying, if you do that, they will reload repeatedly, making the problem worse. If the images are fluff, and the user knows they are fluff, then *maybe* they won't reload, and your pages will simply appear ugly. But then you have to ask yourself, why am I sending fluff images that degrade the overall user experience of my application? Clearly there's more optimization that could be done to your application.
Without knowing the app, I'd say there are probably many ways it can be made more efficient. For instance, are your image URLs being generated using Zope's acquisition? If so, don't do that. Another alternative is to move the static/fluff images out of Zope and just let Apache serve them. That's just one small thing you can do to reduce the load on Zope. Things like Squid, ZEO, better hardware, re-analyzing long-running routines, etc. are all possible without having to do the other gymnastics that have been discussed. -- Edward Muller - http://www.interlix.com - "Open Source Specialists" Dedicated Zope Hosting - Web Hosting - Open Source Consulting Network & PC Service & Support - Custom Programming Phone: 417-862-0573 - Cell: 417-844-2435 - Fax: 417-862-0572 Jabber: edwardam@jabber.interlix.com - AIM: edwardam453 - ICQ: 287033
participants (4)
- Bjorn Stabell
- Edward Muller
- Jamie Heilman
- Lennart Regebro