Re: Zope hanging (poss. threads-related)
Some feedback:

I've installed Zope 2.1.6 from source on the same machine, with as close to an identical setup as is possible on a live server. Even after really hammering 2.1.6, though, I've been unable to produce the same hanging behaviour observed in 2.1.3. The only differences, apart from the apache rewrite rules, are that the 2.1.6 install is using GenericUserFolder 1.2.2 (vs. 1.1.0) and SiteAccess 1.0.1 (vs. 1.0.0). I doubt that these products are the cause of the problem.

It's also pretty unlikely that this is an OS threads problem, at least on our platform. We're running Zope on FreeBSD with pthreads; this is a different thread implementation to that used on Linux. If this is happening on Linux, FreeBSD and Solaris, then my hunch (dare I voice it?) is that this hanging occurs somewhere deep in the bowels of ZServer.

Zope has in the past been fairly stable using four threads; it was only when the threads were increased to 20 that it began hanging repeatedly. We really *do* need to run Zope with a modest number of threads, as some database queries can be expected to take a couple of seconds to complete.

I really do want to get to the root of this problem; if anyone out there has some suggestions or further information requirements, I'm listening!

Many thanks,
-- Marcus

-----Original Message-----
From: Marcus Collins [mailto:mcollins@sunesi.com]
Sent: 10 April 2000 14:48
To: 'zope@zope.org'
Subject: [Zope] Zope hanging (poss. threads-related)

Apologies for the imposing subject line...

Briefly, since increasing the NUMBER_OF_THREADS that Zope uses from the default (4) to 20, Zope has been hanging randomly. A request will be sent off to Zope (whether through apache with mod_pcgi, or directly through ZServer), but the request is never served up, and Z2.log shows no record of it. No additional CPU is taken by the python process.

Environment:
  Zope 2.1.3 (source release)
  python 1.5.2
  GCC 2.7.2.3
  FreeBSD 3.3-RELEASE

The problem was reproducible on pages with framesets, where some of the frame documents are loaded, and one is omitted. It occurs when viewing the management interface, for example (manage_main is displayed, but manage_workspace is not). It seems that only one thread hangs, since other requests continue to be served. Frameset pages were not the only ones affected, but only on frameset pages did the hanging occur consistently.

This behaviour was exhibited under minimal load (one or two users), and ceased when I reverted to 4 threads and restarted Zope.

Our development machine, running 2.1.4 source, had no such problems when running Zope with 20 threads. I don't recall any drastic changes or checkins between 2.1.3 and 2.1.4, though... (that was basically the /REQUEST traversal security fix)

I'll be installing Zope 2.1.6 and will try to approximate the same conditions, but since this is a live machine, it's going to be tricky to find a window of time to perform the switch-over...

Anyone out there with some ideas as to the root cause of these hangings, or with similar tales to tell?

TIA,
-- Marcus
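For anyone trying to reproduce the symptom described above (a request that is sent but never answered), a small load script can help tell "answered" apart from "no reply". The following is only a sketch in present-day Python, not anything that was run in this thread; the function names, the timeout, and the default concurrency of 20 are assumptions chosen for illustration.

```python
import concurrent.futures
import http.client
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return 'ok' if the server answers within `timeout`, else 'no-reply'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except urllib.error.HTTPError:
        # An error page (401, 404, 500...) is still a reply, so not a hang.
        return "ok"
    except (OSError, http.client.HTTPException):
        # Timeouts, refused or dropped connections, truncated responses.
        return "no-reply"


def hammer(urls, fetch_one=fetch, concurrency=20):
    """Request every URL using up to `concurrency` threads; tally outcomes."""
    tally = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for outcome in pool.map(fetch_one, urls):
            tally[outcome] = tally.get(outcome, 0) + 1
    return tally
```

A call such as hammer([base + "/manage_main", base + "/manage_workspace"] * 25), where base is the address of the test server (hypothetical here), would return a dictionary counting how many requests were answered and how many were not.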
On Tue, 11 Apr 2000, Marcus Collins wrote:
different thread implementation to that used on Linux. If this is happening on Linux, FreeBSD and Solaris, then my hunch (dare I voice it?) is that this hanging occurs somewhere deep in the bowels of ZServer.
Although I have stopped chasing this bug, my guess too is that the problem is somewhere in ZServer. For one, placing a print statement inside the async loop solved one of the problems I had, but I never figured out what was wrong. In the end I gave up and wrote my own Python supervisor module (based on the one by Dan Bernstein, the author of qmail), which basically restarts Zope when it dies, plus a script to check whether it has hung, in which case it restarts it again.

Pavlos
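The restart-on-death and restart-on-hang workaround described above can be sketched roughly as follows. This is not Pavlos' module, only a minimal illustration in present-day Python; the names supervise and responds, the polling interval, and the restart limit are all assumptions.

```python
import http.client
import subprocess
import time
import urllib.error
import urllib.request


def responds(url, timeout=10.0):
    """Return True if an HTTP request to `url` gets any reply in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except (OSError, http.client.HTTPException):
        return False  # timeout, refused connection, or broken reply


def supervise(cmd, healthy, max_starts=3, poll_interval=1.0):
    """Run `cmd`; start it again when it exits or when `healthy()` fails.

    Gives up after `max_starts` launches and returns how many were made.
    """
    starts = 0
    while starts < max_starts:
        proc = subprocess.Popen(cmd)
        starts += 1
        while proc.poll() is None:          # child still running
            time.sleep(poll_interval)
            if proc.poll() is None and not healthy():
                proc.kill()                 # treat an unresponsive child as hung
                proc.wait()
                break
    return starts
```

In use, cmd would be whatever command line starts the server, and healthy could be something like lambda: responds(url) for a URL the server is expected to answer; both are left to the reader because they depend on the installation.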
Marcus Collins wrote:
I've installed Zope 2.1.6 from source on the same machine, with as close to an identical setup as is possible on a live server. Even after really hammering 2.1.6, though, I've been unable to produce the same hanging behaviour observed in 2.1.3. The only differences, apart from the apache rewrite rules, are that the 2.1.6 install is using GenericUserFolder 1.2.2 (vs. 1.1.0) and SiteAccess 1.0.1 (vs. 1.0.0). I doubt that these products are the cause of the problem.
I have been observing Zope restarts under moderate site load w/ management activity occurring. I have three Linux servers running Zope: two w/ RH6.2 and one w/ RH5.2(?). The one on the older version of the OS does not spontaneously restart; both other servers do, w/ the one designated to receive the /manage activity restarting at least once an hour during moderate use.
It's also pretty unlikely that this is an OS threads problem, at least on our platform. We're running Zope on FreeBSD with pthreads; this is a different thread implementation to that used on Linux. If this is happening on Linux, FreeBSD and Solaris, then my hunch (dare I voice it?) is that this hanging occurs somewhere deep in the bowels of ZServer.
Since this problem has been reported on several OSs now, I tend to go along w/ you in looking for a ZServer/medusa cause for the problem. It still puzzles me as to why the older version of Linux/RH is for some reason more stable. I'm in the process of getting another machine for testing that will have RH5.2 and be designated as the /manage server, but I'm waiting on the HW. I was originally figuring something changed in the RH6.2 distribution, but with these restarts occurring on FreeBSD and Solaris (I believe I've seen complaints from both OSs on this list at one time or another), I'm wondering if speed may have something to do w/ it. Our older system is also a slower box: dual 400MHz vs dual 500MHz.
Zope has in the past been fairly stable using four threads; it was only when the threads were increased to 20 that it began hanging repeatedly. We really *do* need to run Zope with a modest number of threads, as some database queries can be expected to take a couple of seconds to complete.
I also wonder if the DB connections might be coming into play here. We use MySQL heavily for several sections of our Zope site. Has anyone been seeing Zope restarts on sites that do not use any DB adapters? If DB adapters appear to be the culprit, might it be in the Aqueduct code rather than the ZServer/medusa code? Hmmm, I haven't given that much thought.
I really do want to get to the root of this problem; if anyone out there has some suggestions or further information requirements, I'm listening!
Amen! I've been suffering through this for several months now. We built several failsafe systems and use load balancing and static caching heavily to mask these restart problems from our end-users. But as we try to start adding more of the interactive features our site demands, caching and hiding these problems becomes more and more difficult. I hate to say it, but at some point Zope may not be the solution for our needs due solely to this stability issue :(

Any insights, help, or feedback would be greatly appreciated.

--
-------------------------------
tonyr@ep.newtimes.com
Director of Web Technology
New Times, Inc.
-------------------------------
Tony Rossignol wrote:
Marcus Collins wrote:
something
Since this problem has been reported on several OSs now, I tend to go along w/ you in looking for a ZServer/medusa cause for the problem. It still puzzles me as to why the older version of Linux/RH is for some reason more stable. I'm in the process of getting another machine for testing that will have RH5.2 and be designated as the /manage server, but I'm waiting on the HW. I was originally figuring something changed in the RH6.2 distribution, but with these restarts occurring on FreeBSD and Solaris (I believe I've seen complaints from both OSs on this list at one time or another), I'm wondering if speed may have something to do w/ it. Our older system is also a slower box: dual 400MHz vs dual 500MHz.
Well, I can't say for everyone else, but my Solaris installations are surprisingly to problem. I say surprising because the installation on Solaris is a hell of a lot more bubble-gum and string to put it where we want it. (I don't care so much on Linux.)

Also, I was experiencing the problem quite a bit on 2.1.2, then upped to 2.1.4 and it quit. I tried an upgrade to 2.1.6, had problems and went back to 2.1.4, and then it started up again. I know from a previous life as a support person that that is almost useless information, being unreproducible and all, but I'm wondering if the sporadic nature of the problem could be linked to the spoogey install? The DC people would probably have the best understanding of that process, and I would be willing to bet that they don't experience this problem all that much. Can't say what it would be, but it could also bear looking into.
Zope has in the past been fairly stable using four threads; it was only when the threads were increased to 20 that it began hanging repeatedly. We really *do* need to run Zope with a modest number of threads, as some database queries can be expected to take a couple of seconds to complete.
I also wonder if the DB connections might be coming into play here. We use MySQL heavily for several sections of our Zope site. Has anyone been seeing Zope restarts on sites that do not use any DB adapters? If DB adapters appear to be the culprit, might it be in the Aqueduct code rather than the ZServer/medusa code? Hmmm, I haven't given that much thought.
Yes. I've had it crash/hang/die when doing things completely unrelated to DB adapters. Although almost never when the management interface is not involved.
I really do want to get to the root of this problem; if anyone out there has some suggestions or further information requirements, I'm listening!
Amen! I've been suffering through this for several months now. We built several failsafe systems and use load balancing and static caching heavily to mask these restart problems from our end-users. But as we try to start adding more of the interactive features our site demands, caching and hiding these problems becomes more and more difficult. I hate to say it, but at some point Zope may not be the solution for our needs due solely to this stability issue :(
Double amen here. Zope's mainly still in development here, so no one else notices, but if my manager (who keeps wanting Lotus Notes --shiver-- don't ask me why) catches wind that my solution hangs/crashes with any regularity... the end result just won't be good. :) Monty
Marcus Collins wrote:
Some feedback:
Zope has in the past been fairly stable using four threads; it was only when the threads were increased to 20 that it began hanging repeatedly. We really *do* need to run Zope with a modest number of threads, as some database queries can be expected to take a couple of seconds to complete.
I suspect this problem *might* be unrelated to the threadlock discussed so far. In the case of the reported lock, 2 or more threads cause instability; in your case, you report 4 is stable. I think you are running up against the hardwired database connection limit (7) in ZODB: you have more threads than there are connections. I suggest either not raising the number of threads above 7 or changing the hardwired limit... problem is I don't remember where the hardwiring is... This has been discussed in the past; look for messages by Jim Fulton.

-Michel
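To make the "more threads than connections" idea concrete, here is a toy model in present-day Python. It is not ZODB code and makes no claim about how ZODB actually hands out connections; it only assumes a pool that blocks a worker until a slot is free, with 7 slots and 20 workers as in the numbers above. The function name and the hold time are made up for illustration.

```python
import threading
import time


def run_workers(n_threads=20, pool_size=7, hold_seconds=0.01):
    """Let `n_threads` workers share `pool_size` slots; return (peak, done)."""
    pool = threading.BoundedSemaphore(pool_size)   # stands in for the connection pool
    lock = threading.Lock()
    stats = {"in_use": 0, "peak": 0, "done": 0}

    def worker():
        with pool:                       # blocks while every slot is taken
            with lock:
                stats["in_use"] += 1
                stats["peak"] = max(stats["peak"], stats["in_use"])
            time.sleep(hold_seconds)     # pretend to run a slow query
            with lock:
                stats["in_use"] -= 1
                stats["done"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"], stats["done"]
```

In this model no more than 7 workers ever hold a slot at once and the other 13 simply wait their turn. If a slot were never given back, the waiting workers would wait indefinitely, which from the outside would look like requests that are never served; whether that is what happens inside Zope is exactly the open question in this thread.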
I *think* it's the pool_size parameter in lib/python/ZODB/DB.py in the DB class' __init__ method:

    class DB:
        """The Object Database

        The Object database coordinates access to and interaction of one
        or more connections, which manage object spaces.  Most of the actual
        work of managing objects is done by the connections.
        """
        def __init__(self, storage,
                     pool_size=7,  # here...
                     cache_size=400,
                     cache_deactivate_after=60,
                     version_pool_size=3,
                     version_cache_size=100,
                     version_cache_deactivate_after=10,
                     ):

Michel Pelletier wrote:
Marcus Collins wrote:
Some feedback:
Zope has in the past been fairly stable using four threads; it was only when the threads were increased to 20 that it began hanging repeatedly. We really *do* need to run Zope with a modest number of threads, as some database queries can be expected to take a couple of seconds to complete.
I suspect this problem *might* be unrelated to the threadlock discussed so far. In the case of the reported lock, 2 or more threads cause instability; in your case, you report 4 is stable.
I think you are running up against the hardwired database connection limit (7) in ZODB: you have more threads than there are connections. I suggest either not raising the number of threads above 7 or changing the hardwired limit... problem is I don't remember where the hardwiring is... This has been discussed in the past; look for messages by Jim Fulton.
-Michel
-- Chris McDonough Digital Creations Publishers of Zope - http://www.zope.org
Participants (6): Chris McDonough, Marcus Collins, Michel Pelletier, Monty Taylor, Pavlos Christoforou, Tony Rossignol