Tony Rossignol wrote:
Tres Seaver wrote:
Tony Rossignol <tonyr@ep.newtimes.com>
Does anyone have any insight into what this error message might be?
2000-03-20T06:36:51 ERROR(200) ZServer uncaptured python exception, closing channel <FCGIChannel connected 206.138.64.10:1819 at 10c1b2f0> (socket.error:(32, 'Broken pipe') [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asynchat.py|initiate_send|211] [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asyncore.py|send|237])
------
2000-03-20T06:36:51 ERROR(200) ZServer uncaptured python exception, closing channel <FCGIChannel connected 206.138.64.10:1819 at 10c1b2f0> (socket.error:(32, 'Broken pipe') [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asynchat.py|initiate_send|211] [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asyncore.py|send|237])
------
This problem is a symptom of something inside Zope which is killing off threads, most likely an unhandled exception somewhere. Try turning on the STUPID_DEBUGGER_LOG and see if you get more enlightening log output.
Thanks for the reply. One problem, I can't find anything on the STUPID_DEBUGGER_LOG, is it the same as STUPID_FILE_LOGGER? Or is this an undocumented option?
D'oh! The environment variable which Zope checks for is STUPID_LOG_FILE:

* If it is set to a blank string, Zope writes the verbose logging to stderr;
* If it is a non-blank string, Zope opens it as a file and writes the verbose logging there;
* Otherwise, Zope suppresses the verbose logging.
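The three cases above can be sketched as a small decision function. This is only an illustration of the behaviour as described in this message, not Zope's actual logging code; the function name `verbose_log_target` and the returned tuples are hypothetical, and "blank" is assumed here to mean the empty string.

```python
import os


def verbose_log_target(environ=None):
    """Decide where verbose logging would go, following the description above.

    Returns ("stderr", None), ("file", path) or ("off", None).
    Hypothetical helper for illustration only; not part of Zope.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("STUPID_LOG_FILE")
    if value is None:
        # Variable not set at all: verbose logging is suppressed.
        return ("off", None)
    if value == "":
        # Set to a blank string (assumed: empty): log to stderr.
        return ("stderr", None)
    # Any non-blank value is treated as a file name to write to.
    return ("file", value)
```

With this reading, exporting `STUPID_LOG_FILE=` (empty) before starting Zope would select stderr, while leaving the variable unset keeps verbose logging off.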
I'm guessing that these 'Broken pipe' errors might be what is causing zope to periodically restart itself.
Ayup, there have been several threads reporting such restarts lately -- have you been following them? (some were on the zope-dev list, I think).
Again thanks.
Sorry for the confusion!

Tres.
--
=========================================================
Tres Seaver tseaver@palladion.com 713-523-6582
Palladion Software http://www.palladion.com
Just to add my two cents: I've seen these as well, usually connected to someone disconnecting from downloading a page mid-way, usually also connected to zope going into a really tight 'select (0,1), select (0,1,2), select (0,1,2,3), select (0,1,2,3,4), etc' loop - until it runs out of fd's it's allowed to open, crashes, and needs restarting. Oh, and only when running with FastCGI.

Redhat 6.1ish, glibc from rpm glibc-2.1.2-15, python 1.5.2 from rpm python-1.5.2-7, apache from rpm apache-1.3.9-8.

My response so far has been to whack a while : ; do ... done loop around the kickoff command in ./start, and run the whole lot under screen -dmS. STUPID_LOG_FILE didn't produce anything that was immediately useful to me :(

I've just upgraded to 2.1.6, and so far I haven't seen the problem, which may or may not be a good sign ;)

KevinL
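The `while : ; do ... done` workaround above is a shell loop; the same idea is sketched here in Python to match Zope's own language. This is a rough illustration, not what Kevin actually ran: the function name `keep_running`, the `max_runs` and `delay` parameters, and the injectable `run` callable are all assumptions added so the loop can be bounded and tried out safely.

```python
import subprocess
import time


def keep_running(command, max_runs=None, delay=1.0, run=subprocess.call):
    """Run `command` again every time it exits, like `while : ; do ... done`.

    `max_runs=None` loops forever; a number bounds the loop (useful for
    testing). `run` defaults to subprocess.call, which blocks until the
    command exits and returns its exit code.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        exit_codes.append(run(command))
        # Pause briefly so a command that dies instantly does not spin the CPU.
        time.sleep(delay)
    return exit_codes
```

Calling `keep_running(["./start"])` would restart the kickoff script indefinitely, assuming `./start` stays in the foreground until Zope exits. Like the shell loop, this only papers over the crash; it does nothing about the cause.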
Tres Seaver wrote Tony Rossignol wrote:
Tres Seaver wrote:
Tony Rossignol <tonyr@ep.newtimes.com>
Does anyone have any insight into what this error message might be?
2000-03-20T06:36:51 ERROR(200) ZServer uncaptured python exception, closing channel <FCGIChannel connected 206.138.64.10:1819 at 10c1b2f0> (socket.error:(32, 'Broken pipe') [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asynchat.py|initiate_send|211] [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asyncore.py|send|237])
------
2000-03-20T06:36:51 ERROR(200) ZServer uncaptured python exception, closing channel <FCGIChannel connected 206.138.64.10:1819 at 10c1b2f0> (socket.error:(32, 'Broken pipe') [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asynchat.py|initiate_send|211] [/usr/local/Zope-2.1.4-NT-0.1.6/ZServer/medusa/asyncore.py|send|237])
------
[snip]
Kevin Littlejohn wrote:
Just to add my two cents: I've seen these as well, usually connected to someone disconnecting from downloading a page mid-way, usually also connected to zope going into a really tight 'select (0,1), select (0,1,2), select (0,1,2,3), select(0,1,2,3,4), etc loop - until it runs out of fd's it's allowed to open, crashes, and needs restarting. Oh, and only when running with FastCGI.
You've seen this before? Great, you're the first person who actually gives me hope I'm not insane. We have been running 7 sites from a single zope install replicated across 3 servers. Two of the servers experience frequent Zope restarts (as often as every hour). The third server runs an older version of Redhat (6.0, I believe) and glibc-2.0.7-29, and hardly ever restarts.

We are running Redhat 6.1, glibc from rpm glibc-2.1.1.6, python from rpm python-1.5.2-7 and Apache/1.3.9 mod_fastcgi/2.2.3 and Apache/1.3.11 mod_fastcgi/2.2.3.
My response so far has been to whack a while : ; do ... done loop around the kickoff command in ./start, and run the whole lot under screen -dmS. STUPID_LOG_FILE didn't produce anything that was immediately useful to me :(
We have even gone as far as to modify FCGIServer.py to produce "debug" output to determine which requests go unanswered at the point of restart; hoping this would lead to some suspect DTML or ExternalMethod code. No luck.
I've just upgraded to 2.1.6, and so far I haven't seen the problem, which may or may not be a good sign ;)
We're waiting a bit to upgrade to see how it shakes out w/ SiteAccess etc. We've also recently bumped cache *WAY* up, 450,000 items, w/ 5 minute target max between accesses, which has resulted in 200M of memory for the process. This appears to have lowered the number of restarts, but not eliminated them.

In your experience with this problem, have you noticed /manage and/or traffic volume to have any impact on the frequency of restarts? We've been looking everywhere to determine what may be going wrong, and have experienced the most restarts when traffic and /manage volumes are up. We unfortunately have our most /manage volume within the same timespan as our heaviest traffic.

Hopefully we can find a solution to this. Thanks for the feedback. Let me know if you make any headway.

-------------------------------
tonyr@ep.newtimes.com
Director of Web Technology
New Times, Inc.
-------------------------------
Tony Rossignol wrote Kevin Littlejohn wrote:
Just to add my two cents: I've seen these as well, usually connected to someone disconnecting from downloading a page mid-way, usually also connected to zope going into a really tight 'select (0,1), select (0,1,2), select (0,1,2,3), select(0,1,2,3,4), etc loop - until it runs out of fd's it's allowed to open, crashes, and needs restarting. Oh, and only when running with FastCGI.
You've seen this before? Great, you're the first person who actually gives me hope I'm not insane. We have been running 7 sites from a single zope install replicated across 3 servers. Two of the servers experience frequent Zope restarts (as often as every hour). The third server runs an older version of Redhat (6.0, I believe) and glibc-2.0.7-29, and hardly ever restarts.
We are running Redhat 6.1, glibc from rpm glibc-2.1.1.6, python from rpm python-1.5.2-7 and Apache/1.3.9 mod_fastcgi/2.2.3 and Apache/1.3.11 mod_fastcgi/2.2.3.
I'm pretty convinced it's related to glibc version - the fact that a strace of the process shows it repeatedly opening/selecting over more and more fd's just before the process goes away suggests some sort of funky error handling to me - maybe one of the socket handling routines had a change in what error it produces between glibc versions? Anyway, I haven't had time to sit down and properly dig.
I've just upgraded to 2.1.6, and so far I haven't seen the problem, which may or may not be a good sign ;)
We're waiting a bit to upgrade to see how it shakes out w/ SiteAccess etc.
2.1.6 suffers the same problem.
In your experience with this problem have you noticed /manage and or traffic volume to have any impact on the frequency of restarts? We've
I can reproduce it instantly with a large enough page - stop the request mid-way, the closing of the socket to the client while Zope is serving the page up seems to be what causes it to go into its loop, then die.

Mondo apologies to the zope crew for not putting this in the collector - I had hoped to find the actual problem (and maybe a solution) first, but ran out of time :(

KevinL

--------------- qnevhf@obsu.arg.nh ---------------
Kevin Littlejohn, Technical Architect, Connect.com.au
Don't let the Govt censor our access to the 'net - http://www.efa.org.au/Campaigns/stop.html
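The trigger described above, a peer closing its socket while the server is still writing, can be shown in isolation with a connected socket pair. This is a sketch of the general failure and one defensive way to handle it, not the medusa/ZServer code and not a fix for the loop itself; the helper name `send_or_close` is hypothetical. In current Python the `socket.error: (32, 'Broken pipe')` from the log surfaces as `BrokenPipeError`.

```python
import socket


def send_or_close(sock, data):
    """Send `data`; if the peer has gone away, close our end and report it.

    Returns True when everything was sent, False when the peer had
    disconnected (EPIPE or ECONNRESET). Hypothetical helper for illustration.
    """
    try:
        sock.sendall(data)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # The client vanished mid-response: release the descriptor
        # instead of letting the exception escape uncaptured.
        sock.close()
        return False


# Simulate a client that disconnects before the page has been sent.
server_side, client_side = socket.socketpair()
client_side.close()
result = send_or_close(server_side, b"x" * 65536)
```

Here the write to the abandoned connection fails and the server end is closed, so the descriptor is not leaked. Whether the fd exhaustion reported in this thread comes from a missing step like this is not established.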
Kevin Littlejohn wrote:
I'm pretty convinced it's related to glibc version - the fact that a strace of the process shows it repeatedly opening/selecting over more and more fd's just before the process goes away suggests some sort of funky error handling to me - maybe one of the socket handling routines had a change in what error it produces between glibc versions? Anyway, I haven't had time to sit down and properly dig.
This is great, some light at the end of a very dark tunnel. One question here: How were you able to get strace to work? I've been trying and it would either lock Zope up or just return instantly.
2.1.6 suffers the same problem.
Yes, we noticed that too.
I can reproduce it instantly with a large enough page - stop the request mid-way, the closing of the socket to the client while Zope is serving the page up seems to be what causes it to go into its loop, then die.
Is this only when accessing via FastCGI or does it happen when using ZServer as well?

Thanks for the info. If we find anything out I'll let you know.

--
-------------------------------
tonyr@ep.newtimes.com
Director of Web Technology
New Times, Inc.
-------------------------------
I can reproduce it instantly with a large enough page - stop the request mid-way, the closing of the socket to the client while Zope is serving the page up seems to be what causes it to go into its loop, then die.
Is this only when accessing via FastCGI or does it happen when using ZServer as well?
Just for the record, I don't use FastCGI and have seen Zope die under the same circumstances, i.e. by disconnecting the browser mid-download of a (usually long) page. Fortunately, that 'client' has been me via the administration screens, but it does seem like a rather trivial DoS. It seems that this only happens with certain types of pages, though - I'm not sure which.

Unfortunately, I can't remember if this was when we were using PCGI or Proxy-Pass, but it was definitely one of the two.

Hearing that others are now experiencing the same gives me hope: this could explain the crashes that we've been plagued with for months (though I fear any solution will come too late to prevent our switch to an alternative platform).

chas

ps. Weird thing: the most stable Zope installation I have is on NT. It never, ever crashes, despite being the development machine (I wouldn't say that load is causing my production server to die b/c it happens under low load too).
On Thu, 23 Mar 2000, chas wrote:
Hearing that others are now experiencing the same gives me hope: this could explain the crashes that we've been plagued with for months (though I fear any solution will come too late to prevent our switch to an alternative platform).
It seems that a lot of sites (including mine) have problems with Zope's stability but somehow not many people reported the problems, and I am not sure whether DC was/is aware of them. Our most useful Zope component right now is a script that checks our Zope server every 5 minutes and restarts it if it is down.

I have spent many hours debugging but I failed to find anything substantial. I am under the impression (but nothing more solid) that one problem might involve the interaction of the select call and signal handling, or the python wrapper around the select call. No proof, though.

Unfortunately, like you, I failed to convince people in my group to use Zope for more ambitious projects because of the stability problem.

Pavlos
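A check-and-restart script of the kind mentioned above might look roughly like the following. This is a guess at the general shape, not Pavlos's actual script: the names `is_listening` and `check_once` are hypothetical, the probe is a plain TCP connect, and the restart action is passed in as a callable because the thread does not say how the server is restarted.

```python
import socket


def is_listening(host, port, timeout=5.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all of these as "down".
        return False


def check_once(host, port, restart):
    """Probe the server once; call `restart()` if it is down.

    Returns True when a restart was triggered, False otherwise.
    """
    if is_listening(host, port):
        return False
    restart()
    return True
```

Run from cron every 5 minutes, `check_once` would cover a process that has exited. A process that is still listening but stuck in a tight select loop would pass a connect-only probe, so a real check would probably need to fetch a page as well.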
Pavlos Christoforou wrote:
On Thu, 23 Mar 2000, chas wrote:
Hearing that others are now experiencing the same gives me hope: this could explain the crashes that we've been plagued with for months (though I fear any solution will come too late to prevent our switch to an alternative platform).
It seems that a lot of sites (including mine) have problems with Zope's stability but somehow not many people reported the problems, and I am not sure whether DC was/is aware of them. Our most useful Zope component right
It may also be due to people using the ZopeMonitor on Unix, which restarts Zope if it dies.

--
In flying I have learned that carelessness and overconfidence are usually far more dangerous than deliberately accepted risks.
-- Wilbur Wright in a letter to his father, September 1900
Bill Anderson wrote:
It may also be due to people using the ZopeMonitor on Unix, which restarts Zope if it dies.
What is this? Is this something I can download?

ethan fremen
--
http://mindlace.net
__________________
mindlace@imeme.net
I don't want The Truth but I wouldn't mind a Big Analogy.
mindlace wrote:
Bill Anderson wrote:
It may also be due to people using the ZopeMonitor on Unix, which restarts Zope if it dies.
What is this? Is this something I can download?
ethan fremen
Even better, it is a part of Zope! :)
From z2.py: ========================================
-Z path

    Unix only! This option is ignored on windows.

    If this option is specified, a separate management process will be created that restarts Zope after a shutdown (or crash). The path must point to a pid file that the process will record its process id in. The path may be relative, in which case it will be relative to the Zope location.

    To prevent use of a separate management process, provide an empty string: -Z ''

===========================================

Basically, add the option, along with the requisite options, and there you go.

--
In flying I have learned that carelessness and overconfidence are usually far more dangerous than deliberately accepted risks.
-- Wilbur Wright in a letter to his father, September 1900
participants (7)

- Bill Anderson
- chas
- Kevin Littlejohn
- mindlace
- Pavlos Christoforou
- Tony Rossignol
- Tres Seaver