RE: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)
Chris,

Thanks for the great help. After some investigation, I am now fairly sure that the behavior mentioned in the FastCGI docs is what is happening here: a request isn't given time to finish. In my case I can replicate it by clicking VERY fast on the web pages, so that a new request reaches the server before the previous one has finished (I have some slow processes, drawing dynamically generated maps). This reproduces the SIGPIPE almost 100% of the time.

Anyway, I took a look at the code you mentioned, and it will hopefully help, although I'm venturing into very unfamiliar territory here!

A couple more things:

This, as far as I can tell, is a bug in the FastCGI implementation (it doesn't handle SIGPIPE as suggested). Should I report it somewhere?

Also, I was reading about signal handling in Python and came upon this:

Python installs a small number of signal handlers by default: SIGPIPE is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions)

Now I'm confused. If Python ignores SIGPIPE by default, why is Zope complaining? That would mean a SIGPIPE handler is already defined somewhere, overriding the default, in which case that is probably where the change should be made, instead of adding yet another handler.

And finally, how do I "ignore" a signal? I guess just writing a "pass" in the handler will work? I'll try it out. I assume that when a signal is received, only one handler is called, and only once?

Thanks again and again :)

J.F.

-----Original Message-----
From: Chris McDonough [mailto:chrism@zope.com]
Sent: Friday, March 22, 2002 3:05 PM
To: Doyon, Jean-Francois; zope@zope.org; matt@zope.com
Subject: Re: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)

You could register a SIGPIPE handler for Zope that just ignores the signal. See the chrism-logrotate-branch in CVS at http://cvs.zope.org/?only_with_tag=chrism_logrotate_branch and take a look at z2.py's "installsighandlers" function...
Maybe use this branch, but add a SIGPIPE handler to that function that mimics the others, except that it passes signal.SIG_IGN as the handler instead of the current signal handler function.

----- Original Message -----
From: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>
To: "'Chris McDonough'" <chrism@zope.com>; <zope@zope.org>; <matt@zope.com>
Sent: Friday, March 22, 2002 2:45 PM
Subject: RE: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)

Hello,

Thanks for the help! Well, I've determined it most likely isn't PostgreSQL, since I switched the connections from Unix-domain sockets to TCP, and the problem still occurs. So I'm turning my attention to FastCGI... I just read this on the FastCGI website:

If an http client aborts a request before it completes, mod_fastcgi does too - this results in a SIGPIPE to the FastCGI application. At a minimum, SIGPIPE should be ignored (applications spawned by mod_fastcgi have this setup automatically). Ideally, it should result in an early abort of the request handling within your application and a return to the top of the FastCGI accept() loop.

I guess Zope isn't handling SIGPIPE the way it is suggested here? Anyway, this seems to be the most likely cause of the problems I'm having, that AND possibly the problem Matt describes. Matt, where can I find more information on this, and possible solutions? For now, I'm guessing that using TCP instead of Unix-domain sockets for the FastCGI connections might help solve the problem?

I am getting *A LOT* of these errors, every 5 to 10 minutes!!! And it *IS* traffic-related: when the business day dies down, the errors stop occurring. (The normal usage pattern at this time supports the theory that the errors are directly related to the amount of usage.) I'm also thinking of playing with the -restart-delay option of the FastCgiServer directive...

Help!!!

Thank you,
J.F.
-----Original Message-----
From: Chris McDonough [mailto:chrism@zope.com]
Sent: Thursday, March 21, 2002 11:08 AM
To: Doyon, Jean-Francois; zope@zope.org
Subject: Re: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)

SIGPIPE is raised by the OS when the application writes to a UNIX pipe (or socket) whose other end has been closed, i.e. a broken pipe. UNIX takes this condition seriously, which is why it sends the process a signal telling it "you've got a broken pipe"; unless the signal is ignored or handled, it terminates the process. As you say it started happening when you began using the database adapter, it may be that some piece of the database adapter opens a pipe that is later broken (for whatever reason; that's the $10,000 question ;-), causing the OS to send Zope a SIGPIPE.

It may be possible to install a signal handler for SIGPIPE to get rid of the problem, but I'm not exactly sure what it should or would do during this failure state, and it would be more useful to pin down the pipe that is getting broken by making the problem replicable.

The ZODB pool_size parameter is controlled via the pool_size argument to the ZODB.DB.DB constructor. It specifies how many database connections the DB is willing to keep in its pool. When Zope starts up, each Zope thread needs its own database connection, so you should probably never have a pool_size smaller than the number of threads (the -t parameter to z2.py). Adjusting these values up and down may improve performance, but to this day there have not been any empirical studies of how performance is affected when you do. It's probably something you need to try out in a load-testing environment. If you find something interesting, let us know!
;-)

----- Original Message -----
From: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>
To: <zope@zope.org>
Sent: Thursday, March 21, 2002 9:57 AM
Subject: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)

Hello,

I'm running into random crashes of my Zope processes, but I can't find any reference to this specific one anywhere in the mailing list archives or on the site. Every now and then, for no apparent reason, I get:

2002-03-21T14:48:52 ERROR(200) zdaemon zdaemon: Thu Mar 21 09:48:52 2002: Aiieee! 20070 exited with error code: 13

Signal 13 is SIGPIPE...

This is Zope 2.5.0 with CMF 1.2 on a severely upgraded/updated/patched RH6.2, with Python 2.1.2 built with the defaults. It runs behind Apache 1.3.2x via FastCGI.

Usually I just wait a couple of seconds, hit refresh in my browser, and things come back to normal, but it's still annoying, and it doesn't look good to the public. Note that when this happens, it usually seems to happen to ALL processes. It looks to me like the pipes between the master Zope process and its children die, and they all have to restart for some reason. Could this be? And if so, why? Note that I first noticed this when I started using Psycopg to create RDBMS connections to my PostgreSQL database... Could there be a relation somehow?

On a related topic, how do I manage performance? I plan on using Zope for a fairly high-demand web site. I noticed I can control how many processes/threads start, but I also read something about the ZODB pool_size... What exactly is the relation between the two?
Thank you,

Jean-François Doyon
Internet Service Development and Systems Support
GeoAccess Division
Canadian Center for Remote Sensing
Natural Resources Canada
http://atlas.gc.ca
Phone: (613) 992-4902
Fax: (613) 947-2410

_______________________________________________
Zope maillist - Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope-dev )
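J.F. reads the zdaemon exit code 13 as signal 13, SIGPIPE. As a minimal sketch of what that looks like at the process level, assuming a POSIX system and modern Python 3 (not the Python 2.1 in the report), the snippet below forks a child that writes to a pipe with no reader and decodes the child's status with `os.waitpid`, `os.WIFSIGNALED` and `os.WTERMSIG`. The helper `run_child` is hypothetical and does not reproduce zdaemon's own logic.

```python
import os
import signal


def run_child(restore_default):
    """Fork a child that writes to a pipe nobody reads; report how it ended.

    If restore_default is True, the child first resets SIGPIPE to SIG_DFL,
    the OS default action, which terminates the process. Otherwise it keeps
    the disposition it inherited (SIG_IGN under a normal CPython start).
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader left, so any write breaks the pipe
    pid = os.fork()
    if pid == 0:  # child
        code = 2  # only used if the write unexpectedly succeeds
        try:
            if restore_default:
                signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            os.write(write_fd, b"x")
        except BrokenPipeError:
            code = 1  # SIGPIPE was ignored, so the write failed with EPIPE
        finally:
            os._exit(code)  # never return into the parent's code path
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "signal", os.WTERMSIG(status)
    return "exit", os.WEXITSTATUS(status)


print(run_child(restore_default=True))   # ('signal', 13) on Linux
print(run_child(restore_default=False))  # ('exit', 1)
```

With SIGPIPE at its default action the child dies from signal 13; with the SIG_IGN it inherits through fork(), the same write raises BrokenPipeError and the child exits normally.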
A couple more things:
This, as far as I can tell, is a bug in the FastCGI implementation (it doesn't handle SIGPIPE as suggested). Should I report it somewhere?
The bit you passed along from the FastCGI website seems to intimate that the behavior is expected... I'm not sure where you would report it. ;-)
Python installs a small number of signal handlers by default: SIGPIPE is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions)
I didn't know this. It appears Python already installs SIG_IGN as the signal handler for a SIGPIPE signal... mm. I'm not sure how your configuration manages to get around this. I'd have to guess that some product is resetting the signal handler.
And finally, how do I "ignore" a signal? I guess just writing a "pass" in the handler will work? I'll try it out. I assume that when a signal is received, only one handler is called, and only once?
    import signal
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

This installs a null signal handler for the SIGPIPE signal. A process has exactly one disposition per signal, so this replaces whatever was installed before rather than adding another handler.

I wonder if you could find the place in the code where the exception occurs when you click a lot, and place a "print signal.getsignal(signal.SIGPIPE)" right before the place where the error happens. See if it's 1 (SIG_IGN). If it's 1... well, I'm not sure. I don't know how the process could be killed by a SIGPIPE then.

If it's 0 (SIG_DFL, the default action, which terminates the process), that means something has reset the handler on top of Python's default "ignore SIGPIPE" handler, and you might be able to find it by grepping for "SIGPIPE" or "13" in your code.

- C
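Chris's getsignal check and the effect of SIG_IGN can be tried outside Zope. This is a sketch assuming Python 3 on a POSIX system: `describe_sigpipe` is a hypothetical helper that classifies the value `signal.getsignal` returns, and `socket.socketpair` stands in for the connection to the web server, which is an assumption of the example rather than how mod_fastcgi connects.

```python
import signal
import socket


def describe_sigpipe():
    """Classify the current SIGPIPE disposition, as Chris suggests checking."""
    handler = signal.getsignal(signal.SIGPIPE)
    if handler == signal.SIG_IGN:  # 1: write errors become exceptions
        return "ignored"
    if handler == signal.SIG_DFL:  # 0: OS default, which kills the process
        return "default (terminate)"
    if handler is None:  # installed by C code outside Python's knowledge
        return "set outside Python"
    return "custom handler " + repr(handler)


# Ignore SIGPIPE explicitly, as in Chris's two-line example.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)
print(describe_sigpipe())  # ignored

# With SIGPIPE ignored, writing to a socket whose peer has gone away
# raises an ordinary exception that request-handling code can catch.
left, right = socket.socketpair()
right.close()
try:
    left.sendall(b"map tile")
except BrokenPipeError:
    print("peer closed the connection; abandon this request")
finally:
    left.close()
```

In Python 3 the failure surfaces as BrokenPipeError (an OSError with errno EPIPE); in the Python 2.1 of this thread it would be a socket.error. Catching it per request is the "early abort of the request handling" the FastCGI documentation recommends.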
You know, something comes to mind. When a UNIX process forks, the child inherits a copy of the parent's signal handler state. I don't know exactly how the FastCGI stuff invokes Zope, but it may be that Python only installs a SIG_IGN handler for SIGPIPE if one isn't inherited from a parent process. If so, setting it explicitly in z2.py, as in my previous message, might solve your problem.

----- Original Message -----
From: "Chris McDonough" <chrism@zope.com>
To: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>; <zope@zope.org>; <matt@zope.com>
Sent: Friday, March 22, 2002 4:41 PM
Subject: Re: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)
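Chris's point about inherited signal state can be checked with a short experiment. This sketch assumes a POSIX system and Python 3; `child_sigpipe_setting` is a hypothetical helper, not part of z2.py or FastCGI. The parent sets a SIGPIPE disposition, forks, and the child reports what it ended up with through its exit status. It only covers fork(): across exec(), POSIX keeps ignored signals ignored but resets caught ones to their default action.

```python
import os
import signal


def child_sigpipe_setting(parent_setting, child_sets_ignore=False):
    """Fork once and return the SIGPIPE disposition the child ends up with.

    parent_setting (signal.SIG_DFL or signal.SIG_IGN) is applied in the
    parent just before fork(); child_sets_ignore mimics an explicit
    signal.signal(signal.SIGPIPE, signal.SIG_IGN) call at child startup.
    """
    previous = signal.signal(signal.SIGPIPE, parent_setting)
    try:
        pid = os.fork()
        if pid == 0:  # child: report its disposition through the exit status
            code = 2  # 2 means "something went wrong in the child"
            try:
                if child_sets_ignore:
                    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
                ignored = signal.getsignal(signal.SIGPIPE) == signal.SIG_IGN
                code = 1 if ignored else 0
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
    finally:
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)  # undo the parent-side change
    code = os.WEXITSTATUS(status)
    if code == 2:
        raise RuntimeError("child could not report its SIGPIPE setting")
    return signal.SIG_IGN if code == 1 else signal.SIG_DFL


# fork() copies the parent's disposition into the child...
assert child_sigpipe_setting(signal.SIG_DFL) == signal.SIG_DFL
assert child_sigpipe_setting(signal.SIG_IGN) == signal.SIG_IGN
# ...but an explicit SIG_IGN at child startup wins over what was inherited.
assert child_sigpipe_setting(signal.SIG_DFL, child_sets_ignore=True) == signal.SIG_IGN
```

The last assertion is the situation Chris has in mind: whatever the child inherited, setting SIG_IGN explicitly at startup leaves SIGPIPE ignored.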
participants (2)
- Chris McDonough
- Doyon, Jean-Francois