Jon Prettyman wrote:
I've been reading through the code trying to figure out what is going on here and where this message might be coming from. My current train of thought is that the exit code of 11 being seen in z2.py is a result of sys.ZServerExitCode getting set somewhere and z2.py exiting with that code.
So I've been trying to find where code sets sys.ZServerExitCode and what I've found is in ZServer.HTTPResponse.ChannelPipe.close. In this routine, the value of self._shutdown is assigned to r which then gets assigned to sys.ZServerExitCode.
It looks like self._shutdown only gets assigned when ZServer.HTTPResponse.ChannelPipe.finish gets called and a response header contains a bobo-exception-type of exceptions.SystemExit.
So I'm guessing now that somewhere this exception is getting set but I can't seem to figure out why.
There is only one reason I can think of: clicking on the Shutdown button. A quick grep shows sys.exit being called in a few other places, however, most specifically in xmllib and pyexpat. None of these calls set a value of 11 though. Are you using any XML? XML-RPC calls?
Am I completely off base here?
Perhaps; it's a good avenue to look down, however. What I suspect is that something is happening (like a SIGSEGV) that causes the OS to send the process a signal whose default action is to kill it, setting the error code to 11 for some as-yet mysterious reason. (11 has *got* to be the clue, however; I refuse to believe that it is arbitrary. Note also that SIGSEGV is signal 11. Coincidence?)
A good exercise may be to run Zope in gdb and wait; when one of these events happens, use gdb to inspect what's going on. I'm no gdb expert, but I was under the impression that it could tell you when signals arrive (or perhaps stop the process on the arrival of a signal...).
I wrote a test script that forked a parent and child just like Zope. When I sent the child a SIGSEGV it returned an error code of 332. Maybe my experiment is flawed. Have you enough time to look that deeply?
-Michel
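For anyone who wants to repeat the fork-and-signal experiment, here is a minimal sketch in present-day Python (not the Python of this thread). It assumes a POSIX system, and it assumes the watcher parent reports the raw integer returned by os.wait(); whether zdaemon actually does that is an assumption, not something established here. The helper names describe_wait_status and run_child_and_signal are hypothetical.

```python
import os
import signal
import time


def describe_wait_status(status):
    """Decode a raw status integer as returned by os.wait() / os.waitpid()."""
    if os.WIFSIGNALED(status):
        # The low 7 bits carry the signal number; the 0x80 bit flags a core dump.
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        # A normal exit code lives in the high byte of the raw status.
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unrecognised status %d" % status


def run_child_and_signal(signum):
    """Fork a child that just sleeps, send it signum, return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: idle until the parent's signal arrives.
        try:
            time.sleep(30)
        finally:
            os._exit(0)
    # Parent: signal the child, then collect its status like a watcher would.
    os.kill(pid, signum)
    _, status = os.waitpid(pid, 0)
    return status
```

Calling describe_wait_status(run_child_and_signal(signal.SIGSEGV)) shows how the parent sees the child's death. If the number in the log is a raw wait status, then 11 would decode as "killed by signal 11", whereas a child that called sys.exit(11) would show up as 2816 (11 shifted into the high byte), and a core dump would additionally set the 0x80 bit.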
-Jon
Michel Pelletier <michel@digicool.com> writes:
Hmm, a simple test script seems to indicate that sending a child process a signal 11 does not cause it to dump core. Of course, it also does not seem to cause it to return a status code of 11 either. This might not be a SIGSEGV; it might just be a coincidence that the return code of the (crashed) child is 11. (FYI, there are two processes going on here, a 'watcher' parent and a child; the parent prints the 'Aiieee!' when the child dies.)
Does anyone out there know what a Python program returning 11 means?
-Michel
Jon Prettyman wrote:
Nope. No core file.
Aieeeee!!!! -Jon
Okay, this is what I'm getting consistently when my server crashes under moderate load:
------ 2000-03-13T18:53:02 INFO(0) GUF Successful authentication for user April (http://207.241.10.50/premium/acl_users)
------ 2000-03-13T19:01:40 INFO(0) GUF Successful authentication for user thomas (http://www.nationalmortgagenews.com/premium/acl_users)
------ 2000-03-13T19:02:24 INFO(0) GUF Successful authentication for user terrydpeters (http://207.241.10.50/premium/acl_users)
------ 2000-03-13T19:03:36 ERROR(200) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Aiieee! 17065 exited with error code: 11
------ 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Houston, we have forked
------ 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Hi, I just forked off a kid: 17125
------ 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Houston, we have forked
------ 2000-03-13T19:03:58 PROBLEM(100) ZServer Cannot do reverse lookup
------ 2000-03-13T19:03:58 INFO(0) ZServer Medusa (V1.13) started at Mon Mar 13 13:03:58 2000 Hostname: 207.241.10.50 Port:80
------ 2000-03-13T19:03:58 INFO(0) ZServer FTP server started at Mon Mar 13 13:03:58 2000 Authorizer:None Hostname: magmar Port: 8021
From the debug log:
B 144485000 2000-03-13T19:03:02 GET /nmn/images/marketplacebutton.gif
I 144485000 2000-03-13T19:03:02 0
A 144485000 2000-03-13T19:03:02 304 182
E 144485000 2000-03-13T19:03:02
B 144484496 2000-03-13T19:03:09 GET /id.htm
I 144484496 2000-03-13T19:03:09 0
A 144484496 2000-03-13T19:03:09 200 3740
E 144484496 2000-03-13T19:03:09
B 143155832 2000-03-13T19:03:10 GET /newsubsid.gif
I 143155832 2000-03-13T19:03:10 0
A 143155832 2000-03-13T19:03:10 200 45513
E 143155832 2000-03-13T19:03:10
B 145239016 2000-03-13T19:03:14 GET /nmn/images/marketplacebutton.gif
I 145239016 2000-03-13T19:03:14 0
A 145239016 2000-03-13T19:03:14 304 163
E 145239016 2000-03-13T19:03:14
B 146772968 2000-03-13T19:03:32 GET /nmn/images/afstitle.gif
I 146772968 2000-03-13T19:03:32 0
A 146772968 2000-03-13T19:03:32 304 182
E 146772968 2000-03-13T19:03:32
B 139605816 2000-03-13T19:03:59 GET /nmn/images/marketplacebutton.gif
I 139605816 2000-03-13T19:03:59 0
A 139605816 2000-03-13T19:04:01 304 163
E 139605816 2000-03-13T19:04:01
B 139494360 2000-03-13T19:04:07 POST /premium/acl_users/register
I 139494360 2000-03-13T19:04:07 518
A 139494360 2000-03-13T19:04:10 200 8704
E 139494360 2000-03-13T19:04:10
B 139787784 2000-03-13T19:04:28 GET /nmn/images/marketplacebutton.gif
I 139787784 2000-03-13T19:04:28 0
A 139787784 2000-03-13T19:04:28 304 182
E 139787784 2000-03-13T19:04:28
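One way to check whether a particular request was still being processed when the child died is to scan the debug log for requests that were started but never finished. The sketch below is only an illustration in present-day Python. It assumes each record is "code id timestamp ..." and that B marks the beginning of a request and E its end, which is an interpretation of the log rather than something stated in this thread. The function name unfinished_requests is hypothetical.

```python
def unfinished_requests(lines):
    """Return {request_id: "METHOD path"} for requests with a B record but no E record."""
    open_requests = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code, request_id = fields[0], fields[1]
        if code == "B":
            # Everything after the timestamp is the method and path.
            open_requests[request_id] = " ".join(fields[3:])
        elif code == "E":
            open_requests.pop(request_id, None)
    return open_requests


# Excerpt from the log above; the last request is cut off after its B record
# on purpose, purely to show what an unfinished request would look like.
sample = [
    "B 146772968 2000-03-13T19:03:32 GET /nmn/images/afstitle.gif",
    "I 146772968 2000-03-13T19:03:32 0",
    "A 146772968 2000-03-13T19:03:32 304 182",
    "E 146772968 2000-03-13T19:03:32",
    "B 139605816 2000-03-13T19:03:59 GET /nmn/images/marketplacebutton.gif",
]
print(unfinished_requests(sample))
```

In the log as actually posted, every B record has a matching E record, so a scan like this would come back empty; the only visible trace of the crash is the gap between 19:03:32 and 19:03:59.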
I'm running Linux 2.2.12-20, Zope 2.1.4, currently with only ZServer (and FTP server.) Things get even flakier when I am running with PCGI.
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )