Re: [Zope] EAGAIN errors crashing ZServer Aiieeee!!!!
I'm pretty sure now that async_loop is not exiting in my child process, so I'm guessing it is some other type of abort. I'll try the gdb avenue. I'm pretty handy with that, but I've never done much debugging of Python or threaded apps, so.... I also grepped around the code and looked over every 11, sys.exit and SystemExit I could find. This is a real pain.
-Jon
Michel Pelletier <michel@digicool.com> writes:
Jon Prettyman wrote:
I've been reading through code trying to figure out what is going on here and where this message might be coming from. My current train of thought is that the exit code of 11 being seen in z2.py is a result of sys.ZServerExitCode getting set somewhere and z2.py exiting with that code.
So I've been trying to find where code sets sys.ZServerExitCode and what I've found is in ZServer.HTTPResponse.ChannelPipe.close. In this routine, the value of self._shutdown is assigned to r which then gets assigned to sys.ZServerExitCode.
It looks like self._shutdown only gets assigned when ZServer.HTTPResponse.ChannelPipe.finish gets called and a response header contains a bobo-exception-type of exceptions.SystemExit.
So I'm guessing now that somewhere this exception is getting set but I can't seem to figure out why.
There is only one reason why I can think of, clicking on the Shutdown button.
A quick grep shows sys.exit being called in a few places, however, most specifically in xmllib and pyexpat. None of these calls set a value of 11 though.
Are you using any XML? XML-RPC calls?
Am I completely off base here?
Perhaps; it's a good avenue to look down, however. What I suspect is that something is happening (like a SIGSEGV) that is causing the OS to send a signal to the process whose default action is to kill the process, setting the error code to 11 for some as-yet mysterious reason (11 has *got* to be the clue, however; I refuse to believe that it is arbitrary. Note also that SIGSEGV is signal 11, coincidence?). A good exercise may be to run Zope in gdb and wait; when one of these events happens, use gdb to inspect what's going on. I'm no gdb expert, but I was under the impression that it could tell you when signals arrive (or perhaps stop the process on the arrival of a signal...)
I wrote a test script that forked a parent and child just like Zope. When I sent the child a SIGSEGV it returned an error code of 332. Maybe my experiment is flawed.
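For anyone repeating this experiment, here is a minimal sketch of such a test in Python, assuming a Unix system. The names run_child and describe are made up for illustration and are not part of Zope or z2.py. The point it tries to show is that the value a parent gets back from os.waitpid is an encoded status, not the exit code itself, so it has to be decoded before concluding whether the child exited with 11 or was killed by signal 11.

```python
import os
import signal
import time


def run_child(action):
    """Fork a child, make it die as requested, and return the raw wait status.

    `action` is a hypothetical switch for this sketch:
    "exit" -> the child exits normally with code 11
    "segv" -> the parent sends the child a SIGSEGV
    """
    pid = os.fork()
    if pid == 0:
        # Child process.
        if action == "exit":
            os._exit(11)
        # Make sure the default action (terminate) applies to SIGSEGV,
        # then just wait around to be signalled.
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        time.sleep(60)
        os._exit(0)
    # Parent process.
    if action == "segv":
        os.kill(pid, signal.SIGSEGV)
    _, status = os.waitpid(pid, 0)
    return status


def describe(status):
    """Decode a raw wait status into (kind, number)."""
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exit", os.WEXITSTATUS(status))
    return ("other", status)


if __name__ == "__main__":
    for action in ("exit", "segv"):
        status = run_child(action)
        print(action, "raw status:", status, "decoded:", describe(status))
```

Printing the raw status next to the decoded one makes it easy to see whether an unexpected number is just an undecoded status. A watcher that prints the raw value, or only part of it, could report something that looks like neither the exit code nor the signal number.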
Have you enough time to look that deeply?
-Michel
-Jon
Michel Pelletier <michel@digicool.com> writes:
Hmm, a simple test script seems to indicate that sending a child process a signal 11 does not cause it to dump core. Of course, it also does not seem to cause it to return a status code of 11 either. This might not be a SIGSEGV; it might just be a coincidence that the return code of the (crashed) child is 11. (FYI, there are two processes going on here, a 'watcher' parent and a child; the parent prints the 'Aieee!' when the child dies.)
Does anyone out there know what a Python program returning 11 means?
-Michel
Jon Prettyman wrote:
Nope. No core file.
Aieeeee!!!! -Jon