Hi folks.

I have a Zope 2.4 CVS checkout from a couple of days ago, which I'm running on a Linux system on a Cobalt RaQ. I've compiled it against Python 2.1.1 (compiled from source), and compiled a fresh DynPersist.so for it for use with ZPatterns.

Every so often, the server restarts after crashing on a signal 11. The log entry looks something like this:

    2001-08-15T18:26:03 ERROR(200) zdaemon zdaemon: Wed Aug 15 19:26:03 2001: Aiieee! 8736 exited with error code: 11

I can't reliably reproduce the problem, but it does keep happening. Since it's a signal 11, I'm guessing that something in Python or a Zope C module isn't compiled right. Any pointers as to where to start debugging this?

Thanks.

--
Steve Alexander
Software Engineer
Cat-Box limited
Hey Steve...

This is a tricky one to pin down. What I normally do is change the startup of Zope so that it can be debugged under gdb, e.g.:

    $ gdb python
    (gdb) run z2.py -Z '' -t 1

but that only runs one thread, so you may NEVER encounter the problem. You can experiment with running more than one thread under gdb. When gdb halts next, there's your segfault! If "where" shows a recursion depth of greater than 300 or so, there's an infinite loop.

Alternatively, you can run into problems if you try to run against a Python with a different storage allocator (and potentially even garbage collection, although I *think* gc is safe), since ExtensionClass is not aware of the changes to object construction/destruction that need to take place. Some people have reported (and I don't know if they've been fixed) that some of the C routines use the "PyMem_Del*" functions to delete objects when they should use "PyObject_Del" to release storage. This mismatch causes grief when the storage allocator and/or garbage collector does additional pointer manipulation.

Steve Alexander wrote:
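On modern Pythons (3.3 and later) there is also a gdb-free way to get a traceback out of a segfaulting process: the standard-library faulthandler module. It did not exist in the Python 2.1 era this thread discusses, so this is a present-day aside rather than the advice given above; a minimal sketch:

```python
import sys
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL; on a
# crash, the Python-level traceback of every thread is written to stderr,
# which is often enough to locate the extension module at fault.
faulthandler.enable()

# Tracebacks can also be dumped on demand, which helps when a process
# hangs rather than crashes outright.
faulthandler.dump_traceback(file=sys.stderr)

print("faulthandler enabled:", faulthandler.is_enabled())
```

This only shows the Python frames, not the C stack; for the C-level "where" you still need gdb and a core file.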
> Hi folks.
>
> I have a Zope 2.4 CVS checkout from a couple of days ago, which I'm
> running on a Linux system on a Cobalt RaQ. I've compiled it against
> Python 2.1.1 (compiled from source), and compiled a fresh DynPersist.so
> for it for use with ZPatterns.
>
> Every so often, the server restarts after crashing on a signal 11.
> The log entry looks something like this:
>
>     2001-08-15T18:26:03 ERROR(200) zdaemon zdaemon: Wed Aug 15 19:26:03 2001: Aiieee! 8736 exited with error code: 11
>
> I can't reliably reproduce the problem, but it does keep happening.
> Seeing as it's a signal 11, I'm guessing that something in Python or a
> Zope C module isn't compiled right. Any pointers as to where to start
> to debug this?
>
> Thanks.
Steve Alexander writes:
> I have a Zope 2.4 CVS checkout from a couple of days ago, which I'm
> running on a Linux system on a Cobalt RaQ. I've compiled it against
> Python 2.1.1 (compiled from source), and compiled a fresh DynPersist.so
> for it for use with ZPatterns.
>
> Every so often, the server restarts after crashing on a signal 11.
> The log entry looks something like this:
>
>     2001-08-15T18:26:03 ERROR(200) zdaemon zdaemon: Wed Aug 15 19:26:03 2001: Aiieee! 8736 exited with error code: 11

You should be able to get a "core" dump. Then you would use a debugger to look into the core. With some luck, the problem is local and you can immediately determine the culprit.
In bad cases, memory management has been hit. That is a non-local problem, and very difficult to analyse, because the damage occurred much earlier... a task for "purify" or similar tools.

Today I got a signal 11 because I had used "getattr". I am almost sure this was caused by a runtime stack overflow. Usually, Unix automatically extends the stack, but in a multi-threaded program this may not be possible -> signal 11.

Writing core dumps is often disabled. With bash, you can enable it with:

    ulimit -c 20000

This allows writing core files of up to about 20 MB.

Dieter
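The same core-file limit can also be raised from inside the process via the standard-library resource module, which can be handy when you cannot change the shell that starts the daemon. A minimal sketch (the ~20 MB figure is Dieter's; everything else is my own illustration):

```python
import resource

# Current soft/hard limits for core-file size, in bytes.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core limit before: soft=%r hard=%r" % (soft, hard))

# Raise the soft limit to roughly 20 MB, or to the hard limit if that is
# lower: a process may lower its hard limit but never raise it, so the
# new soft value must stay within `hard`.
wanted = 20000 * 1024
if hard != resource.RLIM_INFINITY:
    wanted = min(wanted, hard)
resource.setrlimit(resource.RLIMIT_CORE, (wanted, hard))

print("core limit after: soft=%r" % resource.getrlimit(resource.RLIMIT_CORE)[0])
```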
Folks, I have seen this too, and apparently it is not just Zope. Someone wrote to me a while back saying that MySQLdb-0.9.0 would dump core with Python 2.1.1 when compiled with the new pymalloc code turned on. I was able to reproduce this but not isolate the problem, except that it occurred during PyObject_GetItem().

Well, I have had some mysterious Zope restarts as well (which I had attributed in part to the problem above, as I am using MySQLdb/ZMySQLDA) and finally decided to track it down. Sure enough, it dies in PyObject_GetItem() (in this case, called from ExtensionClass).

So one thing to try is turning off pymalloc, if it is on, and seeing if that cures the problem. Either there is a bug in Python, or something is wrong with our respective classes (though the MySQLdb one was triggered by pure Python code).

--
Andy Dustman
PGP: 0xC72F3F1D @ .net
http://dustman.net/andy
I'll give spammers one bite of the apple, but they'll have to guess which bite has the razor blade in it.
Andy Dustman wrote:
> Folks, I have seen this too, and apparently it is not just Zope.
> Someone wrote to me a while back saying that MySQLdb-0.9.0 would dump
> core with Python 2.1.1 when compiled with the new pymalloc code turned
> on. I was able to reproduce this but not isolate the problem, except
> that it occurred during PyObject_GetItem().
Well, I recompiled Python 2.1.1 with pymalloc turned off. Then I recompiled Zope and ZPatterns. (I don't know whether I really needed to recompile Zope and ZPatterns.)

Result: no signal 11 crashes in 36 hours. I was getting one at least every couple of hours before, often more frequently, depending on the system load. Zope is now back to its usual stable self.

Thanks for the information, Andy. Also thanks to everyone else who offered advice on this.

--
Steve Alexander
Software Engineer
Cat-Box limited
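Whether a given interpreter was built with pymalloc can be checked from Python itself, via the recorded build configuration. This uses the modern sysconfig module (the thread's Python 2.1 exposed the same WITH_PYMALLOC configure option, but through distutils.sysconfig instead); a small sketch:

```python
import sysconfig

# WITH_PYMALLOC is recorded in the interpreter's build configuration:
# 1 means the pymalloc object allocator is compiled in, 0 means it is
# not. On some builds the variable may be absent entirely.
with_pymalloc = sysconfig.get_config_var("WITH_PYMALLOC")
print("pymalloc compiled in:", with_pymalloc)
```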
> Andy Dustman wrote:
>> Folks, I have seen this too, and apparently it is not just Zope.
>> Someone wrote to me a while back saying that MySQLdb-0.9.0 would dump
>> core with Python 2.1.1 when compiled with the new pymalloc code turned
>> on. I was able to reproduce this but not isolate the problem, except
>> that it occurred during PyObject_GetItem().
>
> Well, I recompiled Python 2.1.1 with pymalloc turned off. Then I
> recompiled Zope and ZPatterns. (I don't know whether I really needed
> to recompile Zope and ZPatterns.)
>
> Result: no signal 11 crashes in 36 hours. I was getting one at least
> every couple of hours before, often more frequently, depending on the
> system load. Zope is now back to its usual stable self.
Sounds like one for the Python bug tracker...

Chris
* Dieter Maurer <dieter@handshake.de> [010816 21:13]:
> Steve Alexander writes:
>> I have a Zope 2.4 CVS checkout from a couple of days ago, which I'm
>> running on a Linux system on a Cobalt RaQ. I've compiled it against
>> Python 2.1.1 (compiled from source), and compiled a fresh
>> DynPersist.so for it for use with ZPatterns.
>>
>> Every so often, the server restarts after crashing on a signal 11.
>> The log entry looks something like this:
>>
>>     2001-08-15T18:26:03 ERROR(200) zdaemon zdaemon: Wed Aug 15 19:26:03 2001: Aiieee! 8736 exited with error code: 11
>
> You should be able to get a "core" dump. Then you would use a debugger
> to look into the core. With some luck, the problem is local and you
> can immediately determine the culprit.
I'm getting occasional signal 11s too (a completely different set-up: stock Zope 2.3.2 on an up-to-date RedHat 6.1). However, I'm not getting any core dumps, even though my ulimit is set to "unlimited". Any suggestions on how to make Zope do the dump?

seb
seb bacon writes:
> I'm getting occasional signal 11s too (a completely different set-up:
> stock Zope 2.3.2 on an up-to-date RedHat 6.1). However, I'm not
> getting any core dumps, even though my ulimit is set to "unlimited".
> Any suggestions on how to make Zope do the dump?

A bare "ulimit" reports only a single limit (the file-size limit, by default), not the core-file limit. Use "ulimit -a" to be told the complete truth...
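The same "complete truth" that `ulimit -a` prints is available programmatically through Python's standard-library resource module, which is useful for checking what limits a long-running daemon actually inherited. A small sketch (the selection of limits shown is my own; the exact set available varies by platform):

```python
import resource

# A few of the per-process limits that `ulimit -a` reports.
LIMITS = [
    ("core file size", resource.RLIMIT_CORE),
    ("cpu time",       resource.RLIMIT_CPU),
    ("file size",      resource.RLIMIT_FSIZE),
    ("open files",     resource.RLIMIT_NOFILE),
]

def fmt(value):
    # RLIM_INFINITY is what the shell prints as "unlimited".
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

for name, res in LIMITS:
    soft, hard = resource.getrlimit(res)
    print("%-16s soft=%-12s hard=%s" % (name, fmt(soft), fmt(hard)))
```

Note that it is the soft limit on RLIMIT_CORE that decides whether a core file gets written, and a daemon started from init scripts may have inherited a different value than your interactive shell shows.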
Dieter
participants (6)
- Andy Dustman
- Chris Withers
- Dieter Maurer
- Matthew T. Kromer
- seb bacon
- Steve Alexander