We have been having intermittent problems with zope hanging in both production and development. For what it's worth, it seems as if this problem started to occur since we upgraded to zope 2.7.3 (python 2.3.4) and Apelib 1.0. Previously, we were using a standard zope data.fs. We converted to Apelib for other reasons and now are starting to see these intermittent lockups. When it locks up, it is completely locked and has to be killed with a kill -9.

Here is the pertinent output from ps -Aflwm:

0 - 501 2838 20340 0 - - - 88939 - 11:42 pts/5 00:00:06 ./python ../lib/python/Zope/Startup/run.py -C ../etc/zope.conf
0 S 501 - - 0 76 0 - - futex_ 11:42 - 00:00:06 -
1 S 501 - - 0 76 0 - - futex_ 11:43 - 00:00:00 -

Zope is not spinning, it's just locked in a futex. I have tried to use the "debugging a spinning zope" howto (http://www.zope.org/Members/4am/debugspinningzope), but that doesn't work. I am assuming it is because the python binary has been stripped.

Are there any other tricks to find where or what is locking the threads? Even though I'm running in debug mode, killing does not produce a traceback. Is there any way to force threads to dump a traceback on exit?

-Chris

--- debugging a spinning zope attempt

# cat Z2.pid
2838
# gdb ../bin/python
(gdb) attach 2838
...snip
(gdb) info threads
  26 Thread 1122839472 (LWP 2845) 0xffffe410 in ?? ()
  25 Thread 1131232176 (LWP 2846) 0xffffe410 in ?? ()
  24 Thread 1139624880 (LWP 2847) 0xffffe410 in ?? ()
  23 Thread 1148017584 (LWP 2848) 0xffffe410 in ?? ()
  22 Thread 1156410288 (LWP 2849) 0xffffe410 in ?? ()
  21 Thread 1164802992 (LWP 2850) 0xffffe410 in ?? ()
  20 Thread 1173195696 (LWP 2851) 0xffffe410 in ?? ()
  19 Thread 1181588400 (LWP 2852) 0xffffe410 in ?? ()
  18 Thread 1189981104 (LWP 2853) 0xffffe410 in ?? ()
  17 Thread 1198373808 (LWP 2854) 0xffffe410 in ?? ()
  16 Thread 1206766512 (LWP 2855) 0xffffe410 in ?? ()
  15 Thread 1215159216 (LWP 2856) 0xffffe410 in ?? ()
  14 Thread 1223551920 (LWP 2857) 0xffffe410 in ?? ()
  13 Thread 1231944624 (LWP 2858) 0xffffe410 in ?? ()
  12 Thread 1240337328 (LWP 2859) 0xffffe410 in ?? ()
  11 Thread 1248730032 (LWP 2860) 0xffffe410 in ?? ()
  10 Thread 1257122736 (LWP 2861) 0xffffe410 in ?? ()
  9 Thread 1265515440 (LWP 2862) 0xffffe410 in ?? ()
  8 Thread 1273908144 (LWP 2863) 0xffffe410 in ?? ()
  7 Thread 1282300848 (LWP 2864) 0xffffe410 in ?? ()
  6 Thread 1290693552 (LWP 2865) 0xffffe410 in ?? ()
  5 Thread 1299086256 (LWP 2866) 0xffffe410 in ?? ()
  4 Thread 1307478960 (LWP 2867) 0xffffe410 in ?? ()
  3 Thread 1315871664 (LWP 2868) 0xffffe410 in ?? ()
  2 Thread 1324264368 (LWP 2869) 0xffffe410 in ?? ()
  1 Thread 1077192064 (LWP 2838) 0xffffe410 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 1077192064 (LWP 2838))]#0 0xffffe410 in ?? ()
(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")
No symbol "PyRun_SimpleString" in current context.
Chris Kratz wrote:
We have been having intermittent problems with zope hanging in both production and development. For what it's worth, it seems as if this problem started to occur since we upgraded to zope 2.7.3 (python 2.3.4) and Apelib 1.0.
Where are you using apelib to store your data?
Are there any other tricks to find where or what is locking the threads? Even though I'm running in debug mode, killing does not produce a traceback. Is there any way to force threads to dump a traceback on exit?
This might be of help:

http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
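For readers wondering what "dumping a traceback for every thread" looks like in practice, here is a minimal sketch of the general idea. It is not the DeadlockDebugger code, and it would not have worked on the Python 2.3.4 discussed in this thread: it assumes a current Python 3 interpreter, where sys._current_frames() is available (that function only arrived in Python 2.5). The function name format_all_thread_stacks and the demo thread name "blocked-worker" are made up for illustration.

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return one text report with the current Python stack of every thread."""
    # Map thread identifiers to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (thread_id, name))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


if __name__ == "__main__":
    # Demo: start a worker that waits on an event, dump all stacks, then release it.
    gate = threading.Event()
    worker = threading.Thread(target=gate.wait, name="blocked-worker")
    worker.start()
    print(format_all_thread_stacks())
    gate.set()
    worker.join()
```

In a real server, a report like this would have to be triggered from somewhere that still runs while the worker threads are stuck, for example a dedicated request handler or a signal handler. Whether that is possible depends on what exactly is holding the lock.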
Hello Chris, On Friday 25 February 2005 05:25 am, Chris Withers wrote:
We have been having intermittent problems with zope hanging in both production and development. For what it's worth, it seems as if this problem started to occur since we upgraded to zope 2.7.3 (python 2.3.4) and Apelib 1.0.
Where are you using apelib to store your data?
I'm not exactly sure what you're asking, but we are using the file system mapping. Here is the relevant portion of our zope.conf:

%import Products.Ape
<ape-db main>
  <ape-storage>
    mapper-variation filesystem
    <ape-fs-connection fs>
      basepath $INSTANCE/var/fs
      hidden-filenames _|\.svn|.+~|.+#
    </ape-fs-connection>
  </ape-storage>
  mount-point /
  scan-interval 2
  cache-size 10000
</ape-db>
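As a side note on the hidden-filenames pattern in the zope.conf above, the sketch below shows which file names it would exclude. The thread does not say how Ape applies the pattern, so this assumes it is matched from the start of each bare file name, the way Python's re.match works. The helper name is_hidden is hypothetical.

```python
import re

# Pattern copied from the hidden-filenames line in the zope.conf above.
HIDDEN = re.compile(r"_|\.svn|.+~|.+#")


def is_hidden(filename):
    """Assumption: the pattern is anchored at the start of the bare file name."""
    return HIDDEN.match(filename) is not None


if __name__ == "__main__":
    for name in ["_private", ".svn", "index.html~", "draft#", "index.html"]:
        print(name, is_hidden(name))
```

Under that assumption, names starting with an underscore, Subversion's .svn directories, and editor backup files containing ~ or # after at least one other character are all hidden, while ordinary names such as index.html are not.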
Are there any other tricks to find where or what is locking the threads? Even though I'm running in debug mode, killing does not produce a traceback. Is there any way to force threads to dump a traceback on exit?
This might be of help:
http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger
cheers,
Chris
Thanks, I will try that. -Chris
Chris Kratz wrote:
Where are you using apelib to store your data?
Sorry, my question was unclear ;-)
I'm not exactly sure what you're asking, but we are using the file system mapping. Here is the relevant portion of our zope.conf
%import Products.Ape
<ape-db main>
  <ape-storage>
    mapper-variation filesystem
...the answer is "file system" :-)

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
On Tuesday 01 March 2005 11:28 am, Chris Withers wrote:
Chris Kratz wrote:
Where are you using apelib to store your data?
Sorry, my question was unclear ;-)
I'm not exactly sure what you're asking, but we are using the file system mapping. Here is the relevant portion of our zope.conf
%import Products.Ape
<ape-db main>
  <ape-storage>
    mapper-variation filesystem
...the answer is "file system" :-)
cheers,
Chris
Hey, no problem. Dieter gave a hint that Apelib may be a red herring and the real problem may be a threading bug in python. I'm waiting till I have some time to upgrade python to a newer version and do some testing to see if that helps before I push it out to our live system.

I do wish I could find a way to make apelib faster. Because of the calls to check file mtimes, it is noticeably slower than normal zodb, especially with large numbers of objects. Startup is downright painful. But we are addicted to using subversion, and this has been the absolute best way to use subversion & zope to handle rapid code changes among several developers.

I'm starting to toy with the thought of "compiling" an apelib based db into another fs based ZODB for normal use on our production servers by copying between them. That would certainly help with performance.

Cheers,

-Chris

--
Chris Kratz
Systems Analyst/Programmer
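To illustrate why mtime checking gets slower as the number of objects grows, here is a minimal sketch of a polling scan over a directory tree. It is not Ape's scanner, whose implementation is not shown anywhere in this thread; it only assumes the general approach of one stat() call per file on every pass. The function names snapshot_mtimes and changed_paths are hypothetical.

```python
import os
import tempfile


def snapshot_mtimes(basepath):
    """Walk basepath and record the modification time of every file."""
    mtimes = {}
    for dirpath, dirnames, filenames in os.walk(basepath):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # One stat() call per file: this cost grows with the object count.
            mtimes[path] = os.stat(path).st_mtime
    return mtimes


def changed_paths(old, new):
    """Return paths that were added, removed, or whose mtime differs."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "index_html")
        with open(target, "w") as f:
            f.write("v1")
        before = snapshot_mtimes(base)
        os.utime(target, (0, 0))  # Force a different mtime without sleeping.
        after = snapshot_mtimes(base)
        print(changed_paths(before, after))
```

Every pass touches every file, so the work is linear in the number of stored objects regardless of how few of them changed. That is consistent with the slow startup described above, though the real cause in Ape may differ.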
Chris Kratz wrote:
I do wish I could find a way to make apelib faster. Because of the calls to check file mtimes, it is noticeably slower than normal zodb, especially with large numbers of objects. Startup is downright painful. But we are addicted to using subversion, and this has been the absolute best way to use subversion & zope to handle rapid code changes among several developers.
Maybe you should look at the skins tool from CMF, which has been extracted by several people into standalone products...

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
Chris Kratz wrote at 2005-2-24 16:03 -0500:
... When it locks up, it is completely locked and has to be killed with a kill -9.
This is probably caused by a Python bug (fixed in 2.4). Search the archive for posts from "Andrew Langmead" who analysed this bug and posted a Python patch to its bug tracker (the one now integrated in Python 2.4). -- Dieter
participants (3)
- Chris Kratz
- Chris Withers
- Dieter Maurer