Hello,

I would like to add to two earlier postings in which a nonresponsive, hanging or seemingly frozen ZEO client is described:

http://mail.zope.org/pipermail/zope-dev/2003-March/019179.html
http://mail.zope.org/pipermail/zope-dev/2003-February/018681.html

Especially the latter one is an accurate description of the situation we experience too:

(..)All the zope processes were still running, CPU usage was low (almost nil for python), there was plenty of free physical memory & swap. Yet Zope was not responding to requests. A look at the access logs revealed that zope had not logged anything since the time we noticed the outage(..)

(..)A restart seemed to fix everything, though one of the zeo servers went down again (same symptoms) about 20 minutes after starting. Restarted it again and both servers have been fine for hours now(..)

Using the "debugspinningzope" howto did not help me: I could not get any usable information out of gdb.

I can add some details on my setup, which will hopefully help in analysing the problem.

Setup (all on Debian Linux 2.4.18, Zope 2.6.1, ZEO 2.0.2, DBTab 1.1):

ZEO server, serving a FileStorage and a TemporaryStorage (to share SESSION objects between the clients). I also tested a setup where a regular FileStorage was used.

2 ClientStorage nodes, using DBTab to mount the served FileStorage as "/", and the TemporaryStorage as "/temp_folder".

This nonresponsive behaviour was also seen with just one ZEO client. However, when this one node uses the default "temp_folder" as created at Zope startup (so not mounted from the ZEO server, but a TemporaryFolder), I have so far *not* experienced the nonresponsiveness.

If I need to provide more details which might help, please ask!

regards,
jw

--
Jan-Wijbrand Kolman
jw@infrae.com
Jan-Wijbrand Kolman wrote at 2003-4-10 18:26 +0200:
I would like to add to two earlier postings in which a nonresponsive, hanging or seemingly frozen ZEO client is described:
...
Using the "debugspinningzope" howto did not help me: I could not get any usable information out of gdb.
What did it tell you? Where have the processes been waiting?

You will need a very recent GDB (5.2 or newer) to debug multi-threaded processes.

I would really like us to be able to understand this behaviour....

Dieter
Dieter Maurer,
What did it tell you? Where have the processes been waiting?
You will need a very recent GDB (5.2 or newer) to debug multi-threaded processes with GDB.
I include the output of a gdb session at the end of this mail. It is not very informative (to me at least). It could very well be, though, that I am doing something wrong here. I don't have any experience with gdb apart from following this howto... Do you know of other ways to get more information out of gdb? Do you need more details?
I would really like us to be able to understand this behaviour....
Me too :-)

regards, and thx for your reply!
jw

--
Jan-Wijbrand Kolman
jw@infrae.com

~# gdb python2.1
GNU gdb 5.3-debian
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...(no debugging symbols found)...
(gdb) attach 19132
Attaching to program: /usr/bin/python2.1, process 19132
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[New Thread 16384 (LWP 19104)]
[New Thread 32769 (LWP 19105)]
[New Thread 16386 (LWP 19106)]
[New Thread 65539 (LWP 19132)]
[New Thread 81924 (LWP 19133)]
[New Thread 98309 (LWP 19134)]
[New Thread 114694 (LWP 19135)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
<snip lotsa output>
(gdb) info threads
  7 Thread 114694 (LWP 19135)  0x40151b1e in select () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
  2 Thread 32769 (LWP 19105)  0x401502c0 in poll () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 32769 (LWP 19105))]#0  0x401502c0 in poll () from /lib/libc.so.6
(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGABRT, Aborted.
0x400baa51 in kill () from /lib/libc.so.6
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (PyRun_SimpleString) will be abandoned.
(gdb) info threads
  7 Thread 114694 (LWP 19135)  0x40151b1e in select () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
* 2 Thread 32769 (LWP 19105)  0x400baa51 in kill () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
(gdb) thread 7
[Switching to thread 7 (Thread 114694 (LWP 19135))]#0  0x40151b1e in select () from /lib/libc.so.6
(gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGABRT, Aborted.
[Switching to Thread 32769 (LWP 19105)]
0x400baa51 in kill () from /lib/libc.so.6
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (PyRun_SimpleString) will be abandoned.
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0x400bba61 in abort () from /lib/libc.so.6
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (PyRun_SimpleString) will be abandoned.
(gdb) info threads
  7 Thread 114694 (LWP 19135)  0x400baae2 in sigsuspend () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
* 2 Thread 32769 (LWP 19105)  0x400bba61 in abort () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
(gdb) info threads
* 7 Thread 114694 (LWP 19135)  0x400baae2 in sigsuspend () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
  2 Thread 32769 (LWP 19105)  0x400bba61 in abort () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
Jan-Wijbrand Kolman wrote at 2003-4-14 12:51 +0200:
... I include the output of a gdb session at the end of this mail.
It is not very informative
It is not so bad... We see that it is not a locking problem.
Do you know of other ways to get more information out of gdb? Do you need more details?
I think you can call Python functions only in the thread that holds the Python interpreter lock. Otherwise, you have only GDB and low-level (C) means.

GDB's most useful command in this respect is "bt" (aka "backtrace"). It shows you the call history of the selected thread.
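As a complement to the GDB route: on Python versions much newer than the python2.1 used in this thread, the interpreter itself can report the Python-level stack of every thread, which sidesteps the question of which thread holds the interpreter lock. The following is only a minimal sketch under that assumption (`sys._current_frames()` does not exist in Python 2.1), and `dump_all_stacks` is an illustrative name, not part of Zope or ZEO:

```python
import sys
import threading
import traceback


def dump_all_stacks():
    """Return the Python-level stack of every live thread as one string.

    Assumes a Python version that provides sys._current_frames();
    the python2.1 discussed in this thread does not.
    """
    # Map thread identifiers to readable names where threading knows them.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "?")))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


if __name__ == "__main__":
    entered = threading.Event()
    stop = threading.Event()

    def blocked_worker():
        entered.set()
        stop.wait()  # stands in for a thread stuck waiting on I/O

    worker = threading.Thread(target=blocked_worker, name="worker")
    worker.start()
    entered.wait()
    print(dump_all_stacks())
    stop.set()
    worker.join()
```

In a hung server this would have to be triggered from outside, for example from a signal handler installed at startup; that wiring is not shown here.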
....
(gdb) info threads
  7 Thread 114694 (LWP 19135)  0x40151b1e in select () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
  2 Thread 32769 (LWP 19105)  0x401502c0 in poll () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
Thread 2 is waiting in "poll" (this is almost surely the ZServer main loop).

All other threads are waiting in "select". You want to know what led to these selects; use the "bt" command on each of those threads for this. Almost surely, it is an external C extension, and in that case the backtrace will reveal it.

Dieter
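The reading of the thread list above can also be done mechanically. The sketch below groups the threads in an "info threads" listing by the C function each one is stopped in. The regular expression is an assumption that matches only the GDB 5.3 output format quoted in this thread (other GDB versions print the list differently), and `group_threads_by_function` is an illustrative name:

```python
import re
from collections import defaultdict

# Matches one entry such as:
#   2 Thread 32769 (LWP 19105)  0x401502c0 in poll () from /lib/libc.so.6
# Assumption: the GDB 5.3 layout quoted in this thread.
THREAD_ENTRY = re.compile(
    r"(?P<num>\d+)\s+Thread\s+\d+\s+\(LWP\s+(?P<lwp>\d+)\)\s+"
    r"0x[0-9a-fA-F]+\s+in\s+(?P<func>\w+)\s*\(\)"
)


def group_threads_by_function(info_threads_output):
    """Map each blocking function name to the GDB thread numbers stopped in it."""
    groups = defaultdict(list)
    for match in THREAD_ENTRY.finditer(info_threads_output):
        groups[match.group("func")].append(int(match.group("num")))
    return dict(groups)


SAMPLE = """\
  7 Thread 114694 (LWP 19135)  0x40151b1e in select () from /lib/libc.so.6
  6 Thread 98309 (LWP 19134)  0x40151b1e in select () from /lib/libc.so.6
  5 Thread 81924 (LWP 19133)  0x40151b1e in select () from /lib/libc.so.6
  4 Thread 65539 (LWP 19132)  0x40151b1e in select () from /lib/libc.so.6
  3 Thread 16386 (LWP 19106)  0x40151b1e in select () from /lib/libc.so.6
  2 Thread 32769 (LWP 19105)  0x401502c0 in poll () from /lib/libc.so.6
  1 Thread 16384 (LWP 19104)  0x40151b1e in select () from /lib/libc.so.6
"""

if __name__ == "__main__":
    for func, threads in group_threads_by_function(SAMPLE).items():
        print(func, threads)
```

The threads grouped under "select" are the ones worth a "bt" each; the grouping only narrows down where to look and says nothing about what called select.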