[Zope-dev] Segfault and Deadlock
Willi Langenberger
wlang at wu-wien.ac.at
Sun May 2 11:10:02 EDT 2004
Hi Zope (and Python) experts!
There seems to be a problem when an external Python module segfaults
during a Zope request: the remaining worker threads are left deadlocked.
I think this is the same problem as Dieter pointed out in his message
to zope-dev "[Problem] strange state after SIGSEGV":
http://mail.zope.org/pipermail/zope-dev/2004-March/022092.html
The reason is the way Python handles threads on some systems
(RedHat-7.3, kernel 2.4.20, without NPTL). I've written a small Python
extension which does nothing but segfault[1]. With it, I ran the
following simulation, in which one thread acquires a lock and segfaults:
#!/usr/bin/env python2.3
import thread
import time
import _segfault

_lock = thread.allocate_lock()

def worker():
    time.sleep(10)
    _lock.acquire()
    _segfault.segfault()   # crashes while holding the lock
    _lock.release()        # never reached

# Start four workers; the first one to wake takes the lock and segfaults.
thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())
thread.start_new_thread(worker, ())

time.sleep(3600)
print 'Bye...'
On my RedHat-7.3 box (kernel 2.4.20-18, without NPTL) I get the
following behaviour. After starting the program, pstree shows this:
bash(4103,wlang)---python2.3(4333)---python2.3(4334)-+-python2.3(4335)
                                                     |-python2.3(4336)
                                                     |-python2.3(4337)
                                                     `-python2.3(4338)
After the 10-second sleep, one worker gets the lock and
segfaults. After that, pstree shows this:
init(1)-+-[...]
        |-python2.3(4336,wlang)
        |-python2.3(4337,wlang)
        |-python2.3(4338,wlang)
Three worker threads remain (the main thread is gone).
Gdb shows that they are waiting for the lock (which they won't get):
(gdb) info stack
#0  0x420293d5 in sigsuspend () from /lib/i686/libc.so.6
#1  0x40031609 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
#2  0x4003272c in sem_wait@@GLIBC_2.1 () from /lib/i686/libpthread.so.0
#3  0x080c7b2d in PyThread_acquire_lock (lock=0x8170728, waitflag=1)
                  ^^^^^^^^^^^^^^^^^^^^^
    at Python/thread_pthread.h:406
[...]
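
The waiting state seen in the stack trace can be imitated without any crash. The following is only a minimal sketch, assuming Python 3 and its `threading` module (the simulation in this post uses the python2.3 `thread` module instead). A thread that takes the lock and then ends without releasing it stands in for the segfaulted worker, and the waiters use a timeout so the sketch terminates, whereas the threads in the trace wait with waitflag=1 and no timeout.

```python
import threading

lock = threading.Lock()
results = []

def dying_holder():
    # Takes the lock and returns without releasing it. This stands in for
    # the worker that segfaulted while holding the lock (an assumption of
    # this sketch; no real crash happens here).
    lock.acquire()

def waiter():
    # Bounded wait so the sketch finishes; returns False on timeout.
    got_it = lock.acquire(timeout=0.5)
    results.append(got_it)

holder = threading.Thread(target=dying_holder)
holder.start()
holder.join()  # the holder is gone, but the lock stays held

waiters = [threading.Thread(target=waiter) for _ in range(3)]
for t in waiters:
    t.start()
for t in waiters:
    t.join()

print(results)  # [False, False, False]
```

None of the three waiters obtains the lock, because nothing is left that could release it.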
(On a side note, as Python threads block all signals, these worker
threads cannot be stopped with SIGTERM. They must be killed with SIGKILL.)
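
The SIGTERM-then-SIGKILL escalation described in the side note could be sketched as follows. This is an illustration under stated assumptions, not part of Zope: it assumes Python 3 on a POSIX system, the helper names `spawn_stubborn_child` and `stop_process` are hypothetical, and the child merely ignores SIGTERM to stand in for the leftover worker threads. The real leftovers are orphans owned by init, so there one would signal them by pid rather than through a `Popen` object.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a leftover worker: a child that ignores SIGTERM.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

def spawn_stubborn_child():
    """Start the child and wait until it has installed its SIGTERM setting."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child prints 'ready'
    return proc

def stop_process(proc, grace=2.0):
    """Send SIGTERM; if the process survives `grace` seconds, send SIGKILL."""
    proc.terminate()                      # SIGTERM
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL cannot be blocked or ignored
        proc.wait()
    if proc.stdout is not None:
        proc.stdout.close()
    return proc.returncode                # negative signal number on POSIX
```

Calling `stop_process(spawn_stubborn_child(), grace=0.5)` returns the negative number of the signal that finally ended the child.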
All this has the consequences Dieter described:
> Consequences:
>
> * Zope did no longer respond to requests
>
> * "stop" did not work (as "SIGTERM" was ineffective)
>
> * "start" did not work, as the dangling processes kept
> the HTTP port bound.
So I think I know what's happening, but I don't know how to fix it!
Can anyone help, please? Any hints are highly appreciated!
\wlang{}
PS: A RedHat-9 system (kernel 2.4.20, with NPTL) shows a different
behaviour. After the segfault, all threads disappeared. So maybe
all is OK with NPTL, but I've not tested it yet...
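
One way to check from the outside whether all threads disappear after the crash is to run the simulation as a child process and look at its exit status. The sketch below is a rough Python 3 port under several assumptions: it needs a POSIX system, the name `run_simulation` is hypothetical, the sleep times are shortened, and instead of the SWIG-built _segfault module the child sends SIGSEGV to itself, which is only an approximation of a real invalid memory access.

```python
import signal
import subprocess
import sys
import textwrap

# Child program: four workers, the first to wake takes the lock and "crashes".
CHILD_SOURCE = textwrap.dedent("""
    import os, signal, threading, time

    lock = threading.Lock()

    def worker():
        time.sleep(0.2)
        lock.acquire()
        # Stand-in for _segfault.segfault(): raise SIGSEGV on this process.
        os.kill(os.getpid(), signal.SIGSEGV)
        lock.release()  # never reached

    for _ in range(4):
        threading.Thread(target=worker).start()
    time.sleep(60)
""")

def run_simulation(timeout=30):
    """Run the child; a negative return value is the signal that killed it."""
    proc = subprocess.run([sys.executable, "-c", CHILD_SOURCE], timeout=timeout)
    return proc.returncode

if __name__ == "__main__":
    print("child exit status:", run_simulation())
```

If the whole process dies with the crashing thread, `run_simulation()` returns promptly with `-signal.SIGSEGV`; a child left hanging like the one on the RedHat-7.3 box would instead run into the timeout.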
[1] segfault module
-segfault.c---------------
void
segfault(void)
{
    char *x = 0;
    *x = 'a';
}
-segfault.i----------------
%module segfault
%{
%}
void segfault(void);
-building:------------------
$ swig -python segfault.i
$ gcc -I/usr/local/include/python2.3 -c segfault_wrap.c -o segfault_wrap23.o
$ gcc -c -o segfault.o segfault.c
$ gcc -shared segfault_wrap23.o segfault.o -o _segfault.so
--
Willi.Langenberger at wu-wien.ac.at Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria