[Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

Matthew T. Kromer matt@zope.com
Tue, 11 Dec 2001 09:02:16 -0500


Hi Joe,

The problem you're seeing is that the fault is happening on a different
thread than the one that receives the signal. That truss syntax is
interesting, though (I have an old SPARC around to test on, but it's
painfully slow). I'm wondering whether you first need to do an
'info threads' in gdb and then a 'thread N' to switch to the real crashing
thread before getting the backtrace.
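
A minimal sketch of that check, assuming a gdb recent enough to accept
-batch and -ex, with the python binary and core file paths as
placeholders. It uses 'thread apply all bt' to print a backtrace for every
thread instead of switching to each 'thread N' by hand. The function name
is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the threads recorded in a core file and print
# a backtrace for each one, so the faulting thread can be picked out.
# Usage: all_thread_backtraces /path/to/bin/python /tmp/core.$PID
all_thread_backtraces() {
    if [ $# -ne 2 ]; then
        echo "usage: all_thread_backtraces PYTHON_BINARY CORE_FILE" >&2
        return 2
    fi
    # -batch makes gdb exit after running the -ex commands in order.
    gdb -batch -ex 'info threads' -ex 'thread apply all bt' "$1" "$2"
}
```

The thread whose stack ends in the signal handler or in the faulting
function, rather than in _lwp_sema_wait, is the one worth reading.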


----- Original Message -----
From: "Joseph Wayne Norton" <norton@alum.mit.edu>
To: <Zope-Dev@zope.org>
Sent: Tuesday, December 11, 2001 2:20 AM
Subject: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on
solaris


>
> Hello.
>
> We are facing zope restarts on the solaris 5.6 platform with zope
> 2.4.3 and python 2.1.1.  I put together a script based on some
> information from an old posting to the apache mailing list.  The
> following shell/perl script allows one to get a core file from a dying
> zope child process and also allows zope to restart without any side
> effects.
>
>
> The script ....
>
> #!/bin/sh
> PATH=$PATH:/usr/local/bin
> export PATH
> cd /tmp
> for PID in `ps -u zfs -f -o pid,comm,args | fgrep z2.py | cut -d' ' -f1`
> do
>     export PID
>     truss -f -l -t\!all -S SIGSEGV,SIGILL -p $PID 2>&1 \
>         | perl -pe 'system("gcore $ENV{PID} && sleep 5 && kill -9 $ENV{PID}"), exit($ENV{PID}) if /(SIGSEGV|SIGILL)/;' &
> done
>
>
> Step 1:  modify script to match your environment.
>
> Step 2: execute script
>
> Step 3: wait for core file to be dumped in /tmp.
>
> Step 4: analyze with gdb where $PID is the pid of the dumped process
>
> #bash gdb /path/to/bin/python /tmp/core.$PID
>
> #0  0xef5b9810 in _lwp_sema_wait ()
> (gdb) where
> #0  0xef5b9810 in _lwp_sema_wait ()
> #1  0xef647ea0 in _park ()
> #2  0xef647b84 in _swtch ()
> #3  0xef6468a4 in cond_wait ()
> #4  0xef6467c8 in _ti_pthread_cond_wait ()
> #5  0x50220 in PyThread_acquire_lock (lock=0xd9d878, waitflag=1)
>     at Python/thread_pthread.h:313
> #6  0x51f18 in lock_PyThread_acquire_lock (self=0xda39b8, args=0x0)
>     at ./Modules/threadmodule.c:67
> #7  0x35db4 in fast_cfunction (func=0xda39b8, pp_stack=0xed40f828, na=0)
>     at Python/ceval.c:2994
> #8  0x33ca0 in eval_code2 (co=0x267848, globals=0x51ec4, locals=0x0, args=0x0,
>     argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
>     at Python/ceval.c:1951
>
>         :
>         :
>
>
> It seems that we are facing trouble due to the thread library on
> solaris (unless the truss command has introduced a side-effect).
>
> Anyone else facing similar troubles?  .... or maybe I should post
> this to a python mailing list.
>
> - joe