[Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris
Matthew T. Kromer
matt@zope.com
Wed, 12 Dec 2001 11:00:49 -0500
Florent Guillaume wrote:
>>(gdb) print *((PyObject *) gc)->ob_type
>>$1 = {ob_refcnt = 18213696, ob_type = 0x2d70b0, ob_size = 0,
>> tp_name = 0x1 "T", tp_basicsize = 1328272, tp_itemsize = 4156348,
>> tp_dealloc = 0x125865c, tp_print = 0x3c1b04, tp_getattr = 0,
>>tp_setattr = 0,
>> tp_compare = 0x29, tp_repr = 0x3adeb0, tp_as_number = 0xf66198,
>> tp_as_sequence = 0xdf3fa0, tp_as_mapping = 0x0, tp_hash = 0x1,
>> tp_call = 0x144490 <PyMethod_Type>, tp_str = 0x3f0a1c,
>> tp_getattro = 0x125865c, tp_setattro = 0x3c1b04, tp_as_buffer = 0x0,
>>
>> tp_flags = 158561192, tp_doc = 0x29 "", tp_traverse = 0x4c4f4144,
>> tp_clear = 0xd908c0, tp_richcompare = 0x1151300, tp_weaklistoffset =
>>0}
>>
>[...]
>
>>(gdb) x 0x4c4f4144
>>0x4c4f4144: Cannot access memory at address 0x4c4f4144.
>>
>
>
>0x4c4f4144 is big-endian ascii for "LOAD". Things were corrupted
>before...
>
>
>Florent
>
Yes, the whole block is bad, so it probably isn't really a Python type
object. The refcount is a bit high, the name pointer is really low (0x01!),
the basicsize and itemsize are extremely large, and the compare and hash
function pointers are too low to be code addresses -- i.e. it isn't a type
object.
So, I may have been telling him to get the wrong thing; the source code
that he faulted in reads:
/* Subtract internal references from gc_refs */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc_next;
    for (; gc != containers; gc=gc->gc_next) {
        /* The next line is the line that was active at the time of his fault */
        traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
        (void) traverse(PyObject_FROM_GC(gc),
                        (visitproc)visit_decref,
                        NULL);
    }
}
And PyObject_FROM_GC(gc) is either (gc) or ((PyObject *)(((PyGC_Head
*)gc)+1)) depending on whether or not WITH_CYCLE_GC is defined. I
took the easy route and asked Joe to assume that the former was true.
If the latter is true, then the type object is shifted upwards in memory
by three words; the new first three fields are gc_next, gc_prev, and
gc_refs.
If the GC header is present, every value in the dump above is labelled
three fields too early. Assuming no alignment padding, the real object
would be:
gc_next = 0x115eb40
gc_prev = 0x2d70b0
gc_refs = 0
ob_refcnt = 0x1
ob_type = 0x144490 (which we actually know is <PyMethod_Type> -- yay)
ob_size = 0x3f6bbc (which is too large for my comfort)
tp_name = 0x125865c (valid pointer but we don't know what it is)
tp_basicsize = 0x3c1b04 (seems high again, but is 0x350b8 less than ob_size)
tp_itemsize = 0
tp_dealloc = 0
tp_print = 0x29 (boo!)
tp_getattr = 0x3adeb0
tp_setattr = 0xf66198
tp_compare = 0xdf3fa0
tp_repr = 0
tp_as_number = 1 (boo!)
tp_as_sequence = 0x144490 <PyMethod_Type> (boo!)
etc...
Even shifting THESE values by one more word (assuming the compiler takes
PyGC_Head, which is three words, and pads it up to four words for
alignment) still puts garbage values like 0x29 in tp_dealloc.
Ergo, I'm pretty confident that the gc pointer itself is bad.
If I was just a *wee* bit more familiar with how Solaris loads
segments, I'd be able to glean some more information from the addresses
(i.e. are they code or data segment pointers). Normally I like seeing
OSes use the high nybble or byte of an address as a segment number to
make that sort of diagnosis easier.
It actually looks like page zero is MAPPED on Solaris (I didn't think it
was), which in my book is a baaad thing since it means a null pointer CAN
be dereferenced.