[Zope-dev] more on the segfault saga

Matthew T. Kromer matt@zope.com
Thu, 14 Mar 2002 11:28:29 -0500


--------------070807050407040704010302
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Leonardo Rochael Almeida wrote:

>On Wed, 2002-03-13 at 21:30, Matthew T. Kromer wrote:
>
>>On Wednesday, March 13, 2002, at 10:40 AM, Leonardo Rochael Almeida 
>>wrote:
>>
>>>What about patching Python to report the freed objects like you
>>>mentioned on IRC? Also, how about turning on some flags in
>>>gc.set_debug()? Do you think we might be able to glean something by
>>>seeing what objects were logged as freed or by storing them in
>>>gc.garbage?
>>>
>
>Setting gc.set_debug(gc.DEBUG_LEAK) floods your stderr in a way you can
>only believe by seeing it. And it didn't give me any clue. The last
>object freed was an instance method. Most everything running inside Zope
>is one instance method or another...
>

OK, I'm attaching a patch to Python's Modules/gcmodule.c which should 
set a trap for where the garbage collector trips over bad data; this 
will grab the bad data and send it to stderr so I can build a better trap.

This is ONLY step one in tracking this down.  You will have to rebuild 
Python to activate this patch.  All it does is install a SIGSEGV handler 
and set up a small trace area for the GC to record data into, so that 
when the SIGSEGV comes in, the handler can print out the last thing the 
code was doing.
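
To make the mechanism concrete, here is a minimal standalone sketch of the same idea outside of Python: a trace area that is filled in before each risky step, and a SIGSEGV handler that dumps it.  Every name in it (trace_record, trace_install, crash_in_child, TRAP_STATUS) is hypothetical and not part of the patch.  Unlike the patch, the sketch avoids stdio in the handler and leaves with _exit(); that is a swapped-in design choice, not what gcmodule.patch.1 does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TRACE_SLOTS 15
#define TRAP_STATUS 42          /* hypothetical; chosen to fit in 8 bits */

/* Trace area: filled in before each risky step, read by the handler. */
static volatile uintptr_t trace_area[TRACE_SLOTS];
static volatile sig_atomic_t trace_count = 0;

/* Address the demo writes through; volatile so the store is kept. */
static volatile uintptr_t bad_address = 0;

void trace_record(size_t n, const uintptr_t *values)
{
    size_t i;

    assert(n <= TRACE_SLOTS);   /* never overrun the trace area */
    trace_count = 0;
    for (i = 0; i < n; i++)
        trace_area[i] = values[i];
    trace_count = (sig_atomic_t)n;
}

/* Write label followed by v in hex to stderr, without using stdio. */
static void emit_hex(const char *label, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char buf[2 * sizeof v + 1];
    size_t n = 2 * sizeof v, i;

    for (i = 0; i < n; i++)
        buf[i] = digits[(v >> (4 * (n - 1 - i))) & 0xf];
    buf[n] = '\n';
    if (write(STDERR_FILENO, label, strlen(label)) < 0 ||
        write(STDERR_FILENO, buf, n + 1) < 0) {
        /* nothing useful to do about a failed write here */
    }
}

static void on_segv(int sig, siginfo_t *info, void *context)
{
    sig_atomic_t i;

    (void)sig;
    (void)context;
    emit_hex("CRASH at ", (uintptr_t)info->si_addr);
    for (i = 0; i < trace_count; i++)
        emit_hex("  trace ", trace_area[i]);
    _exit(TRAP_STATUS);
}

int trace_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Crash on purpose in a child process; return the child's exit status. */
int crash_in_child(void)
{
    const uintptr_t values[2] = { 0x1234, 0x5678 };
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (trace_install() != 0)
            _exit(1);
        trace_record(2, values);
        *(volatile int *)bad_address = 1;   /* faults: address 0 */
        _exit(0);                           /* not reached */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling crash_in_child() should return TRAP_STATUS, with the fault address and the two recorded values written to stderr by the child.  Storing the values as uintptr_t rather than int is deliberate, so that pointers survive intact on 64-bit platforms.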

This is ONLY going to tell me that the GC tripped over something, but it 
WILL at least tell me what object it is scanning, that object's refcount 
(which I bet is zero, and forms the basis for a better trap) and the 
object's type and traverse pointers.

The traverse pointer should NOT be null.  If it is, then something is 
wrong with gc being called for that type.
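
As a rough illustration of how the four recorded values could be read once the log comes back, the sketch below checks them in the order described above.  It uses hypothetical stand-in structs (mock_type, mock_object) rather than the real PyObject and PyTypeObject, and the inspect() function and its verdict names are my own assumptions, not something in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef int (*traverse_fn)(void *self);

struct mock_type {
    const char *name;
    traverse_fn traverse;           /* plays the role of tp_traverse */
};

struct mock_object {
    long refcnt;                    /* plays the role of ob_refcnt */
    const struct mock_type *type;   /* plays the role of ob_type */
};

enum verdict {
    LOOKS_OK,
    NULL_OBJECT,
    DEAD_REFCOUNT,      /* refcount of zero or less: likely already freed */
    NULL_TYPE,
    NULL_TRAVERSE       /* gc should not be scanning this type at all */
};

/* Classify one logged object, checking the cheapest evidence first. */
enum verdict inspect(const struct mock_object *obj)
{
    if (obj == NULL)
        return NULL_OBJECT;
    if (obj->refcnt <= 0)
        return DEAD_REFCOUNT;
    if (obj->type == NULL)
        return NULL_TYPE;
    if (obj->type->traverse == NULL)
        return NULL_TRAVERSE;
    return LOOKS_OK;
}

static int no_children(void *self)
{
    (void)self;
    return 0;
}

/* Small usage example covering the interesting cases. */
void inspect_examples(void)
{
    struct mock_type good = { "good", no_children };
    struct mock_type bad = { "bad", NULL };
    struct mock_object live = { 1, &good };
    struct mock_object freed = { 0, &good };
    struct mock_object odd = { 1, &bad };

    assert(inspect(&live) == LOOKS_OK);
    assert(inspect(&freed) == DEAD_REFCOUNT);
    assert(inspect(&odd) == NULL_TRAVERSE);
}
```

A check along these lines, run before each traverse call, is one possible shape for the "better trap"; whether the refcount really is zero at crash time is exactly what the log is meant to confirm.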

If you apply this patch, run Zope under the patched Python with stderr 
saved to a file.  Send me the file, and then you can go back to running 
Zope without the patch.

When the patch triggers, it will print its information and then exit 
Python immediately by calling exit(999).
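
One practical note if you script around this: on POSIX systems the parent process only sees the low 8 bits of the value passed to exit(), so a wrapper or shell will not literally observe 999.  The sketch below shows what a parent sees; observed_exit_status() is a hypothetical helper for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent
   observes through waitpid(), or -1 on error. */
int observed_exit_status(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: leave at once with the raw code */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Usage example: 999 is 0x3E7, and only the low byte 0xE7 survives. */
void exit_status_examples(void)
{
    assert(observed_exit_status(999) == 231);
    assert(observed_exit_status(7) == 7);
}
```

So a status of 231 from the patched Python is the trap firing, not some unrelated failure.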


-- 
Matt Kromer
Zope Corporation  http://www.zope.com/ 



--------------070807050407040704010302
Content-Type: text/plain;
 name="gcmodule.patch.1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="gcmodule.patch.1"

--- Modules/gcmodule.c.orig	Thu Mar 14 10:35:21 2002
+++ Modules/gcmodule.c	Thu Mar 14 11:14:13 2002
@@ -22,6 +22,8 @@
 #include "Python.h"
 
 #ifdef WITH_CYCLE_GC
+#include <signal.h>
+#include <stdarg.h>
 
 /* magic gc_refs value */
 #define GC_MOVED -1
@@ -34,6 +36,7 @@
 static PyGC_Head generation2 = {&generation2, &generation2, 0};
 static int generation = 0; /* current generation being collected */
 
+
 /* collection frequencies, XXX tune these */
 static int enabled = 1; /* automatic collection enabled? */
 static int threshold0 = 700; /* net new containers before collection */
@@ -60,12 +63,82 @@
 				DEBUG_SAVEALL
 static int debug;
 
+
+static int CRASHTRAP = 0;
+static int CRASHFLAG = 0;
+static char *CRASHTYPE = NULL;
+static int CRASHLOG[16];
+
 /* list of uncollectable objects */
 static PyObject *garbage;
 
 /* Python string to use if unhandled exception occurs */
 static PyObject *gc_str;
 
+static void CRASH_trip(int i, siginfo_t *siginfo, void *p) {
+
+	int n;
+
+	fprintf(stderr,"CRASH %d at %p\n", (int) siginfo->si_signo,
+		(void *) siginfo->si_addr);
+
+	if (CRASHFLAG == 0) {
+		fprintf(stderr,"\tCrash handler not activated for this!\n");
+	} else {
+		fprintf(stderr,"\tCrash type %s\n", CRASHTYPE ? CRASHTYPE : "(none)");
+		fprintf(stderr,"\tCrash log: %d values: ", CRASHLOG[0]);
+		for (n = 0; n < CRASHLOG[0]; n++) {
+			fprintf(stderr," %08x", (unsigned int) CRASHLOG[n+1]);
+		}
+		fprintf(stderr,"\n");
+	}
+	exit(999);
+}
+
+static void CRASH_activate(void) {
+
+	struct sigaction sa;
+	struct sigaction oldsa;
+
+	sa.sa_sigaction = CRASH_trip;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_SIGINFO;
+
+	if (CRASHTRAP == 0) {
+		sigaction(SIGSEGV, &sa, &oldsa); 
+		CRASHTRAP = 1;
+	}
+
+	CRASHFLAG = 1;
+	CRASHTYPE = NULL;
+	CRASHLOG[0] = 0;
+
+}
+
+static void CRASH_deactivate(void) {
+	CRASHFLAG = 0;
+}
+
+static void CRASH_type(char *s) {
+	CRASHTYPE = s;
+}
+
+static void CRASH_record(int n, ...) {
+	va_list ap;
+	int i;
+
+	va_start(ap, n);
+
+	for (i = 0; i < n; i++) {
+		CRASHLOG[i+1] = va_arg(ap, int);
+	}
+
+	va_end(ap);
+
+	CRASHLOG[0] = n;
+}
+
+
 /*** list functions ***/
 
 static void
@@ -164,13 +237,29 @@
 subtract_refs(PyGC_Head *containers)
 {
 	traverseproc traverse;
+	PyObject *obj;
+
+
 	PyGC_Head *gc = containers->gc_next;
+
+	CRASH_activate();
+	CRASH_type("subtract_refs");
+
 	for (; gc != containers; gc=gc->gc_next) {
+		obj = (PyObject *)PyObject_FROM_GC(gc);
+		CRASH_record(4, obj,
+			obj != 0 ? obj->ob_refcnt : 0,
+			obj != NULL ? obj->ob_type : NULL,
+			obj != NULL && obj->ob_type != NULL ?
+				obj->ob_type->tp_traverse : NULL
+		);
 		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
 		(void) traverse(PyObject_FROM_GC(gc),
 			       (visitproc)visit_decref,
 			       NULL);
 	}
+
+	CRASH_deactivate();
 }
 
 /* Append objects with gc_refs > 0 to roots list */

--------------070807050407040704010302--