[Mario Lorenz]
We have spent most of the day tracking down obscure hangs of Zope (2.6.4rc1) under Python 2.1.3 on a RHEL3 machine.
Based on what you say next, it sure sounds like this isn't unique to 2.6.4rc1. Did the same code "work" under some previous release? The infinite loop appears to be an inherent property of this iteration of the cPickleCache design, and that's not new in 2.6.4rc1.
The problem seems to be a logic flaw somewhere related to the cPickleCache, when using a destructor in a Zope object that accesses itself.
In our case (shortened to the offending line):
def __del__(self): print "About to destroy: ", self.id
What seems to happen is that the "self.id" access causes the object to be cached again, causing scan_gc_items() to run in circles.
Based on eyeballing the C code, "the ring" is a list of objects in cache, ordered from least recently used to most recently used. scan_gc_items traverses this list once, from LRU to MRU, ghosting ghostifiable objects until the target number of non-ghost objects remaining is reached, or the entire list has been traversed. It looks like ghostifying your "self" triggers self.__del__(). Then the __del__ method unghostifies self, which has the side effect of moving self to the MRU end of the ring, which in turn means the list traversal will visit self *again*. When it does, same thing happens all over again, ad infinitum.
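To make the traversal described above concrete, here is a toy model in plain Python 3. It is only a sketch of the by-eyeball reading, not the actual cPickleCache C code: the names Obj, resurrects and scan are made up for illustration, the ring is modelled as a simple list with the LRU end at index 0, and max_steps is an artificial guard so the demo stops instead of hanging.

```python
class Obj:
    """Hypothetical stand-in for a cached persistent object."""

    def __init__(self, name, resurrects=False):
        self.name = name
        # True models a __del__ that touches self (e.g. reads self.id),
        # which unghostifies the object again.
        self.resurrects = resurrects


def scan(ring, target, max_steps):
    """Ghost objects from the LRU end until at most `target` remain.

    Returns (visits, finished). `finished` is False if the artificial
    max_steps guard had to stop the traversal.
    """
    visits = []
    while len(ring) > target:
        if len(visits) >= max_steps:
            return visits, False
        obj = ring.pop(0)          # ghostify: object leaves the ring
        visits.append(obj.name)
        if obj.resurrects:
            ring.append(obj)       # unghostified: back in at the MRU end
    return visits, True


if __name__ == "__main__":
    ring = [Obj("a"), Obj("b"), Obj("self", resurrects=True)]
    visits, finished = scan(ring, target=0, max_steps=8)
    print(visits, finished)
```

With a target of 0 (my choice for the demo), the object that resurrects itself is visited over and over and only the guard ends the loop. Without such an object the same scan finishes after one pass.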
Any ideas on how to best fix this?
As the docs Chris pointed you at say, persistent objects shouldn't have __del__ methods. If the by-eyeball analysis above is correct, and a persistent object does have a __del__ method referencing an attribute of self, an infinite loop in scan_gc_items() is inevitable. So I only see 3 workarounds short of rewriting the C code:

1. Lose the __del__ method (recommended).

2. If you need a __del__ method (it's hard to imagine why, since it will get called whenever the object is ghostified, and has nothing to do with the object's actual lifetime), don't reference any persistent objects (and esp. not self) within it.

3. Recompile with MUCH_RING_CHECKING defined. Then scan_gc_items will give up after max(cache_size * 10, 10000) iterations instead of running forever.
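For a feel of how large the iteration limit in the MUCH_RING_CHECKING workaround gets, the formula can be evaluated directly. The helper name ring_check_limit is hypothetical (it does not exist in the C source), and the cache sizes are just example values. Python 3 syntax.

```python
def ring_check_limit(cache_size):
    """Iteration cap quoted above: max(cache_size * 10, 10000)."""
    return max(cache_size * 10, 10000)


if __name__ == "__main__":
    for size in (400, 1000, 5000):
        print(size, ring_check_limit(size))
```

So for any cache size up to 1000 the floor of 10000 iterations applies, and beyond that the limit grows as ten times the cache size.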