I've got a fairly major memory leak in my application. I've followed the thread from August, including Shane's suggestions about using a debug build of Python to inspect object references and the rest [1].

I know from the refcounts in Zope that items of class Foo are definitely leaking, yet when I do a sys.getobjects(0, Foo) I get nothing back. Navigating all 100000-ish references one by one seems a bit daunting. So I'm not sure where to go from here.

Shane, you mentioned you had put together some useful functions for exploring this debug information - could you share them? Or were you just referring to the remote console you supplied earlier?

Also, what kinds of Python code can cause memory leaks? I'm not really sure what I should be looking for. The only thing I can think of is hanging file descriptors, as circular references should be picked up by the gc anyway.

Seb

[1] http://mail.zope.org/pipermail/zope-dev/2003-August/020358.html
[Seb Bacon]
... I know from the refcounts in Zope that items of class Foo are definitely leaking, yet when I do a sys.getobjects(0, Foo) I get nothing back.
If Foo is an old-style class, then every instance of Foo has type InstanceType (and so does every instance of every other old-style class):
>>> class Foo: pass
...
>>> type(Foo())
<type 'instance'>
>>> import types
>>> types.InstanceType
<type 'instance'>
>>> types.InstanceType is type(Foo())
True
getobjects() filters on type, so nothing will ever match Foo as a type. If you can change Foo to a new-style class (most easily by inheriting from object, in a recent-enough Python), life gets easier:
>>> class Foo(object): pass
...
>>> type(Foo())
<class '__main__.Foo'>
Then getobjects() can filter on Foo as a type. Classes and types before Python 2.2 are distinct concepts (barring Zope ExtensionClass complications).
Navigating all 100000-ish references one by one seems a bit daunting.
Na, with list comprehension syntax (for brevity -- you can do the same with a for-loop, of course),

    foos = [x for x in sys.getobjects(0)
            if isinstance(x, types.InstanceType) and x.__class__ is Foo]

will extract just the Foo instances (if Foo is an old-style class).
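For readers without a debug build, the same filtering idea can be sketched against gc.get_objects(), which is available in ordinary builds. This is an assumption on my part rather than what the thread used: sys.getobjects() exists only in debug builds, gc.get_objects() sees only objects tracked by the cyclic collector, and the helper name live_instances is hypothetical. The sketch targets a current Python, where every class is new-style.

```python
import gc


class Foo:
    """Stand-in for the suspect class (hypothetical)."""


def live_instances(cls):
    """Return every gc-tracked object whose exact type is cls."""
    # type(obj) is cls mirrors the x.__class__ is Foo test above
    # and deliberately ignores subclasses.
    return [obj for obj in gc.get_objects() if type(obj) is cls]


keep = [Foo() for _ in range(3)]   # three instances kept alive on purpose
foos = live_instances(Foo)
print(len(foos))
```

The result is a plain list, so the usual tools (len(), slicing, a loop that inspects attributes) apply to it directly.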
So I'm not sure where to go from here.
Debugging memory leaks can be hard, in any language.

Another place to look for ideas is in the top-level test.py from a current Zope HEAD checkout (or 2.7 branch, or Zope3). If you're able to run your code in a loop, the TrackRefs class in test.py automates some measure of identifying what (if anything) is leaking.

We've changed many internal ZODB and ZEO classes to new-style classes primarily just so this test.py's -r option is more useful in identifying the source of leaks. Some yielded easily to analysis, others slobbered on for part-time weeks. There are two common culprits:

1. Some class keeps a list, or dict, of all instances ever created. These are obvious once found, but can be surprisingly hard to locate. Of course the instances never go away then until the class goes away. Sometimes it's due to leftover debugging code someone forgot to delete again.

2. "Reference cycles". Big topic <wink>.
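The run-it-in-a-loop idea can be sketched roughly as follows. This is not the TrackRefs class from test.py, only a minimal illustration of the same approach: count live objects per type, run the workload again, and see which counts keep growing. The names type_counts, growth, Registered and workload are all hypothetical, and it uses gc.get_objects() on the assumption that no debug build is at hand. Registered reproduces culprit 1 by keeping a class-level list of every instance.

```python
import gc
from collections import Counter


def type_counts():
    """Count live gc-tracked objects by type name."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(func, runs=3):
    """Run func repeatedly; return type names whose count grew on every run."""
    func()  # warm-up, so one-time caches are not mistaken for leaks
    before = type_counts()
    grew = None
    for _ in range(runs):
        func()
        after = type_counts()
        delta = {name for name in after if after[name] > before[name]}
        grew = delta if grew is None else grew & delta
        before = after
    return grew


class Registered:
    _all = []  # culprit 1: the class remembers every instance ever made

    def __init__(self):
        Registered._all.append(self)


def workload():
    Registered()  # looks temporary, but the class-level list keeps it alive


print(growth(workload))
```

Requiring growth on every run, rather than on just one, is a cheap way to filter out the bookkeeping objects the measurement itself creates.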
Tim Peters wrote: <snip useful info about new-style classes>
Debugging memory leaks can be hard, in any language.
No kidding. I thought when I identified the suspect class two days ago I was nearly there ;-)
Another place to look for ideas is in the top-level test.py from a current Zope HEAD checkout (or 2.7 branch, or Zope3).
OK, will do.
2. "Reference cycles". Big topic <wink>.
Seeing as the suspect leaker contains code like:

    other = Foo()
    other.reciprocal = self
    self.reciprocal = other

I fear the worst ;-)

...but my (naive?) reading of the documentation was that reference cycles are cleaned out by the garbage collector, *unless* they define a __del__ (which is not the case here). How am I wrong?

Thanks, seb
[Seb Bacon]
... Seeing as the suspect leaker contains code like:
    other = Foo()
    other.reciprocal = self
    self.reciprocal = other
I fear the worst ;-)
...but my (naive?) reading of the documentation was that reference cycles are cleaned out by the garbage collector, *unless* they define a __del__ (which is not the case here). How am I wrong?
You're reading the docs correctly. It's not necessarily cycles directly involving Foo objects that causes Foo objects to leak, it can be instead that some other (non-Foo) objects in cycles can't be collected, from which the Foo objects are in turn reachable. When an object O can't be collected, then neither can any object reachable from O. gc.get_referrers() can be used to find objects that refer to a given Foo instance. It's also possible that a something S refers to a Foo instance where S doesn't participate in cyclic gc. Then any cycle containing S is immortal, regardless of whether __del__ methods are defined in the cycle, and also then gc.get_referrers() can't reveal S's existence. Sometimes such an S is in the Python core, or in Zope's C code, although the more recent the release the less likely that is (more & more kinds of objects have been added to cyclic gc over time). Are you sure that *only* Foo objects are leaking? It's pretty rare, when there's a leak, to see only one kind of object leaking.
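A small sketch of the two cases described above, written for a current Python and using a weak reference as a probe (the names make_pair, probe and holder are illustrative only): a bare reciprocal cycle without __del__ is reclaimed by the collector, while the same cycle stays alive for as long as some other live object can reach it.

```python
import gc
import weakref


class Foo:
    pass


def make_pair():
    """Build two Foo objects that refer to each other, as in the thread."""
    a, b = Foo(), Foo()
    a.reciprocal = b
    b.reciprocal = a
    return a


# Case 1: nothing outside the cycle refers to it, so gc reclaims it.
probe = weakref.ref(make_pair())
gc.collect()
print(probe() is None)    # True

# Case 2: the cycle is reachable from a long-lived container, so it stays.
holder = {"cached": make_pair()}
probe2 = weakref.ref(holder["cached"])
gc.collect()
print(probe2() is None)   # False
```

In the second case the cycle itself is blameless; the thing to hunt for is whatever plays the role of holder.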
Tim Peters wrote:
[Seb Bacon]
...but my (naive?) reading of the documentation was that reference cycles are cleaned out by the garbage collector, *unless* they define a __del__ (which is not the case here). How am I wrong?
You're reading the docs correctly. It's not necessarily cycles directly involving Foo objects that causes Foo objects to leak, it can be instead that some other (non-Foo) objects in cycles can't be collected, from which the Foo objects are in turn reachable. When an object O can't be collected, then neither can any object reachable from O. gc.get_referrers() can be used to find objects that refer to a given Foo instance. It's also possible that a something S refers to a Foo instance where S doesn't participate in cyclic gc. Then any cycle containing S is immortal, regardless of whether __del__ methods are defined in the cycle, and also then gc.get_referrers() can't reveal S's existence. Sometimes such an S is in the Python core, or in Zope's C code, although the more recent the release the less likely that is (more & more kinds of objects have been added to cyclic gc over time). Are you sure that *only* Foo objects are leaking? It's pretty rare, when there's a leak, to see only one kind of object leaking.
You're right, there seem to be a few other things involved. I think Foo comes out top simply because it is the most numerous instance involved in the leak.

So, say Foo is leaking because it is referenced from O which can't be collected. Given 100 things which refer to Foo, how do I identify which one is O? And of course, then O may be leaking because it is referenced from P...

I sense this question is a bit like asking someone to explain how to solve a Rubik's Cube in 3 words. But FWIW, the kind of logic I'm using is:

- run test case
- notice that there are a lot of references to Foo
- get an instance of Foo using sys.getobjects(0)
- get referrers using gc.get_referrers(Foo)
- run garbage collection using gc.collect()?
- is Foo still there? Which of its referrers are still there?

Incidentally, I've found some other bug. I can get Zope to segfault by calling PickleCache.minimize(3), if a Bar object has been loaded which defines a __del__ method thus:

    def __del__(self):
        print "deleting", self.getId()

It couldn't be related, could it? (it's borking at a point where it frees memory)

Cheers, Seb
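The referrer step in that list might look something like the sketch below, under a few assumptions: it passes an *instance* to gc.get_referrers() (passing the class would list referrers of the class object instead), it filters out stack frames because the inspecting code itself can show up as a referrer, and the cache dict is a made-up stand-in for whatever O turns out to be.

```python
import gc
import types


class Foo:
    pass


def referrers_of(obj):
    """Return the objects that refer to obj, ignoring stack frames."""
    return [r for r in gc.get_referrers(obj)
            if not isinstance(r, types.FrameType)]


cache = {"key": Foo()}   # hypothetical long-lived holder, playing the role of O
target = cache["key"]

for ref in referrers_of(target):
    # Typically prints 'dict' for the cache and for any namespace
    # that also holds a name bound to the instance.
    print(type(ref).__name__)
```

Repeating the same call on each surviving referrer walks one step further up the chain, from O to P and so on.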
Seb Bacon wrote:
So, say Foo is leaking because it is referenced from O which can't be collected. Given 100 things which refer to Foo, how do I identify which one is O? And of course, then O may be leaking because it is referenced from P...
I've been looking into memory leaks of my own, and put together a small module that outputs the information available from the gc.garbage list in a format that Graphviz (http://www.research.att.com/sw/tools/graphviz/) can then render as an image (GIF, SVG, etc) that makes it a bit easier to see the cycles.

Here's a sample image: http://barryp.org/misc/simpletal_visualize/test2.gif

Here's the code: http://barryp.org/misc/simpletal_visualize/visualize_pyobjects.py

It shouldn't be too hard to use that in your own code, something like:

    import gc
    import visualize_pyobjects

    gc.collect()
    cycles = visualize_pyobjects.prune_stems(gc.garbage)
    visualize_pyobjects.create_dot(cycles, '/tmp/foo.dot')

and then run the graphviz 'dot' program from the command line to create a GIF file:

    dot -Tgif </tmp/foo.dot >/tmp/foo.gif

(use a '-Tsvg' option to create an SVG file, which is smaller and easily viewed and edited with things like sodipodi I believe)

Barry
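For anyone who can't fetch the module, the general idea of turning object references into Graphviz input can be sketched as below. This is not the visualize_pyobjects code, whose internals aren't shown in this thread; to_dot is a hypothetical helper that only draws edges between the objects it is handed, using gc.get_referents() to discover them.

```python
import gc


def to_dot(objects):
    """Render the references among the given objects as Graphviz DOT text."""
    ids = {id(o) for o in objects}
    lines = ["digraph refs {"]
    for o in objects:
        # One node per object, labelled with its type name.
        lines.append('  n%d [label="%s"];' % (id(o), type(o).__name__))
        for target in gc.get_referents(o):
            if id(target) in ids:   # skip edges that leave the chosen set
                lines.append("  n%d -> n%d;" % (id(o), id(target)))
    lines.append("}")
    return "\n".join(lines)


# A two-element cycle built from plain lists.
x = []
y = [x]
x.append(y)

dot_text = to_dot([x, y])
print(dot_text)
```

Saving the printed text to a file and feeding it to the dot command shown above should produce a picture with two nodes and an arrow in each direction.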
Seb Bacon wrote:
So, say Foo is leaking because it is referenced from O which can't be collected. Given 100 things which refer to Foo, how do I identify which one is O? And of course, then O may be leaking because it is referenced from P...
I sense this question is a bit like asking someone to explain how to solve a Rubik's Cube in 3 words.
Well, I have come to some kind of resolution, though I am still slightly mystified. Here's the sequence of events, in case they are of any help to others (doubtful...).

Although there probably is a memory leak in my application, the one I thought I was hunting wasn't what was causing my server collapse.

At first, I noticed that memory usage was increasing linearly over time until the server expired. I examined reference counts for all the classes therein, mainly using Shane's LeakFinder product (I could have used the refcounts listing on the control panel, but I found the LeakFinder's reference count display tab nicer to use.)

I noticed that references to a particular Foo class were increasing in direct proportion to the memory usage, apparently without bound. I also noted that Foos are involved in reference cycles. I guessed from this that maybe Foos were leaking somehow - which was incorrect. There is nothing wrong with reference cycles *per se* (see earlier in this thread).

Then I looked at Bars, which were referencing Foos. Given the way that Foos are implemented (with mutual references to each other) and the fact that there may be several Foos stuck on a Bar, a leak in a Bar could have a big knock-on effect of creating a whole ton of Foos. The number of references to Bars was also increasing without bound.

This went on for ages. Worth mentioning is Barry's cool reference visualisation tool (see earlier in this thread).

I had already tried my application using Zope 2.6.2 (it was on 2.6.1 before) and noted reference counts also going up rapidly, so it wasn't that, I decided.

To cut a long story to a medium length, it *was* that. When using 2.6.2, I noticed that if I forced garbage collection, the refcounts went down. Going over to the database connection caches, I noted that in Zope 2.6.1 the number of cache entries bore no relation to the target cache size. In Zope 2.6.2, it did.

In other words, the way my application is implemented means that *lots* of references can accumulate in the space of a single request, and something about these references meant that they were never getting cleared out of the cache by Zope 2.6.1. The cache was the culprit.

Moral: always keep on top of those Zope releases ;-)

What's puzzling me is that I can't see anything that changed between 2.6.1 and 2.6.2 which might have fixed this behaviour.

Seb
participants (3)
- Barry Pederson
- Seb Bacon
- Tim Peters