On Sat, 2004-12-04 at 23:34, Andy Yates wrote:
Note that I'm not trying to find a problem with Zope. My plan was to start with something very simple and then add more and more of the things we do on our production site and see when it starts leaking. However, even the simplest tests seem to leak memory.
This is a great place to start. I wouldn't be surprised at all if the simplest access leaked memory. It seems easy to create memory leaks with Zope code. There aren't any tests that specifically target leaks in the Zope test base so basically any given Zope release may have terrible leaks and we'd never know until it got reported to us or until someone caught it by accident. AFAIK, there has never been a Zope release that anyone has had the time or energy to certify as "leak-free", if that's even possible.
Thanks for the explanation of how the transient gc works. That is exactly what I found when I looked in the source code, and I can see all of it happening when I turn on the debug output. Sessions are created and placed in buckets. Active sessions move to the current bucket, and expired sessions and old empty buckets get removed. Then there is a 1 in 6 chance they will be gc'ed on each request. I see all this happening, but when the buckets are gc'ed the memory usage according to top does not go down.
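The mechanism described above (buckets keyed by time slice, active sessions migrating forward, a 1-in-6 chance of housekeeping per request) can be sketched in a few lines of plain Python. All names and constants here are illustrative, not Zope's actual transience API:

```python
import random
import time

class ToySessionBuckets:
    """Illustrative model of time-sliced session buckets with
    probabilistic garbage collection (not Zope's real implementation)."""

    PERIOD = 20    # seconds covered by one bucket (assumed)
    TIMEOUT = 60   # session lifetime in seconds (assumed)
    GC_ODDS = 6    # housekeeping fires on roughly 1 request in 6

    def __init__(self):
        self.buckets = {}  # time slice -> {session_id: data}

    def _slice(self, t=None):
        if t is None:
            t = time.time()
        return int(t // self.PERIOD)

    def set(self, session_id, data):
        # An active session always moves into the current bucket.
        for bucket in self.buckets.values():
            bucket.pop(session_id, None)
        self.buckets.setdefault(self._slice(), {})[session_id] = data
        self.maybe_housekeep()

    def maybe_housekeep(self):
        # 1-in-6 chance per request: drop buckets older than the timeout,
        # taking their expired sessions with them.
        if random.randrange(self.GC_ODDS) == 0:
            cutoff = self._slice(time.time() - self.TIMEOUT)
            for key in [k for k in self.buckets if k < cutoff]:
                del self.buckets[key]
```

Note that deleting a bucket here releases the Python objects it held, but that does not by itself guarantee the process's resident size shrinks, for the reasons discussed further down in this thread.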
Gotcha.
I ran the test again that puts a 1K string in the session objects. I used ab to run the Python script 3000 times with a concurrency of 10. Python's memory went from 22MB to 54MB. Transient objects went from 0 to about 900 and stayed steady during the test. After the test the transient objects dropped back down to 0 as they expired.
Since there are a lot of moving parts to this problem, it's important to be deliberate about the methodology used to provoke it. With that in mind, I'm not sure how you get 900 transient objects out of 3000 requests via ab unless you put the same "_ZopeId" in the query string for 2100 of those requests, had a very short TOC timeout, or had a maximum subobject limit of 900. If none of these things is true, I can imagine some conflict errors causing a few to drop out, but you should have ended up with somewhere around 3000 transient objects at some point unless the test ran for longer than the timeout period.
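That last case is one plausible way a steady ~900 falls out of 3000 requests: if the run outlasts the session timeout, the live-session count plateaus near (request rate × timeout) instead of climbing to the request total. A back-of-the-envelope sketch, where the rate and timeout figures are made up purely for illustration:

```python
def steady_state_sessions(total_requests, requests_per_sec, timeout_secs):
    """Rough ceiling on concurrently live sessions: each request creates a
    session that lives timeout_secs, so at most rate * timeout coexist."""
    return min(total_requests, requests_per_sec * timeout_secs)

# Hypothetical: 3000 requests at ~15 req/s with a 60-second timeout.
# The run lasts ~200s, well past the timeout, so the live count
# plateaus around 15 * 60 = 900 rather than reaching 3000.
```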
I continued to make occasional Zope requests to plain pages to make sure gc was getting tickled.
You probably know this, but FWIW, gc is only potentially tickled when you actually access a session object; "plain page" access has no effect unless it does something to the sessioning machinery. The most reliable way to get gc to run is to call the TOC's "housekeep" method a few times (it statistically does nothing 5/6ths of the time, as you noted) within the "plain page" you're accessing.
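The reason "a few times" suffices: each call independently skips gc with probability 5/6, so the chance that k consecutive calls all skip is (5/6)**k. A quick check (the helper name is ours):

```python
def chance_gc_never_ran(calls, skip_odds=5.0 / 6.0):
    """Probability that every one of `calls` housekeep invocations skipped gc."""
    return skip_odds ** calls

# After 10 calls the odds that gc never fired drop to about 16%,
# i.e. roughly an 84% chance it ran at least once.
```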
After 8+ hours of this the memory was never released; Python was still using 54MB. If I continue to create session objects, the 54MB is not reused by Python; memory usage just continues to grow.
Does the refcount of the TransientObject class at that point support your theory that references to these TransientObjects somehow stick around past the time they're gc'ed? If so, do you also see other classes with a similar or related refcount (Acquisition.ImplicitAcquisitionWrapper or ZPublisher.HTTPRequest.HTTPRequest, perhaps)? Leaking an HTTPRequest reference in Zope is bad news because the REQUEST may hold references to many other objects. Likewise for acquisition wrappers. It's a real nightmare to try to isolate the issue when you see these classes' refcounts growing without bound. FWIW, I've seen these sorts of leaks in the past (even in very simple accesses) and shamefully my first reaction is to do a mental "Colonel Klink" (I see *nuss*ink! ;-) because I find them so difficult and time-consuming to isolate. I have such a distaste for diagnosing these problems that I wrote a product named AutoLance to automatically restart Zope when memory usage goes above some watermark. :-(
Note that Python will usually not release memory back to the OS once it has been allocated for its own use, so a large concrete amount of memory consumed by Python may not even be an indicator of a leak! See http://mail.python.org/pipermail/python-dev/2004-October/049480.html for more info. The only solid basis for determining leaks is refcounts, and it's tricky to figure out whether growing refcounts are leaks or if they're just side effects of the ZODB cache holding on to old objects.
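On the refcount point: a crude, Zope-independent way to get the same kind of data as Zope's refcount debug view is to snapshot live instance counts per class via the gc module, then diff two snapshots taken around a workload. A minimal sketch (the function names are our own):

```python
import gc
from collections import Counter

def instance_counts():
    """Snapshot of live, gc-tracked instances keyed by class name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def growth(before, after):
    """Classes whose live-instance count grew between two snapshots.
    Steady growth across repeated identical workloads is the leak signature."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] > before.get(name, 0)}
```

Take one snapshot, run the ab workload, take another, and look for classes (TransientObject, HTTPRequest, acquisition wrappers) whose counts climb without bound; remember that counts propped up by the ZODB cache are expected, not leaks.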
I tested that too, and Python 2.3.4 on Red Hat 9 does not seem to behave this way. I wrote a script that creates 100MB of random strings and places them in a list. Then it waits a few seconds and deletes the list. When I look at the process with top, I see memory (RSS) go up by about 100MB and then, a few seconds later, go back down to just above the level where the script started. Then I put this in a loop. I can watch the memory usage just cycle up and down. This is what I would expect.
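For reference, a reconstruction of that allocate/hold/release cycle (sizes and pauses are parameters; the original test used roughly 100MB of random strings):

```python
import os
import time

def churn(megabytes=100, hold_secs=2.0):
    """Allocate ~`megabytes` MB of random byte strings in a list, hold them
    briefly, then drop the list. Run this in a loop while watching the
    process RSS in top to see whether memory cycles up and down."""
    blobs = [os.urandom(1 << 20) for _ in range(megabytes)]  # 1 MiB each
    total = sum(len(b) for b in blobs)
    time.sleep(hold_secs)
    del blobs  # last references gone; whether RSS shrinks is up to the allocator/OS
    return total
```

Whether RSS actually drops after `del blobs` depends on the allocator and object sizes: large allocations like these tend to be returned to the OS directly, which may be why this test shrinks while the small-object churn of real session traffic does not.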
As I understand it, Python has separate memory pools for different kinds of objects, and the policy for managing deallocation is different for each pool. I'm no expert on it; I'm just deferring to what I've read, which seems to indicate that releasing the last reference to an object doesn't always immediately return the memory consumed by that object to the OS. You probably hit upon a case where it does with the above test, but I wouldn't assume it's indicative of all Python memory deallocation behavior. As a result, the way I think about it anyway, the only reliable way to determine whether an app is leaking is refcounts. Concrete memory usage during process lifetime is interesting as a rough indicator of how an app is behaving memory-wise, but AFAICT refcounts tell a much more interesting story for debugging purposes. FWIW, I'm willing to help try to isolate this problem if we can come up with some sane hypothesis about what sort of resources are leaking and a general "why" backed up by refcount data and repeatable tests. Otherwise I think I'd rather go chew on some tinfoil or burn myself with a cigarette. ;-) - C