Re: [Zope] Zope 2.7.3 Memory Leaks

2 Dec 2004

On Thu, 2004-12-02 at 17:24, Andy Yates wrote:
> ...
> Now it seems to leak memory (400+ Mb) to the point I have to restart
> Zope every 3 days.
> I've read that memory leaks are almost always caused by the programmer,
> so I set up some very simple tests that did not involve existing code.

Great idea.

> ...
> First I used ab to get a simple page template.  After running 1000s of
> requests the memory had climbed slowly and never went down.

A possible leak but not necessarily a smoking gun.  You'd need to
examine the references created by each request.

> ...
> Next I created an 8Mb page template.  This caused the memory usage to
> climb much faster.  After the test the memory never goes back down.

Do these templates access the sessioning machinery?

> ...
> I installed LeakFinder and started running more tests.
> I used ab to get a python script that stored a 1k string in the session
> object.  The transient object container timeout was set to 1 minute.
> This also caused Zope to consume memory.  LeakFinder said the
> Products.Transience.TransientObject.TransientObject ref count grew with
> each request and never went down.  Everything else seemed to level off.
> The debug output of Transience.py showed that the "buckets" seemed to be
> getting deleted as expected, but the memory usage never goes down.  1
> minute after the test stopped the transient object container showed that
> there were no more items in the container.  It seems like Zope deletes
> expired sessions but the contents of the sessions are not deleted.

This is actually probably normal.

TransientObject buckets hang around for some period of time before they
are "garbage collected" (deleted from their container).  A particular
TransientObject isn't garbage collected immediately when it expires, but
gets gc'ed much later along with other TransientObjects that were
created around the same time as a side effect of otherwise exercising
the sessioning machinery.  The algorithm for determining when gc will
happen is:

gc_every = period * round(SPARE_BUCKETS / 2.0)

where

gc_every is a number of seconds
period is 60 * whatever you've got your TOC 
  "data object timeout value (in minutes)" set for
SPARE_BUCKETS is 15 (by default)

... in the default configuration (a 20-minute timeout) this resolves to:

gc_every = (20 * 60) * round(15 / 2.0)
gc_every = 9600

So garbage collection is attempted, at most, every 9600 seconds, which
is 160 minutes with the default sessioning configuration.  You would
only see the refcount for the TransientObject class drop after this
runtime period, and only if the sessioning machinery is invoked at least
once after these 9600 seconds transpire.
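
To make the arithmetic easier to replay for other timeout settings,
here is a minimal sketch of the formula above.  It is not the code in
Transience.py; the function name is made up for illustration, and the
only inputs are the values already named above (the timeout in minutes
and SPARE_BUCKETS = 15).  The rounding is spelled out by hand because
newer Pythons round halves to the nearest even number, which would give
a different answer for some SPARE_BUCKETS values.

```python
import math

SPARE_BUCKETS = 15  # default value quoted in the text above


def gc_every_seconds(timeout_minutes, spare_buckets=SPARE_BUCKETS):
    """Minimum number of seconds between gc attempts (hypothetical helper).

    Returns None for a timeout of 0, since sessions then never expire
    and gc never runs.
    """
    if timeout_minutes == 0:
        return None
    period = 60 * timeout_minutes
    # Round half up, matching the round(SPARE_BUCKETS / 2.0) result
    # used in the worked example (7.5 -> 8).
    half = int(math.floor(spare_buckets / 2.0 + 0.5))
    return period * half


print(gc_every_seconds(20))  # default 20-minute timeout
print(gc_every_seconds(1))   # the 1-minute timeout used in the test
```

If the formula holds, a 1-minute timeout gives 60 * 8 = 480 seconds, so
even in that test the TransientObject refcount would not be expected to
drop until about 8 minutes in, and only if the sessioning machinery is
exercised again after that.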

To complicate matters, there is a bit of randomness to when gc actually
gets invoked in order to reduce the chance of ZODB conflict errors.  But
if you keep doing session-related stuff, gc will eventually get run.

Note that if you set the TOC timeout to 0, gc *never* gets run (because
the lifetime of sessions is effectively infinite).

Also, FYI, the number of TransientObjects that can be created by the
session machinery is limited to the "maximum number of subobjects" value
on the TOC ZMI screen to allow a choke point for DoS attacks.  As a
result, your test code may not be doing what you think it's doing after
this number of objects has been created within the TOC.  You should
check the error log or bump that number up to something insane (it
defaults to 1000).

With this in mind, I think you should concentrate on trying to find
leaks in code that doesn't do sessions unless you've isolated the
problem to be session-related, because it's extremely time consuming to
test this.  We try to do a job of it in the TOC unit tests, FWIW, and
they pass.

Note that Python will usually not release memory back to the OS once it
has been allocated for its own use, so a large amount of memory
consumed by Python may not even be an indicator of a leak!  See
http://mail.python.org/pipermail/python-dev/2004-October/049480.html for
more info.  The only solid basis for determining leaks is refcounts, and
it's tricky to figure out whether growing refcounts are leaks or if
they're just side effects of the ZODB cache holding on to old objects. 
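
As a rough illustration of comparing object counts rather than process
size, here is a sketch that takes two snapshots of live objects per type
and reports what grew.  This is not what LeakFinder does internally; it
uses only the standard gc module on a current Python, sees only objects
the collector tracks, and the function names are invented for the
example.

```python
import gc
from collections import Counter


def count_instances_by_type():
    """Snapshot of how many gc-tracked objects exist per type name."""
    gc.collect()  # drop collectable garbage first so it is not counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, min_growth=1):
    """Type names whose instance count grew by at least min_growth."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] - before[name] >= min_growth
    }


class Leaky(object):
    """Stand-in for an object that something keeps a reference to."""


before = count_instances_by_type()
held = [Leaky() for _ in range(50)]  # simulate references that pile up
after = count_instances_by_type()
print(growth(before, after, min_growth=10))
```

A count that keeps growing across many snapshots is only a hint.  As
noted above, the ZODB cache legitimately holds on to objects, so the
numbers need to level off or not over a long enough run before you can
call it a leak.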

Memory leaks really, really suck.  Fixing them is typically a matter of
binary exclusion, where you come up with some hypothesis, and comment
out large chunks of code along the codepath to try to prove or disprove
that hypothesis, measuring refcounts along the way.  Once you've proved
your hypothesis, you need to try to actually fix the problem, which can
be quite difficult.  There's no silver bullet here, unfortunately.
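
The binary exclusion idea can be sketched as a plain bisection.  This
assumes a lot: that exactly one step on the codepath is responsible,
that steps can be disabled independently, and that you have some
repeatable way to decide "still leaks" (for example, by watching
refcounts over a test run).  All names here are hypothetical.

```python
def find_leaking_step(steps, still_leaks):
    """Narrow a list of step names down to the single leaking one.

    still_leaks(subset) must return True when running only the steps
    in subset still shows the leak.  Assumes exactly one culprit.
    """
    if not steps:
        raise ValueError("need at least one step")
    candidates = list(steps)
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        if still_leaks(half):
            candidates = half  # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # it must be in the rest
    return candidates[0]


# Toy usage: pretend the step named "d" is the one that leaks.
steps = ["a", "b", "c", "d", "e"]
print(find_leaking_step(steps, lambda subset: "d" in subset))
```

In real code the steps are rarely this independent, which is a large
part of why the process is so slow by hand.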

- C

Chris McDonough