On Thu, 2004-12-02 at 17:24, Andy Yates wrote:
Now it seems to leak memory (400+ MB) to the point that I have to restart Zope every 3 days.
I've read that memory leaks are almost always caused by the programmer, so I set up some very simple tests that did not involve existing code.
Great idea.
First I used ab to get a simple page template. After running thousands of requests the memory had climbed slowly and never went down.
A possible leak but not necessarily a smoking gun. You'd need to examine the references created by each request.
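One rough way to look at what each batch of requests leaves behind is to count live objects by type before and after the batch and compare. The sketch below is only an illustration using the standard gc module; it counts gc-tracked instances rather than class refcounts, and the helper names count_instances_by_type and growth are made up for this example.

```python
import gc
from collections import Counter


def count_instances_by_type():
    """Count live gc-tracked objects, keyed by 'module.ClassName'."""
    counts = Counter()
    for obj in gc.get_objects():
        t = type(obj)
        counts[t.__module__ + "." + t.__name__] += 1
    return counts


def growth(before, after, top=10):
    """Return (type name, increase) pairs for the types that grew the most."""
    diff = Counter(after)
    diff.subtract(before)
    return [(name, n) for name, n in diff.most_common(top) if n > 0]


# Hypothetical usage: snapshot, run a batch of requests, snapshot again.
# before = count_instances_by_type()
# ... drive some requests with ab ...
# after = count_instances_by_type()
# for name, n in growth(before, after):
#     print(name, n)
```

A type whose count keeps growing across several batches and never levels off is a candidate worth investigating, not proof of a leak.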
Next I created an 8 MB page template. This caused the memory usage to climb much faster. After the test the memory never went back down.
Do these templates access the sessioning machinery?
I installed LeakFinder and started running more tests.
I used ab to get a Python script that stored a 1k string in the session object. The transient object container timeout was set to 1 minute. This also caused Zope to consume memory. LeakFinder said the Products.Transience.TransientObject.TransientObject ref count grew with each request and never went down. Everything else seemed to level off. The debug output of Transience.py showed that the "buckets" seemed to be getting deleted as expected, but the memory usage never went down. 1 minute after the test stopped, the transient object container showed that there were no more items in the container. It seems that when Zope deletes expired sessions, the contents of the sessions are not deleted.
This is actually probably normal. TransientObject buckets hang around for some period of time before they are "garbage collected" (deleted from their container). A particular TransientObject isn't garbage collected immediately when it expires, but gets gc'ed much later along with other TransientObjects that were created around the same time as a side effect of otherwise exercising the sessioning machinery.

The algorithm for determining when gc will happen is:

  gc_every = period * round(SPARE_BUCKETS / 2.0)

where

  gc_every is a number of seconds

  period is 60 * whatever you've got your TOC "data object timeout value (in minutes)" set for

  SPARE_BUCKETS is 15 (by default)

... this resolves to, in the default configuration:

  gc_every = (20 * 60) * round(15 / 2.0)
  gc_every = 9600

So garbage collection is attempted, at most, every 9600 seconds, which is 160 minutes with the default sessioning configuration. You would only see the refcount for the TransientObject class drop after this runtime period, and only if the sessioning machinery is invoked at least once after these 9600 seconds transpire.

To complicate matters, there is a bit of randomness to when gc actually gets invoked in order to reduce the chance of ZODB conflict errors. But if you keep doing session-related stuff, gc will eventually get run. Note that if you set the TOC timeout to 0, gc *never* gets run (because the lifetime of sessions is effectively infinite).

Also, FYI, the number of TransientObjects that can be created by the session machinery is limited to the "maximum number of subobjects" value on the TOC ZMI screen to allow a choke point for DOS attacks. As a result, your test code may not be doing what you think it's doing after this number of objects has been created within the TOC. You should check the error log or bump that number up to something insane (it's defaulted to 1000).
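To make the gc_every arithmetic above concrete, here is a small sketch that evaluates the formula for the default 20-minute timeout and for the 1-minute timeout used in the test. It is not the Transience code itself: the function name and signature are made up for illustration, the rounding is written out explicitly as round-half-up (which is what round() did for positive values in the Python of the time), and it does not model the randomness or the timeout-of-0 case.

```python
import math

SPARE_BUCKETS = 15  # default value quoted above


def gc_every(timeout_minutes, spare_buckets=SPARE_BUCKETS):
    """Seconds between gc attempts, per the formula quoted above.

    Illustrative only: timeout_minutes stands for the TOC
    "data object timeout value (in minutes)" setting.
    """
    period = 60 * timeout_minutes
    # Round half up, so 15 / 2.0 = 7.5 becomes 8.
    half_buckets = int(math.floor(spare_buckets / 2.0 + 0.5))
    return period * half_buckets


print(gc_every(20))  # default 20-minute timeout: 9600 seconds
print(gc_every(1))   # 1-minute timeout from the test: 480 seconds
```

By this formula, a 1-minute timeout means gc is attempted at most every 480 seconds (8 minutes), so a refcount check made 1 minute after the test stops would be too early to see the drop.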
With this in mind, I think you should concentrate on trying to find leaks in code that doesn't do sessions unless you've isolated the problem to be session-related, because it's extremely time consuming to test this. We try to do a good job of it in the TOC unit tests, FWIW, and they pass.

Note that Python will usually not release memory back to the OS once it has been allocated for its own use, so a large absolute amount of memory consumed by Python may not even be an indicator of a leak! See http://mail.python.org/pipermail/python-dev/2004-October/049480.html for more info. The only solid basis for determining leaks is refcounts, and it's tricky to figure out whether growing refcounts are leaks or if they're just side effects of the ZODB cache holding on to old objects.

Memory leaks really, really suck. Fixing them is typically a matter of binary exclusion, where you come up with some hypothesis, and comment out large chunks of code along the codepath to try to prove or disprove that hypothesis, measuring refcounts along the way. Once you've proved your hypothesis, you need to try to actually fix the problem, which can be quite difficult. There's no silver bullet here, unfortunately.

- C