[Zope-dev] Very severe memory leak
Leonardo Rochael Almeida
leo at hiper.com.br
Sat Aug 23 20:06:16 EDT 2003
Hi,
For a long time now, one of our clients, running Zope 2.5.1, has been
experiencing memory leaks. For a time this has been relieved by
restarting Zope every day at 4am.
Lately this was not enough, however, as Zope started taking more and
more memory, to the point that it frequently required more than one
restart during the day.
The machine this is running on is a Xeon 500 MHz with 512 Mb of memory.
Zope is a very memory intensive application, but 512Mb used to be enough
for a medium sized portal.
This Zope site makes very heavy use of both ZClasses and ZCatalog, and
it appears to me that this "worsening" of perfomance and memory
consumption was being caused by the increase in content, which caused an
increase in the size of the ZCatalog and in the number of ZClass
instances. A total of 6796 ZClass instances representing News Items are
cataloged. There are other ZCatalogs and ZClass instances representing
calendar events and other stuff (this is a very old Zope portal coded by
hand in ZClasses, no CMF, no Plone, not even Pagina1, our ZClass based
CMS)
Fiddling with the cache parameters in the control panel showed that
while we could keep the memory consumption to a point where the daily
restart would be enough (say, with 10k objects per thread), we would get
constant thrashing of cache objects, specially DateTime objects due to
the ZCatalog queries, and the machine performance would be close to
intolerable, whereas if the cache parameters where set to allow a fast
performance (with 50k objects per thread) the machine would run out of
memory in 3 to 4 hours. Needless to say this was with heavy use of
RAMCacheManagers, not counting the accelerator proxy in front of it.
Without the RAMCaches, the machine would go down in under 5 minutes of
work hour load. Even with the caches on, the load would never go down
from 2.0 during work hours.
Last tuesday we decided no longer to wait for 2.6.2 and migrated the
site to 2.6.1. We dealt with the ObjectManager-based-ZClass issue,
reformed the ZCatalogs to replace the DateTime FieldIndexes with
DateTimeIndexes and then had a working testing environment, which we
stress tested lightly without detecting any problems and quickly moved
to production. This was late at night
The next morning we were surprised to notice the machine very quickly
ran out of memory. The memory leak was *far more severe* than before.
Zope needed a restart every 15m or so before it would send the machine
into heavy swaping.
In a very non-intuitive hunch I suggested we shut down all RAMCaches
and, amazingly enough, this made the situation a bit more manageable.
We're now restarting every 45 minutes. To our relief, disabling the
RAMCaches had only a barely noticeable effect on performance. The site
kept churning out pages really fast, a testatment to the optimization
job done in the 2.6 series. The load on the machine is rarely above 0.8,
except when it goes into swap :-)
The number of DateTime refcounts in the Control_Panel, although much
smaller than in Zope 2.5.1 is very high and, mostly importantly,
constantly increasing, as far as I can tell. After 12 minutes of uptime,
the top refcounts are:
DateTime.DateTime.DateTime: 96168
BTrees._IOBTree.IOBucket: 43085
BTrees._IIBTree.IIBTree: 40400
BTrees._IIBTree.IIBucket: 23696
OFS.DTMLDocument.DTMLDocument: 23190
BTrees.OIBTree.OIBucket: 14582
BTrees._IIBTree.IISet: 12479
BTrees._IIBTree.IITreeSet: 10823
BTrees.OOBTree.OOBucket: 7088
OFS.Image.Image: 6860
OFS.DTMLMethod.DTMLMethod: 5894
DocumentTemplate.DT_Util.Eval: 3250
OFS.Image.File: 2796
BTrees._IOBTree.IOBTree: 2761
ZClasses.Method.MWp: 1592
In time, DateTime refcounts eventually dwarves the second place by an
order of magnitude. I think this is related to the fact that DateTime
instances are stored as metadata, even though the date indexes have been
converted to DateTime indexes. The question is, why aren't those
instances being released? What is holding on to them?
I tried installing the LeakFinder product but discovered it didn't work
before stumbling in a message in the archives that told me exactly that
:-) The RefCounts view in the LeakFinder object fails with the following
traceback:
Traceback (innermost last):
[...]
Module DocumentTemplate.DT_Util, line 201, in eval
- __traceback_info__: REQUEST
Module <string>, line 0, in ?
Module Products.LeakFinder.LeakFinder, line 240, in manage_getSample
Module Products.LeakFinder.LeakFinder, line 163, in getControlledRefcounts
Module Products.LeakFinder.LeakFinder, line 188, in resetCache
TypeError: function takes at most 2 arguments (3 given)
The code in question is:
def resetCache(c, gc):
cache = c._cache
if gc:
cache.minimize(3) # The minimum that actually performs gc.
# Replace it with an empty cache.
c._cache = PickleCache(c, cache.cache_size, cache.cache_age)
PickleCache is imported from ZODB.cPickleCache, which is (surprise!) in
C and I couldn't figure out how I should fix the call above to work.
In an ExternalMethod, I consulted the python module gc and checked that
gc *was* enabled and *no* uncollectable cycles where found.
In trying to narrow down the causes of the leak, we detected what looks
like a minor one. If you insert a very simple dtml method in Zope root
and bang on it at a rate of, say, 100 requests per second, Zope will
increase it's memory footprint very slowly, say, a couple hundred Ks
every 10 minutes (both RSS and virtual size). I figure this has
something to do with memory fragmentation or some other low level stuff.
But this is definetly not our main concern, which is to find out how is
it that Zope is leaking DateTimes so heavily in our site, and if it is
really all those DateTimes that are hogging our memory. We need help,
and we need it desperately. If anyone wants any other information we'd
be happy to provide.
--
Leonardo Rochael Almeida <leo at hiper.com.br>
Hiperlogica informatica e educacao ltda.
More information about the Zope-Dev
mailing list