[Zope] Large jumps of memory use increase, and seeking overall understanding of memory use

24 Feb 2007

      All,

(I am posting to both the general Zope and Plone lists, I hope that's
all right. My questions are more directly Zope issues, but in case
someone who only looks only at the Plone lists has a different insight
to offer...)

I'm the co-admin Alex Kirk was referring to in his recent post "Random
Crashes/Freezes on FreeBSD 5.4-RELEASE" on the Zope list. The
particular instance he was referring to was not a "random" crash but
simply Zope running out of memory (with the standard 512M limit on
FreeBSD, that we have not increased so far -- we have 2Gb of physical
memory). I have had one truly unexplained unresponsive Zope on that
system so far, but was not able to characterize it well enough to ask
an intelligent question on the list about that one.

I fully understand that Zope has to be restarted regularly and intend
to use monit to do so, but I would like to better understand what is
going on in any case, and hopefully be able to lower the required
frequency of such restarts. I have searched the lists (posts like
"[Zope-dev] Very severe memory leak" and "[ plone.user] Memory leaks
even with latest-everything minimalistic site?") and get the general
idea, but what I have read so far does not directly explain what I am
observing.

I have a test machine, but for starters, I have done some analysis of
memory consumption of our live site to see what normal traffic
produces. I have used a script to capture the size of the Zope process
every minute (from ps) and merged these results with the access log.
What I see does not make sense to me in terms of object cache, or
growth of refcounts. My only (entirely unverified) hypothesis is that
modules like PIL grab large chunks of memory once in a while. I
provide specific examples below.

First, the system:

Plone 2.5.1,
CMF-1.6.2,
Zope (Zope 2.9.5-final, python 2.4.3, freebsd5),
Five 1.3.7,
Python 2.4.3 (#2, Jul 2 2006, 16:14:54) [GCC 3.4.2 [FreeBSD] 20040728],
PIL 1.1.5
FreeBSD 5.4

My object cache size was initially at the default of 5000 objects per
threads (4 threads default). I reduced this to 2000 just to see, but
as I expected, the overall memory use behavior was the same.

I have made some minor customizations to Plone and added contributed
as well as my own products. My use of CMFReportTool might explain some
of what I see, but definitely not all.

Overall, my custom memory-use/access-log shows long periods with no
increase in memory at all, but there are some big jumps at times
(during a one minute tick) by 5, 10, even 25 Mb! As pretty much
expected, Zope never gives much memory back to the OS; that part I
understand, however I would like to get it to a state where it mostly
plateaus and then only slowly creeps up.

Some specific examples:

With memory use at 417640 Kb, no requests at all in the previous
minute, and a single request for an image_view_fullscreen in that
whole minute, memory use jumped by 5064 Kb to 422704. The image was
definitely not that large! (access log shows size returned was 1137).

While on that topic, I realize that the ATImage has to be loaded in
the object cache at some point, and that if it's a large image (in the
order of mega bytes), I might see the memory use jump by that much.
Now assume that the cache was already full when the image was loaded,
that a lot of other later requests fill the cache with much smaller
objects (so the image is not cached anymore) and then later the image
is requested again, my guess is that the cache should already have the
space it needs to load it, i.e. should I expect to see another memory
jump by the size of the image, and if so, why (where did the space it
originally occupy go that it can't be reused)? Can I easily check if
large objects are being split in smaller chunks as they should by the
way?

Other example: A few of the large jumps I noticed have as one of the
request during that minute a PDF listing of some documents. I use
CMFReportTool to generate the PDF. There seems to be a possible
correlation there, however, the resulting PDF document is about 150K,
and these memory jumps can be as large as 10 Mb! I'll have a look at
my code, CMFReportTool's and ReportLab's, but that seems a bit
excessive to produce that document, so I wonder if something else is
going on there. Of course, there is a catalog query required to get
the documents to list, but I generate the report with the metadata
only ( i.e. so nothing should be loaded in the object cache). I expect
some increase in memory the first time a module is loaded (like
ReportLab, although doesn't that happen when Zope is launched as part
of product initialization?), but that would not explain continued
increases on further requests. There are also many cases where the
same listing was requested with no increase in memory at all (although
that may be due to caching -- I'll have to look into that; I use
cacheFu but have turned Squid off for now).

Other example: There are however cases where I just can't see any
reason for the sudden jumps. In one case there was a ~15M increase
(from 477996 to 493016) during a one minute tick, with only a few very
benign requests (GETs for the sento_form, contact-info form, and a
couple of very small ATDocuments. There were no memory increases for
the two minutes preceding and following that occurrence.

Yet another: There was a 27Mb increase in one minute (the largest
during that log), nowhere close to a Zope restart (the memory use was
at 374376 prior to the jump). In that case there were a lot of
requests in the preceding minute, but they all look benign to me (a
few image thumbs, css, js, gifs; looks as if someone did a force
reload of their browser). Of course the images had to be loaded to
produce the thumbs, but I am sure that they total nowhere close to 27
Mb (they were all already pretty much thumb size to start with, except
our "image of the day", which might have been a couple of Mb). Is the
PIL process of generating the thumbs that memory hungry? I hope not.

I have also noticed a few instances where there are no requests for a
few minutes but still some growth of the Zope process (not by much
though). I assume that these are due to Zope internal housekeeping? I
do have an automated process that loads some content, but it runs at a
fixed time in the night and I have not seen large memory jumps while
that runs.

If someone feels like having a look, for fun ;-), that
memory-use/access-log file I have been looking at is a 6 Mb HTML file
(I removed some info from the access entries so they are easier to
scan through, and show the memory increases/decreases as a bar graph
so it's easy to scroll down the file and see big changes). The log
spans one Zope restart (I see a couple big and then more little memory
jumps soon after Zope has started, which makes sense to me).

Other tidbits.

I have seen one post stating that large file imports may increase
memory use by 2 to 3 times the size of the file. I have looked at some
POSTs of large images and see about that. Are these increases
cumulative, or should they plateau at some point? Also it wasn't clear
in that post if that was supposed to be a WebDav issue only (we are
not using WebDav so far).

When I flush the cache, the DateTime refcount goes down dramatically,
but the top counts (ValidationChain, EmptyValidator, AttributeStorage,
StringField) do not go down significantly.

Alex and I are volunteers doing this work for a non-profit, so any
ideas or "yep, that's normal, live with it" would be greatly
appreciated! (If the latter answer, I would still like to better
understand where that memory is being used; I still don't understand
how it could be the object cache, which is often cited as the main
cause of memory creep).

Cheers,

Jean

[Zope] Large jumps of memory use increase, and seeking overall understanding of memory use

Jean Lagarde