[ZODB-Dev] Cache Query (why doesn't RAM usage ever drop?)

Tim Peters tim@zope.com
Fri, 25 Oct 2002 11:25:08 -0400


[Shane Hathaway, to Chris Withers]
> I'm glad you brought this up, since I've been working on memory usage
> the past few days.  I noticed that after a cache flush, memory usage
> never dropped, but it also took a long time before it increased again.
> So I wrote a C extension that lets Python get memory statistics via
> "mallinfo()", and it confirmed what I suspected: after a cache flush,
> Zope only uses a fraction of the process size.  But the heap is sparse.
> Since there's an object at the end of the address space and you can't
> move objects around, the memory can't be returned to the OS.

To complicate it more, Python never talks to "the OS"; it talks to the
platform libc malloc package.  Whether *that* returns memory to "the OS" is
something Python has no say in at all.  Chris is most likely running on
Windows, and all flavors of Windows differ (from Linux, and from each other)
in the details too.
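
For anyone who wants the same numbers Shane got without writing a C
extension, here is a minimal sketch that calls the platform malloc's
mallinfo() through ctypes.  It assumes a glibc-style libc; the helper name
malloc_stats is made up for illustration, and on platforms without
mallinfo() it just returns None.  The fields are plain C ints, so they can
wrap in very large processes.

```python
import ctypes


class MallInfo(ctypes.Structure):
    # Field layout of glibc's `struct mallinfo` (all plain C ints).
    _fields_ = [(name, ctypes.c_int) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]


def malloc_stats():
    """Return a dict of mallinfo() fields, or None if unavailable.

    Hypothetical helper: assumes the C library exports mallinfo().
    """
    try:
        libc = ctypes.CDLL(None)   # symbols already loaded in this process
        mallinfo = libc.mallinfo
    except (OSError, TypeError, AttributeError):
        return None                # no mallinfo() on this platform
    mallinfo.restype = MallInfo    # the struct is returned by value
    mallinfo.argtypes = []
    info = mallinfo()
    return {name: getattr(info, name) for name, _ in MallInfo._fields_}


if __name__ == "__main__":
    stats = malloc_stats()
    if stats is None:
        print("mallinfo() is not available here")
    else:
        # arena:    bytes in the main heap
        # uordblks: bytes in use; fordblks: bytes free inside the heap
        # keepcost: releasable space at the very top of the heap
        for key in ("arena", "uordblks", "fordblks", "keepcost"):
            print(key, stats[key])
```

A large fordblks next to a small keepcost is the sparse-heap situation
described above:  plenty of free space, but little of it at the top where
the malloc could give it back.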

> Brian suggested that PyMalloc might alleviate this problem.
>
> http://www.python.org/dev/doc/devel/whatsnew/section-pymalloc.html

pymalloc is faster, more memory-efficient, and less memory-fragmenting than
any known platform malloc, for allocating and deallocating "typically small"
Python objects.  However,

1. It passes larger requests on to the platform malloc.

2. It never even tries to return its own "pools" and "arenas" to the
   platform malloc.

WRT "returning memory to the system", it's hard to predict what will happen
despite #2.  What often happens is that by keeping small allocation requests
out of the platform malloc's hair, the platform malloc gets much less
fragmented in its own use of memory, and then process size shrinks more when
returning larger blocks *to* the platform malloc.  But maybe not; it depends
"on everything".
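
To make point #1 concrete, here is a toy model of the routing decision.  It
is not pymalloc's code:  the ToyAllocator class is hypothetical, the
256-byte cutoff is an assumed value used only for illustration, and nothing
here manages real memory.  It just counts which requests would stay in the
allocator's own pools and which would be passed on to the platform malloc.

```python
SMALL_REQUEST_THRESHOLD = 256   # assumed cutoff in bytes, illustration only


class ToyAllocator:
    """Counts where requests would go; it does not manage real memory."""

    def __init__(self, threshold=SMALL_REQUEST_THRESHOLD):
        self.threshold = threshold
        self.pool_requests = 0       # served from the allocator's own pools
        self.platform_requests = 0   # passed through to the platform malloc

    def malloc(self, nbytes):
        if 0 < nbytes <= self.threshold:
            self.pool_requests += 1
            return "pool"
        self.platform_requests += 1
        return "platform"


toy = ToyAllocator()
for size in (16, 48, 200, 256, 257, 4096, 128 * 1024):
    toy.malloc(size)
print(toy.pool_requests, toy.platform_requests)   # prints: 4 3
```

The platform malloc only ever sees the three larger requests, which is why
its own bookkeeping tends to fragment less.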

> I learned a bit about Linux and mmap through this exercise.  I
> discovered that when you allocate a block of at least 128KB (or whatever
> you set your threshold to be), malloc actually returns a memory-mapped
> file instead of space from the heap.  I'm not quite sure where the
> "file" exists :-) but like obmalloc.c says, instead of wasting memory
> you end up only wasting address space.  I think of it as many small
> heaps rather than a single big heap.

That's a good way to think of it!  Under the covers, Windows ends up
allocating many distinct *heaps*, which the 9x flavors do a much poorer job
of than the NT+ flavors.  On 9x I can construct little C programs that cause
the OS to crash due to fatal fragmentation of the entire address space
despite having only about 1% of the total address space in use.

So like obmalloc.c says <wink>, while it's just a waste of address space, it
can be fatal nevertheless, given a bad enough OS.
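
On the question of where the "file" lives:  such a mapping can be
anonymous, with no file behind it at all.  A small sketch using Python's
own mmap module shows the idea.  This illustrates an anonymous mapping in
general, not what any particular malloc does internally, and BLOCK_SIZE is
just the 128KB figure quoted above.

```python
import mmap

BLOCK_SIZE = 128 * 1024   # the 128KB figure quoted above

# fileno -1 asks for an anonymous mapping: no file on disk backs it.
block = mmap.mmap(-1, BLOCK_SIZE)
block[:5] = b"hello"      # the mapping behaves like a writable byte buffer
first = bytes(block[:5])
size = len(block)
block.close()             # unmapping releases the whole range at once
print(size, first)        # prints: 131072 b'hello'
```

Because each mapping is released as a unit, a live object elsewhere in the
process can't pin it the way an object at the top of the main heap can.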

> ExtensionClass does not use PyMalloc.  I wonder if it would be worth the
> effort to change it.  (To the group) Does Python itself use PyMalloc?

In 2.3 it's enabled by default, but not before.  I'm afraid I have to advise
against enabling pymalloc before 2.3:  I spent about a month bulletproofing the
pymalloc code for 2.3, which means making Python's memory API wholly
backward compatible in its presence, and plugging security holes (if you dig
thru the Python-Dev archives, you'll eventually find a pure-Python program
that can crash the interpreter, overwrite bytecode, etc etc, by exploiting
holes in the pre-2.3 version of pymalloc; those holes are well & truly
plugged in the 2.3 version).  The pre-2.3 pymalloc also performs poorly with
new-style classes, because pymalloc was written long before those, and
needed tuning and new mini-algorithms to work well with new-style classes.

Enabling pymalloc is a major 2.3 feature, BTW -- there are real programs
that run 1000x faster, particularly those that trigger quadratic-time
behavior in the platform free() when freeing gazillions of small objects
(and, yes, Doug Lea's malloc, from which glibc's is derived, is among those
vulnerable to this, although Doug recently added a "tuning parameter" that
lets you stop such disasters if you know enough about them in advance).