[ZWeb] ZCatalog Issues

Tue Jul 13 10:28:23 EDT 2004

On Tuesday 13 July 2004 06:39 am, Jim Fulton wrote:
> Shane Hathaway wrote:
>  > Flushing a cache containing 20,000 objects can take minutes,
>
> Huh? This makes no sense.  Flushing objects just frees their
> state.  This should not take minutes.  If this is reproducible,
> we ought to do some profiling to figure out what the heck is
> going on.

It's just an observation.  I postulate it happens because ZODB frees the
objects in layers: it peels away all the unreferenced objects, revealing more
objects that are now unreferenced, and repeats.  If for some reason it peels
off only one object per pass, the total operation takes about n^2 / 2 steps,
which is O(n^2).
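
A toy model of that hypothesis, sketched in Python.  This is not ZODB's
cache code; peel(), chain() and flat() are made-up names, and the model
simply assumes each pass scans every remaining object and frees the ones
nothing else references.

```python
def peel(refs):
    """Free objects layer by layer; refs maps each id to the ids it references.

    Returns (passes, visited), where visited counts every object examined
    across all passes.
    """
    # Count incoming references for every object.
    incoming = {obj: 0 for obj in refs}
    for targets in refs.values():
        for t in targets:
            incoming[t] += 1

    alive = set(refs)
    passes = 0
    visited = 0
    while alive:
        visited += len(alive)  # each pass scans everything still alive
        layer = [obj for obj in alive if incoming[obj] == 0]
        if not layer:
            break  # only reference cycles remain
        for obj in layer:
            alive.discard(obj)
            for t in refs[obj]:
                incoming[t] -= 1  # may expose t for the next pass
        passes += 1
    return passes, visited


def chain(n):
    """Worst case: object i references i + 1, so one object peels per pass."""
    return {i: ([i + 1] if i + 1 < n else []) for i in range(n)}


def flat(n):
    """Best case: no references at all, so everything peels in one pass."""
    return {i: [] for i in range(n)}


if __name__ == "__main__":
    print(peel(chain(1000)))  # (1000, 500500)
    print(peel(flat(1000)))   # (1, 1000)
```

With a chain of 1000 objects the model visits 500500 objects, which is
n(n+1)/2, versus 1000 visits when nothing references anything else.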

> > It seems like this catalog contains simply too many objects.  A third of
> > them are very small (less than 64 bytes including the class name); I
> > wonder if we could combine some of these.
>
> Interesting.  I wonder what these are.
>
> > I think I'll next try to find out how many of the objects are in text
> > indexes and lexicons.
>
> This is a tough and painstaking analysis. Good luck.

I was thinking I'd just export the lexicons and text indexes as .zexps.  I 
haven't done it yet.

> Some things I'd look for:
>
> - sorting
>
>    If we are doing lots of sorted searches, that could cause lots of
>    meta-data to be loaded.  I suspect that sorting on application
>    attributes, such as modification time, is the most common case of
>    catalog abuse.

Yet for usability, we virtually always want to sort.

> - Too much meta data.

Agreed.  Unfortunately, it's hard to tell which metadata fields zope.org 
actually needs.

> - Maybe too many indexes
>
>    I think a common problem in Zope sites is that they have a single catalog
>    that is used for a wide variety of independent searches.  I think that
>    it would be more efficient in many cases to keep separate catalogs geared
>    toward separate kinds of searches.

That's an interesting idea.  I wonder if we could apply it here.

> I think it would be interesting to analyze:
>
> - What sorts of searches people are doing and how much time they take.
>    Is there an option to turn on elapsed time in the regular hit log? If
>    not, there should be.

There should be, yes.
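
Until such an option exists, elapsed time could be recorded around the
search call itself.  A minimal sketch, assuming the search is reachable as
a plain callable; timed_search, the logger name and the slow_threshold
parameter are all hypothetical and not part of Zope or ZCatalog.

```python
import logging
import time

log = logging.getLogger("catalog.timing")  # hypothetical logger name


def timed_search(search_fn, query, slow_threshold=0.5):
    """Run search_fn(query), log how long it took, return (results, elapsed).

    Searches at or above slow_threshold seconds are logged as warnings so
    they stand out from routine queries.
    """
    start = time.perf_counter()
    results = search_fn(query)
    elapsed = time.perf_counter() - start
    level = logging.WARNING if elapsed >= slow_threshold else logging.INFO
    log.log(level, "search %r took %.3fs", query, elapsed)
    return results, elapsed
```

Logging the query alongside the time makes it possible to see afterwards
which kinds of searches are the slow ones.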

> - For searches that take a lot of time, analyze how many and what sorts of
>    objects are loaded to accomplish the searches.
>
> In summary, if a catalog is being used *properly*, only a small fraction
> (decreasing with increasing catalog size) of the catalog should be
> loaded at any point in time.  I fear we make catalog abuse too easy though.

Yep.

>
> > There is a bit of good news: zope.org is not consuming gobs of RAM due to
> > a memory leak.  I wrote a small Python C extension that uses mallinfo()
> > to reveal how much heap a Python process is actually using for objects,
> > which is often much smaller than the process size as the operating system
> > sees it. Whenever I flush the caches in Zope, its heap usage shrinks to
> > less than 10% of its process size.  That means most of the memory is
> > consumed by reclaimable ZODB objects.  (I'll post the C extension on the
> > web if anyone is interested.)
>
> But, over time, is the size it shrinks to constant? Or is it increasing?

My point was that the shrunken size is small enough that I don't care.  I'm 
almost sure it increases over time, indicating a memory leak or two, but the 
leak is small enough that it's not a priority.
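
A rough stand-in for that kind of measurement, assuming a glibc-based
system, can be sketched with ctypes instead of a C extension.  This is not
the extension mentioned above; MallInfo and heap_usage are made-up names.
The struct fields are C ints, so values can wrap on very large heaps, and
memory an interpreter obtains outside malloc is not counted.

```python
import ctypes


class MallInfo(ctypes.Structure):
    # Field layout of glibc's struct mallinfo (ten C ints).
    _fields_ = [(name, ctypes.c_int) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]


def heap_usage():
    """Return malloc statistics in bytes, or None if mallinfo() is unavailable."""
    try:
        libc = ctypes.CDLL(None)   # symbols already loaded into this process
        mallinfo = libc.mallinfo   # missing on non-glibc platforms
    except (OSError, AttributeError, TypeError):
        return None
    mallinfo.restype = MallInfo    # the struct is returned by value
    info = mallinfo()
    return {
        "in_use": info.uordblks,   # bytes currently allocated via malloc
        "free": info.fordblks,     # bytes held by malloc but unused
        "mmapped": info.hblkhd,    # bytes in mmap()ed regions
    }


if __name__ == "__main__":
    print(heap_usage())
```

Comparing "in_use" before and after a cache flush against the process size
the operating system reports gives the kind of ratio described above.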

Shane