[Zope] Re: ZCatalog alpha

michel@digicool.com
26 Jun 1999 15:20:57 -0400


Alex Rice <alrice@swcp.com> writes:

> On Fri, 25 Jun 1999 15:20:22 -0400,
> Michel Pelletier <michel@digicool.com> said:
> 
> Michel> Remember, this is *alpha* software.  It has limitations, and
> Michel> there are several optimizations/features we have planned for it
> Michel> which are not included in this release.  We would, however, like
> Michel> the tired to be well kicked.
> 
> Kicking the tired? That's cruel and unusual :-)

Ugh.  Well, maybe at the time I felt like a kicked, tired person.  Paul
called this a 'Fulton-typo': a typo that makes sense anyway and is
funnier than the original sentence.

> 
> re: ZCatalog
> This rocks. It's very fast.
> 

Yep.

> However, I seem to be limited by the amount of memory used by
> manage_catalogFoundItems. Its memory usage seems to balloon out of
> control, and never goes back to normal until restarting Zope.
> 
> Is this a memory leak, or just a consequence of the awesome indexing the
> catalog is doing?
> 

It *could* be a memory leak, but remember that for every unique word
you text index, Zope creates several data structures.  One of the
optimizations on my to-do list for the Catalog is to improve the
'Status' view so that you can look inside the indexes and see how many
words they are storing, and perhaps some memory consumption statistics.

Don't be surprised, however, if the indexes grow to two or three times
the size of the data that is indexed.  That is the nature of indexing.

You may have noticed from the code that ZCatalog uses 'UnIndex' and
'UnTextIndex' indexes.  These are different from the usual Index and
TextIndex modules that have come with Zope since 1.10.0.  The 'Un'
indexes are fully symmetrical: not only do they have an index that
maps from word to document id (and score), they also have an inverted
index that maps from document id to word.  This is because the Catalog
cannot always track when an object changes (the only way it can do
that is if the object notifies the catalog that it is about to
change).  The 'Un' indexes therefore need to keep a little extra
information around, so that when an object comes in to have itself
cataloged again, the index can first find out what the object's old
values were (using the inverted index), unindex those, and then
reindex the object.
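To make the symmetrical-index idea concrete, here is a minimal sketch in Python of an index that keeps both a word-to-document map and a document-to-word map, and uses the second map to unindex stale entries before reindexing a changed object.  This is not the actual 'UnTextIndex' code: the class and method names (SymmetricIndex, index_document, unindex_document, search) are hypothetical, and the score is assumed to be a simple per-document word count.

```python
from collections import Counter


class SymmetricIndex:
    """Hypothetical sketch of a fully symmetrical ('Un'-style) text index.

    Not Zope's UnTextIndex API; names and scoring are illustrative only.
    """

    def __init__(self):
        # word -> {document id -> score}
        self._forward = {}
        # document id -> set of words (the map the post calls the inverted index)
        self._reverse = {}

    def index_document(self, doc_id, text):
        # The index cannot know whether the object changed since it was last
        # cataloged, so it first removes whatever it stored for doc_id before.
        self.unindex_document(doc_id)
        counts = Counter(text.lower().split())
        for word, count in counts.items():
            # Assumed scoring: number of occurrences in this document.
            self._forward.setdefault(word, {})[doc_id] = count
        self._reverse[doc_id] = set(counts)

    def unindex_document(self, doc_id):
        # Use the reverse map to find exactly which postings to remove.
        for word in self._reverse.pop(doc_id, set()):
            postings = self._forward[word]
            del postings[doc_id]
            if not postings:
                # Drop words no longer used by any document.
                del self._forward[word]

    def search(self, word):
        # Return a copy of {doc id: score} for the word, or {} if unknown.
        return dict(self._forward.get(word.lower(), {}))


idx = SymmetricIndex()
idx.index_document(1, "zope catalog rocks")
idx.index_document(1, "zope index")  # object changed: old words are unindexed first
print(idx.search("catalog"))  # {}
print(idx.search("zope"))     # {1: 1}
```

The cost of this design is visible in the sketch: every word of every document is stored twice, once in each map, which is part of why symmetrical indexes use more memory than one-way ones.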

Pretty heavy stuff, although the use of fully symmetrical indexes
really simplified the design of the 'Un' indexes compared to the more
memory-efficient Indexes.

> Just for kicks I loaded several MB of texts into DTML Docs and this is
> where I found the memory problem. My setup is

Several MB of text would probably translate into two or three times
that much memory growth.  However, the Persistence machinery will
quickly start to deactivate unused portions of the index.  Try
tweaking your Cache settings to see if you can get Zope to be more
aggressive about deactivating objects.

-Michel

> 
> Zope 2.0a3+ZServer+odb3+pcgi
> Python 1.5.2
> Linux 2.2.9/x86
> 
> Alex Rice    |    alrice@swcp.com    |    http://www.swcp.com/~alrice
>     Current Location: N. Rio Grande Bioregion, Southwestern USA
> 
> _______________________________________________
> Zope maillist  -  Zope@zope.org
> http://www.zope.org/mailman/listinfo/zope
> 
> (For developer-specific issues, use the companion list,
> zope-dev@zope.org - http://www.zope.org/mailman/listinfo/zope-dev )