[Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)

Chris McDonough chrism@digicool.com
Tue, 26 Jun 2001 16:04:05 -0400


> Chris McDonough wrote:
> >
> > Yikes.  I wonder if this overhead comes from Vocabulary updates...
> > thanks very much for doing this test.
>
> No, this should definitely _not_ be related to the Vocabulary: I simply
> copied an already indexed document and let ZCatalog.catalog_object munge
> the copy.  So all words appearing in this copy already have an entry in
> the Vocabulary.  I also checked it during a test without metadata: the
> Vocabulary did not increase.

OK, that's good to know...

> >
> > Clearly we need to pin it down.  This is very disappointing.  :-(  Any
> > further info you dig up is appreciated.
>
> Well, I don't have any at present.  But allow me to make a guess :)  If
> a new record is added to a BTree, it can be necessary to move a few
> other records around in order to keep the tree balanced.  And some of the
> BTrees affected by my test are definitely somewhat larger, because I did
> not use German stop words during the test, so words like "und", "der",
> and "die", which appear in _every_ document, get indexed.  (Well, at
> least in _nearly_ every document.)
>
> >
> > You didn't have any metadata stuff set up, did you?  I imagine even if
> > you did, that they couldn't possibly account for 200K worth of extra
> > stuff.
>
> Ouch, I forgot about the metadata.  So here is the result of another
> test, with all metadata thrown away:
>
> Packed database size, with one document (the same one as in the last
> test) to be cataloged: 229170221 bytes.
>
> Database size after the catalog update: 229310316 bytes
> Size after packing: 229172566 bytes
>
> So, same as before :(
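
The BTree guess sounds plausible to me.  A single insert into a big
IIBTree can dirty the bucket it lands in plus any interior nodes touched
by a split, and since FileStorage only appends, each commit writes fresh
copies of all the dirtied objects; the old copies hang around until the
next pack.  Here's a minimal sketch of that effect in isolation -- just
plain ZODB and the BTrees package, not the actual ZCatalog code, and
written against the newer transaction module rather than
get_transaction():

  import os, random, transaction
  from ZODB.FileStorage import FileStorage
  from ZODB.DB import DB
  from BTrees.IIBTree import IIBTree

  PATH = 'btree-growth-test.fs'        # scratch file, name made up

  storage = FileStorage(PATH)
  db = DB(storage)
  conn = db.open()
  root = conn.root()

  # Build a reasonably large index-like tree and commit it once.
  tree = root['index'] = IIBTree()
  for i in range(50000):
      tree[i] = i
  transaction.commit()
  before = os.path.getsize(PATH)

  # Now add just a handful of new keys.  Each insert dirties the bucket
  # it lands in (plus interior nodes if a bucket splits), and the commit
  # appends fresh pickles of every dirtied object to the file.
  for i in range(10):
      tree[random.randint(100000, 1000000)] = i
  transaction.commit()
  grown = os.path.getsize(PATH)

  db.pack()
  packed = os.path.getsize(PATH)
  print('growth before pack: %d bytes' % (grown - before))
  print('growth after pack:  %d bytes' % (packed - before))
  db.close()

On a tree that size I'd expect the pre-pack growth to be a few bucket
pickles per commit, with most of it going away again after the pack --
which looks a lot like the numbers above.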

Well, I'm sort of stumped without doing it myself, and I can't at the
moment.  I'm going to add this to the Collector so I don't forget, and
hopefully it will be looked into and fixed by the time that 2.4.0 goes out.
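
For whoever picks this up from the Collector, this is roughly the
measurement I'd start with from a debug prompt where 'app' is bound to
the Zope root.  All the ids here are hypothetical, and it assumes a
FileStorage-backed site with its Data.fs in the usual place:

  import os, transaction

  DATA_FS = 'var/Data.fs'               # assumption: default storage path

  def size():
      return os.path.getsize(DATA_FS)

  catalog = app.Catalog                 # hypothetical ZCatalog id
  doc = app.doc                         # an already-indexed document copy

  before = size()
  catalog.catalog_object(doc, '/doc')   # re-catalog a single object
  transaction.commit()
  print('growth for one catalog_object call: %d bytes' % (size() - before))

  # Pack and see how much of that was just superseded object revisions.
  app._p_jar.db().pack()
  print('growth left after pack: %d bytes' % (size() - before))

If the pre-pack growth turns out to be mostly rewritten index buckets
(i.e. it nearly all disappears after the pack), that would back up the
balancing theory; whatever survives the pack is what's really being
added per document.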

Thanks so much,

- C