[Zope] Re: Zcatalog bloat problem (berkeleydb is a solution?)

Chris McDonough chrism@digicool.com
Tue, 26 Jun 2001 06:45:54 -0400


Hi Giovanni,

How many indexes do you have, what are the index types, and what do they
index?  Likewise, what about metadata?  In your last message, you said
there were about 20.  That's a heck of a lot of indexes.  Do you need them
all?

I can see a potential reason for the problem you describe as "and I
remind you that as the folder gets populated, the size that is added to
each transaction grows, a folder with one hundred objects adds some
100K"...  It's true that "normal" folders (actually, most
ObjectManager-derived containers) cause database bloat within undoing
storages when an object is added to or removed from them.  This is
because each such container keeps the names of its subobjects in an
"_objects" attribute, which is a tuple.  When an object is added, the
tuple is rewritten in its entirety.  So, for instance, if you've got
100 items in your folder and you add one more, you rewrite all the
instance data for the folder itself, which includes the (large)
_objects tuple (and, of course, any other raw attributes, like
properties).  Over time, this can be problematic.
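A rough way to see the growth is a small Python simulation. It is a sketch under stated assumptions, not Zope code: `TupleFolder`, `record_size` and `simulate_adds` are hypothetical names, the pickle of the instance dictionary stands in for the record an undoing storage appends on each commit, and the subobjects themselves (separate records) are left out.

```python
import pickle


class TupleFolder:
    """Hypothetical stand-in for an ObjectManager-style container.

    Contained ids live in an ``_objects`` tuple of small dicts.  The
    subobjects themselves would be separate database records and are
    omitted from this simulation.
    """

    def __init__(self):
        self.title = "A folder"
        self._objects = ()

    def add(self, obj_id, meta_type):
        # Tuples are immutable: adding one entry builds a new tuple that
        # copies every existing entry, and the folder's state is now dirty.
        self._objects = self._objects + ({"id": obj_id, "meta_type": meta_type},)


def record_size(folder):
    # Assumption: the record an undoing storage appends for the modified
    # folder is roughly the pickle of its instance dictionary.
    return len(pickle.dumps(folder.__dict__, protocol=2))


def simulate_adds(n):
    """Add n items, one per 'transaction'; return each folder record size."""
    folder = TupleFolder()
    sizes = []
    for i in range(n):
        folder.add("doc%03d" % i, "DTML Document")
        sizes.append(record_size(folder))
    return sizes


sizes = simulate_adds(100)
print("folder record written by add #1:  ", sizes[0], "bytes")
print("folder record written by add #100:", sizes[-1], "bytes")
print("total appended for the folder:    ", sum(sizes), "bytes")
```

Each folder record grows linearly with the number of contained items, so the total the storage accumulates for the folder alone grows roughly quadratically until the database is packed.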

Shane's BTreeFolder Product attempts to ameliorate this problem a bit by
keeping the data that is normally stored in the _objects tuple in its
own persistent object (a btree).
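Why a btree helps can be shown with an equally rough sketch. `BucketedNames`, `BUCKET_SIZE` and `bytes_written` are hypothetical; this is a greatly simplified stand-in for a BTree (sorted keys split into small buckets, each imagined as its own persistent record), not the real BTrees package or BTreeFolder's code. Adding one name dirties one bucket, or two when a bucket splits.

```python
import bisect
import pickle

BUCKET_SIZE = 30  # assumption: maximum keys per bucket, chosen for illustration


class BucketedNames:
    """Hypothetical, greatly simplified BTree stand-in: sorted keys split
    into small buckets, each imagined as a separate persistent record."""

    def __init__(self):
        self.buckets = [[]]

    def insert(self, key):
        """Insert key and return the buckets dirtied by this insert."""
        # Choose the last bucket whose first key is <= key (bucket 0 if none).
        firsts = [b[0] if b else "" for b in self.buckets]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        bucket = self.buckets[i]
        bisect.insort(bucket, key)
        if len(bucket) <= BUCKET_SIZE:
            return [bucket]
        # Split an overfull bucket; both halves must be written.
        # (A real BTree would also rewrite the parent node here.)
        half = len(bucket) // 2
        left, right = bucket[:half], bucket[half:]
        self.buckets[i:i + 1] = [left, right]
        return [left, right]


def bytes_written(dirty_buckets):
    # Assumption: a record's size is roughly the size of its pickle.
    return sum(len(pickle.dumps(b, protocol=2)) for b in dirty_buckets)


names = BucketedNames()
per_add = [bytes_written(names.insert("doc%03d" % n)) for n in range(100)]
print("largest single write:", max(per_add), "bytes")
print("total written:       ", sum(per_add), "bytes")
```

The largest single write stays bounded by the bucket size no matter how many names the container holds, which is the effect the btree-based approach relies on.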

Are you breaking the content up into subfolders?  This is recommended.

I'm tempted to postulate that perhaps your problem isn't as much ZCatalog
as it is ObjectManager overhead.

- C


Giovanni Maruzzelli wrote:
> 
> Hello Zopistas,
> 
> Thank you all for your replies.
> 
> Our doubts are still unresolved :-(
> 
> With a clever hack that Toby Dickenson made to the very useful
> tranalyzer, we were able to see what happens when we add or catalog an
> object. (BTW, we don't use CatalogAware.)
> 
> We can send the output of tranalyzer2 to anyone interested, but in
> short, this is what happens in an empty folder (and I remind you that as
> the folder gets populated, the size that is added to each transaction
> grows, a folder with one hundred objects adds some 100K):
> 
> If we add a normal DTML document (no catalog involved) to an empty
> folder, we get a very small increase in size: the size of the DTML
> document plus the size of the folder:
> 
> TID: 33D853C2CE6CDBB @ 77396692 obs 2 len 729
> By ciao
> "/aacucu/addDTMLDocument"
> OID: 40817 len 270 [OFS.Folder.Folder]
> OID: 40818 len 309 [OFS.DTMLDocument.DTMLDocument]
> 
> If we add an "Articolo" that's cataloged on the fly to the same empty
> folder, we get bloat:
> 
> TID: 33D853D722FA167 @ 77397437 obs 96 len 226568
> By ciao
> "/aacucu/Articolo_add"
> OID: 40817 len 363 [OFS.Folder.Folder]
> OID: 40819 len 598 [*ennPsHQQKY5zjxlQs1ebmA==.Articolo]
> OID: 407b5 len 8074 [BTrees.IOBTree.IOBucket]
> OID: 37aa9 len 39 [BTrees.Length.Length]
> OID: 37b95 len 1483 [BTrees.OIBTree.OIBucket]
> OID: 407b7 len 1739 [BTrees.IOBTree.IOBucket]
> OID: 407b8 len 402 [BTrees.IIBTree.IISet]
> OID: 407b9 len 399 [BTrees.IOBTree.IOBucket]
> OID: 407ba len 402 [BTrees.IIBTree.IISet]
> OID: 407bb len 3497 [BTrees.IOBTree.IOBucket]
> OID: 407bc len 5871 [BTrees.OOBTree.OOBucket]
> OID: 37ab2 len 39 [BTrees.Length.Length]
> OID: 407c6 len 6279 [BTrees.IOBTree.IOBucket]
> OID: 3d7bf len 312 [BTrees.IIBTree.IISet]
> OID: 407c7 len 4507 [BTrees.IOBTree.IOBucket]
> OID: 3c992 len 837 [BTrees.OOBTree.OOBucket]
> OID: 37abe len 39 [BTrees.Length.Length]
> OID: 407d2 len 696 [BTrees.IOBTree.IOBucket]
> OID: 3cb4e len 572 [BTrees.IIBTree.IISet]
> OID: 407d3 len 537 [BTrees.IOBTree.IOBucket]
> OID: 40809 len 387 [BTrees.IIBTree.IISet]
> OID: 407d4 len 507 [BTrees.IOBTree.IOBucket]
> OID: 4080a len 387 [BTrees.IIBTree.IISet]
> OID: 407d5 len 507 [BTrees.IOBTree.IOBucket]
> OID: 4080b len 387 [BTrees.IIBTree.IISet]
> OID: 407d6 len 507 [BTrees.IOBTree.IOBucket]
> OID: 4080c len 387 [BTrees.IIBTree.IISet]
> OID: 407d7 len 339 [BTrees.IOBTree.IOBucket]
> OID: 4080d len 382 [BTrees.IIBTree.IISet]
> OID: 407d8 len 339 [BTrees.IOBTree.IOBucket]
> OID: 4080e len 382 [BTrees.IIBTree.IISet]
> OID: 407d9 len 339 [BTrees.IOBTree.IOBucket]
> OID: 3d064 len 597 [BTrees.IIBTree.IISet]
> OID: 407da len 347 [BTrees.IOBTree.IOBucket]
> OID: 4080f len 387 [BTrees.IIBTree.IISet]
> OID: 407db len 339 [BTrees.IOBTree.IOBucket]
> OID: 3d1ba len 642 [BTrees.IIBTree.IISet]
> OID: 407dc len 339 [BTrees.IOBTree.IOBucket]
> OID: 40810 len 372 [BTrees.IIBTree.IISet]
> OID: 407dd len 339 [BTrees.IOBTree.IOBucket]
> OID: 40811 len 372 [BTrees.IIBTree.IISet]
> OID: 407de len 339 [BTrees.IOBTree.IOBucket]
> OID: 37f11 len 977 [BTrees.IOBTree.IOBucket]
> OID: 380de len 830 [BTrees.OIBTree.OIBucket]
> OID: 37ac4 len 25537 [BTrees.IIBTree.IISet]
> OID: 37ac7 len 9892 [BTrees.IIBTree.IISet]
> OID: 37aca len 13947 [BTrees.IIBTree.IISet]
> OID: 38922 len 387 [BTrees.IIBTree.IISet]
> OID: 38643 len 827 [BTrees.IIBTree.IISet]
> OID: 3894c len 92 [BTrees.IIBTree.IISet]
> OID: 388ff len 24707 [BTrees.IIBTree.IISet]
> OID: 38581 len 277 [BTrees.IIBTree.IISet]
> OID: 3d7f7 len 319 [BTrees.IOBTree.IOBTree]
> OID: 3d7f8 len 356 [BTrees.IOBTree.IOBTree]
> OID: 40812 len 372 [BTrees.IIBTree.IISet]
> OID: 407e0 len 339 [BTrees.IOBTree.IOBucket]
> OID: 40813 len 387 [BTrees.IIBTree.IISet]
> OID: 407e1 len 339 [BTrees.IOBTree.IOBucket]
> OID: 40814 len 362 [BTrees.IIBTree.IISet]
> OID: 407e2 len 507 [BTrees.IOBTree.IOBucket]
> OID: 37eb9 len 981 [BTrees.IOBTree.IOBucket]
> OID: 38197 len 804 [BTrees.OIBTree.OIBucket]
> OID: 38ac7 len 7947 [BTrees.IIBTree.IISet]
> OID: 387f6 len 97 [BTrees.IIBTree.IISet]
> OID: 383f7 len 850 [BTrees.OOBTree.OOBucket]
> OID: 4081a len 47 [BTrees.IIBTree.IISet]
> OID: 38407 len 850 [BTrees.OOBTree.OOBucket]
> OID: 4081b len 47 [BTrees.IIBTree.IISet]
> OID: 388ac len 92 [BTrees.IIBTree.IISet]
> OID: 387d4 len 152 [BTrees.IIBTree.IISet]
> OID: 3868c len 152 [BTrees.IIBTree.IISet]
> OID: 38681 len 142 [BTrees.IIBTree.IISet]
> OID: 388b0 len 72 [BTrees.IIBTree.IISet]
> OID: 384f1 len 52 [BTrees.IIBTree.IISet]
> OID: 37ca6 len 586 [BTrees.IOBTree.IOBucket]
> OID: 4081c len 686 [BTrees.IOBTree.IOBucket]
> OID: 37ab8 len 39336 [BTrees.IOBTree.IOBTree]
> OID: 381d8 len 594 [BTrees.OIBTree.OIBucket]
> OID: 38ac9 len 1252 [BTrees.IIBTree.IISet]
> OID: 38770 len 52 [BTrees.IIBTree.IISet]
> OID: 37d94 len 1234 [BTrees.IOBTree.IOBucket]
> OID: 3821d len 617 [BTrees.OIBTree.OIBucket]
> OID: 38acb len 557 [BTrees.IIBTree.IISet]
> OID: 38710 len 52 [BTrees.IIBTree.IISet]
> OID: 386ac len 52 [BTrees.IIBTree.IISet]
> OID: 38409 len 1019 [BTrees.OOBTree.OOBucket]
> OID: 4081d len 47 [BTrees.IIBTree.IISet]
> OID: 3870b len 52 [BTrees.IIBTree.IISet]
> OID: 38403 len 816 [BTrees.OOBTree.OOBucket]
> OID: 4081e len 47 [BTrees.IIBTree.IISet]
> OID: 387fe len 57 [BTrees.IIBTree.IISet]
> OID: 387cc len 67 [BTrees.IIBTree.IISet]
> OID: 38b29 len 1228 [BTrees.IOBTree.IOBucket]
> OID: 38c19 len 904 [BTrees.IOBTree.IOBucket]
> OID: 38d37 len 1007 [BTrees.IOBTree.IOBucket]
> OID: 3c610 len 33864 [BTrees.IOBTree.IOBucket]
> 
> ----- Original Message -----
> Sent: Monday, June 25, 2001 6:07 PM
> Subject: Re: [Zope-dev] Zcatalog bloat problem (berkeleydb is a solution?)
> 
> > > A solution might be a kind of "lazy catalog awareness": Instead of
> > > mangling a new object through one or more catalogs when it is created,
> > > this object could be added to a list of objects to be cataloged later.
> > > This way, the transaction to insert a new object would become much
> > > "cheaper". I'm working on this, but right now it is quite messy. (I'm
> > > new to Python and Zope, and hence I'm stumbling over a few, hmmm,
> > > trip-wires...)
> >
> > This purpose aligns well with those of the ArmoredCatalog proposal as
> > well; see http://dev.zope.org/Wikis/DevSite/Proposals/ArmoredCatalog .
> >
> > > But even using such a "lazy catalog awareness", you might get into
> > > trouble. Using the ZCatalog's "find objects" function, I hit the limits
> > > of my Linux box: 640 MB RAM were not enough...
> >
> > This should not happen.  :-(
> >
> > I'm really disappointed that the bloat and memory consumption issues are
> > still plaguing the ZCatalog.  At one point, I really thought we had it
> > pretty much licked.  I suppose this was naive.
> >
> > > A few weeks ago, I posted this (admittedly not fully cooked) patch to
> > > this list, but have not yet received any response.
> >
> > I apologize for this.  We have a fairly formalized process for handling
> > feature-ish collector issues, and this hasn't come round on the guitar.
> > I'm beyond disappointed that people are still having unacceptable
> > bloat, enough that something like this patch needed to be submitted.
> > It's disheartening.  :-(
> >
> > - C
> >