[Zope-dev] Re: Zcatalog bloat problem (berkeleydb is a solution?)
Giovanni Maruzzelli
maruzz@open4.it
Tue, 26 Jun 2001 13:30:35 +0200
Hi Chris,
I don't think this is a problem with ObjectManager, although it does
contribute to the bloating.
We do break the content up into subfolders, but our subfolders easily
grow to contain a few hundred objects.
Do you think the number of indexes contributes to the bloating? If that
matters, we can try to compact them into a smaller number (e.g. the
boolean indexes could become a sort of bitmask, we could eliminate
meta_type, etc.).
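For example (just a rough sketch with hypothetical names, assuming the
bflow* fields are the boolean ones), the ten bflow flags could be packed
into a single integer attribute and indexed with one FieldIndex instead
of ten:

def pack_bflow(flags):
    """Pack a sequence of ten booleans (bflow0..bflow9) into one int."""
    mask = 0
    for i, flag in enumerate(flags):
        if flag:
            mask |= 1 << i
    return mask

def has_bflow(mask, i):
    """True if the i-th bflow flag is set in the packed mask."""
    return bool(mask & (1 << i))

A single FieldIndex on the packed value would match exact flag
combinations, though; queries on individual flags would have to
enumerate the matching mask values.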
These are our indexes (cut and pasted from the ZMI), followed by our
metadata:
INDEXES (name, index type, # of indexed objects):
PrincipiaSearchSource Text Index 2,524
autore Keyword Index 4,055
bflow0 Field Index 4,055
bflow1 Field Index 4,055
bflow2 Field Index 4,055
bflow3 Field Index 4,055
bflow4 Field Index 4,055
bflow5 Field Index 4,055
bflow6 Field Index 4,055
bflow7 Field Index 4,055
bflow8 Field Index 4,055
bflow9 Field Index 4,055
bobobase_modification_time Field Index 4,300
dflow0 Field Index 4,055
dflow1 Field Index 4,055
id Field Index 4,300
m_sflow0 Keyword Index 3,960
m_sflow1 Keyword Index 3,960
m_sflow2 Keyword Index 3,960
meta_type Field Index 4,300
pseudoId Text Index 4,054
revisore Keyword Index 4,055
title Text Index 3,844
METADATA:
bobobase_modification_time
id
meta_type
pseudoId
title
----- Original Message -----
Sent: Tuesday, June 26, 2001 12:45 PM
Subject: Re: Zcatalog bloat problem (berkeleydb is a solution?)
>
> Hi Giovanni,
>
> How many indexes do you have, what are the index types, and what do they
> index? Likewise, what about metadata? In your last message, you said
> there's about 20. That's a heck of a lot of indexes. Do you need them
> all?
>
> I can see a potential reason for the problem you describe as "and I
> remind you that as the folder gets populated, the size added by each
> transaction grows; a folder with one hundred objects adds some
> 100K"... It's true that "normal" folders (most ObjectManager-derived
> containers, actually) cause database bloat in undoing storages when an
> object is added to or removed from them. This is because the folder
> keeps a list of contained subobject names in an "_objects" attribute,
> which is a tuple. When an object is added, the tuple is rewritten in
> its entirety. So, for instance, if you've got 100 items in your folder
> and you add one more, you rewrite all the instance data for the folder
> itself, which includes the (large) _objects tuple (and of course any
> other raw attributes, like properties). Over time, this can be
> problematic.
>
> Shane's BTreeFolder Product attempts to ameliorate this problem a bit by
> keeping the data that is normally stored in the _objects tuple in its
> own persistent object (a btree).
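>
> As a rough illustration (just a sketch, not the actual ObjectManager or
> BTreeFolder code, assuming the ZODB API: persistent.Persistent and
> BTrees.OOBTree), the difference looks like this:
>
> from persistent import Persistent
> from BTrees.OOBTree import OOBTree
>
> class TupleFolder(Persistent):
>     """Keeps subobject ids in a plain tuple attribute (like _objects)."""
>     def __init__(self):
>         self._objects = ()
>     def add(self, name):
>         # Rebinding the attribute marks the whole folder record as
>         # changed, so the entire (ever-growing) tuple is written to the
>         # storage again on every add.
>         self._objects = self._objects + (name,)
>
> class BTreeBackedFolder(Persistent):
>     """Keeps subobject ids in a separate persistent BTree instead."""
>     def __init__(self):
>         self._tree = OOBTree()
>     def add(self, name):
>         # Only the BTree's own record (a small bucket) is rewritten; the
>         # folder record stays untouched, so the per-add transaction size
>         # stays roughly constant as the folder grows.
>         self._tree[name] = 1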
>
> Are you breaking the content up into subfolders? This is recommended.
>
> I'm tempted to postulate that perhaps your problem isn't as much ZCatalog
> as it is ObjectManager overhead.
>
> - C
>
>
> Giovanni Maruzzelli wrote:
> >
> > Hello Zopistas,
> >
> > Thank you all for your replies.
> >
> > Our doubts are still unresolved :-(
> >
> > With a clever hack that Toby Dickenson made to the very useful
> > tranalyzer, we were able to see what happens when we add or catalog
> > an object. (BTW, we don't use CatalogAware).
> >
> > We can send the output of tranalyzer2 to anyone interested, but in
> > short, this is what happens in an empty folder (and I remind you that
> > as the folder gets populated, the size added by each transaction
> > grows; a folder with one hundred objects adds some 100K):
> >
> > If we add a normal DTML Document (no catalog involved) to an empty
> > folder, we get a very small increase in size: the size of the DTML
> > Document plus the size of the folder:
> >
> > TID: 33D853C2CE6CDBB @ 77396692 obs 2 len 729
> > By ciao
> > "/aacucu/addDTMLDocument"
> > OID: 40817 len 270 [OFS.Folder.Folder]
> > OID: 40818 len 309 [OFS.DTMLDocument.DTMLDocument]
> >
> > If we add an "Articolo" that is cataloged on the fly to the same
> > empty folder, we get bloat:
> >
> > TID: 33D853D722FA167 @ 77397437 obs 96 len 226568
> > By ciao
> > "/aacucu/Articolo_add"
> > OID: 40817 len 363 [OFS.Folder.Folder]
> > OID: 40819 len 598 [*ennPsHQQKY5zjxlQs1ebmA==.Articolo]
> > OID: 407b5 len 8074 [BTrees.IOBTree.IOBucket]
> > OID: 37aa9 len 39 [BTrees.Length.Length]
> > OID: 37b95 len 1483 [BTrees.OIBTree.OIBucket]
> > OID: 407b7 len 1739 [BTrees.IOBTree.IOBucket]
> > OID: 407b8 len 402 [BTrees.IIBTree.IISet]
> > OID: 407b9 len 399 [BTrees.IOBTree.IOBucket]
> > OID: 407ba len 402 [BTrees.IIBTree.IISet]
> > OID: 407bb len 3497 [BTrees.IOBTree.IOBucket]
> > OID: 407bc len 5871 [BTrees.OOBTree.OOBucket]
> > OID: 37ab2 len 39 [BTrees.Length.Length]
> > OID: 407c6 len 6279 [BTrees.IOBTree.IOBucket]
> > OID: 3d7bf len 312 [BTrees.IIBTree.IISet]
> > OID: 407c7 len 4507 [BTrees.IOBTree.IOBucket]
> > OID: 3c992 len 837 [BTrees.OOBTree.OOBucket]
> > OID: 37abe len 39 [BTrees.Length.Length]
> > OID: 407d2 len 696 [BTrees.IOBTree.IOBucket]
> > OID: 3cb4e len 572 [BTrees.IIBTree.IISet]
> > OID: 407d3 len 537 [BTrees.IOBTree.IOBucket]
> > OID: 40809 len 387 [BTrees.IIBTree.IISet]
> > OID: 407d4 len 507 [BTrees.IOBTree.IOBucket]
> > OID: 4080a len 387 [BTrees.IIBTree.IISet]
> > OID: 407d5 len 507 [BTrees.IOBTree.IOBucket]
> > OID: 4080b len 387 [BTrees.IIBTree.IISet]
> > OID: 407d6 len 507 [BTrees.IOBTree.IOBucket]
> > OID: 4080c len 387 [BTrees.IIBTree.IISet]
> > OID: 407d7 len 339 [BTrees.IOBTree.IOBucket]
> > OID: 4080d len 382 [BTrees.IIBTree.IISet]
> > OID: 407d8 len 339 [BTrees.IOBTree.IOBucket]
> > OID: 4080e len 382 [BTrees.IIBTree.IISet]
> > OID: 407d9 len 339 [BTrees.IOBTree.IOBucket]
> > OID: 3d064 len 597 [BTrees.IIBTree.IISet]
> > OID: 407da len 347 [BTrees.IOBTree.IOBucket]
> > OID: 4080f len 387 [BTrees.IIBTree.IISet]
> > OID: 407db len 339 [BTrees.IOBTree.IOBucket]
> > OID: 3d1ba len 642 [BTrees.IIBTree.IISet]
> > OID: 407dc len 339 [BTrees.IOBTree.IOBucket]
> > OID: 40810 len 372 [BTrees.IIBTree.IISet]
> > OID: 407dd len 339 [BTrees.IOBTree.IOBucket]
> > OID: 40811 len 372 [BTrees.IIBTree.IISet]
> > OID: 407de len 339 [BTrees.IOBTree.IOBucket]
> > OID: 37f11 len 977 [BTrees.IOBTree.IOBucket]
> > OID: 380de len 830 [BTrees.OIBTree.OIBucket]
> > OID: 37ac4 len 25537 [BTrees.IIBTree.IISet]
> > OID: 37ac7 len 9892 [BTrees.IIBTree.IISet]
> > OID: 37aca len 13947 [BTrees.IIBTree.IISet]
> > OID: 38922 len 387 [BTrees.IIBTree.IISet]
> > OID: 38643 len 827 [BTrees.IIBTree.IISet]
> > OID: 3894c len 92 [BTrees.IIBTree.IISet]
> > OID: 388ff len 24707 [BTrees.IIBTree.IISet]
> > OID: 38581 len 277 [BTrees.IIBTree.IISet]
> > OID: 3d7f7 len 319 [BTrees.IOBTree.IOBTree]
> > OID: 3d7f8 len 356 [BTrees.IOBTree.IOBTree]
> > OID: 40812 len 372 [BTrees.IIBTree.IISet]
> > OID: 407e0 len 339 [BTrees.IOBTree.IOBucket]
> > OID: 40813 len 387 [BTrees.IIBTree.IISet]
> > OID: 407e1 len 339 [BTrees.IOBTree.IOBucket]
> > OID: 40814 len 362 [BTrees.IIBTree.IISet]
> > OID: 407e2 len 507 [BTrees.IOBTree.IOBucket]
> > OID: 37eb9 len 981 [BTrees.IOBTree.IOBucket]
> > OID: 38197 len 804 [BTrees.OIBTree.OIBucket]
> > OID: 38ac7 len 7947 [BTrees.IIBTree.IISet]
> > OID: 387f6 len 97 [BTrees.IIBTree.IISet]
> > OID: 383f7 len 850 [BTrees.OOBTree.OOBucket]
> > OID: 4081a len 47 [BTrees.IIBTree.IISet]
> > OID: 38407 len 850 [BTrees.OOBTree.OOBucket]
> > OID: 4081b len 47 [BTrees.IIBTree.IISet]
> > OID: 388ac len 92 [BTrees.IIBTree.IISet]
> > OID: 387d4 len 152 [BTrees.IIBTree.IISet]
> > OID: 3868c len 152 [BTrees.IIBTree.IISet]
> > OID: 38681 len 142 [BTrees.IIBTree.IISet]
> > OID: 388b0 len 72 [BTrees.IIBTree.IISet]
> > OID: 384f1 len 52 [BTrees.IIBTree.IISet]
> > OID: 37ca6 len 586 [BTrees.IOBTree.IOBucket]
> > OID: 4081c len 686 [BTrees.IOBTree.IOBucket]
> > OID: 37ab8 len 39336 [BTrees.IOBTree.IOBTree]
> > OID: 381d8 len 594 [BTrees.OIBTree.OIBucket]
> > OID: 38ac9 len 1252 [BTrees.IIBTree.IISet]
> > OID: 38770 len 52 [BTrees.IIBTree.IISet]
> > OID: 37d94 len 1234 [BTrees.IOBTree.IOBucket]
> > OID: 3821d len 617 [BTrees.OIBTree.OIBucket]
> > OID: 38acb len 557 [BTrees.IIBTree.IISet]
> > OID: 38710 len 52 [BTrees.IIBTree.IISet]
> > OID: 386ac len 52 [BTrees.IIBTree.IISet]
> > OID: 38409 len 1019 [BTrees.OOBTree.OOBucket]
> > OID: 4081d len 47 [BTrees.IIBTree.IISet]
> > OID: 3870b len 52 [BTrees.IIBTree.IISet]
> > OID: 38403 len 816 [BTrees.OOBTree.OOBucket]
> > OID: 4081e len 47 [BTrees.IIBTree.IISet]
> > OID: 387fe len 57 [BTrees.IIBTree.IISet]
> > OID: 387cc len 67 [BTrees.IIBTree.IISet]
> > OID: 38b29 len 1228 [BTrees.IOBTree.IOBucket]
> > OID: 38c19 len 904 [BTrees.IOBTree.IOBucket]
> > OID: 38d37 len 1007 [BTrees.IOBTree.IOBucket]
> > OID: 3c610 len 33864 [BTrees.IOBTree.IOBucket]
> >
> > ----- Original Message -----
> > Sent: Monday, June 25, 2001 6:07 PM
> > Subject: Re: [Zope-dev] Zcatalog bloat problem (berkeleydb is a
solution?)
> >
> > > > A solution might be a kind of "lazy catalog awareness": instead of
> > > > mangling a new object through one or more catalogs when it is
> > > > created, the object could be added to a list of objects to be
> > > > cataloged later. This way, the transaction that inserts a new
> > > > object would become much "cheaper". I'm working on this, but right
> > > > now it is quite messy. (I'm new to Python and Zope, and hence I'm
> > > > stumbling over a few, hmmm, trip-wires...)
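> > > >
> > > > As a very rough sketch of that idea (hypothetical names, using the
> > > > ZODB Persistent/OOBTree classes and the ZCatalog.catalog_object()
> > > > call), creation would just record a path, and the expensive
> > > > indexing would happen later in a separate transaction:
> > > >
> > > > from persistent import Persistent
> > > > from BTrees.OOBTree import OOBTree
> > > >
> > > > class CatalogQueue(Persistent):
> > > >     """Paths of objects waiting to be cataloged."""
> > > >     def __init__(self):
> > > >         self._paths = OOBTree()  # path -> 1, cheap to update
> > > >     def defer(self, path):
> > > >         # Called at creation time instead of cataloging right away.
> > > >         self._paths[path] = 1
> > > >     def flush(self, catalog, traverse):
> > > >         # 'traverse' resolves a path to an object (for example
> > > >         # app.unrestrictedTraverse); run this from a periodic job.
> > > >         for path in list(self._paths.keys()):
> > > >             catalog.catalog_object(traverse(path), path)
> > > >             del self._paths[path]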
> > >
> > > This purpose aligns well with those of the ArmoredCatalog proposal as
> > > well; see http://dev.zope.org/Wikis/DevSite/Proposals/ArmoredCatalog .
> > >
> > > > But even using such a "lazy catalog awareness", you might get into
> > > > trouble. Using the ZCatalog's "find objects" function, I hit the
> > > > limits of my Linux box: 640 MB RAM were not enough...
> > >
> > > This should not happen. :-(
> > >
> > > I'm really disappointed that the bloat and memory consumption issues are
> > > still plaguing the ZCatalog. At one point, I really thought we had it
> > > pretty much licked. I suppose this was naive.
> > >
> > > > A few weeks ago, I posted this (admittedly not fully cooked) patch
> > > > to this list, but did not yet get any response.
> > >
> > > I apologize for this. We have a fairly formalized process for
> > > handling feature-ish collector issues, and this hasn't come round on
> > > the guitar. I'm beyond disappointed that people are still having
> > > unacceptable bloat, enough that something like this patch needed to
> > > be submitted. It's disheartening. :-(
> > >
> > > - C
> > >