[Zope-dev] Re: Zope Mailing Lists and ZCatalog
Kapil Thangavelu
kvthan@wm.edu
Mon, 07 Aug 2000 09:18:13 -0700
I've been working on a Mailman archive/search interface in Zope. I
chose not to do the search mechanisms in Zope because I was under the
impression that ZCatalog is great for object indexing but would
not be ideal for mass text indexing with 100K+ objects and 100MB+ of
text.
The comments below seem to indicate that the only problems are with mass
indexing and transaction storage, both of which would be mitigated by
moving to an incremental indexing scheme.
But wouldn't you still run into performance problems on searches, and in
finding enough available memory to power the catalog search?
I guess what I'm looking for is a rule of thumb on catalog usage in terms
of number of objects/indexes versus a machine's specs.
Curious
Kapil
BTW, a demo of my Mailman search interface is at
http://sindev.dyndns.org/TGrounds/archive_search
Michel Pelletier wrote:
>
> Andy Dawkins wrote:
> >
> > Michel
> >
> > In case you are not aware, we at NIP currently host a complete archive of
> > the Zope mailing lists that are publicly available.
>
> Yep.
>
> > We are using ZCatalog to index all the messages from the Mailing list
> > archives. To give you an idea of numbers, the Zope mailing list alone is
> > over 30,000 messages.
>
> > The problem we have is getting that many objects into the Catalog. If we
> > load the objects into the ZODB, then catalog them, the machine either runs
> > out of memory or, if we lower the subtransaction threshold, it runs out of
> > hard drive space.
>
> This is because you are indexing more content than you have virtual+tmp
> memory to store the transaction in. Zope is transactional, as I'm sure
> you know, so it has to store the transaction somewhere so it can roll it
> back if necessary, and memory+tmp storage is where that goes
> (subtransactions are swapped out to tmp).
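
To make the memory+tmp point concrete, here is a toy model of a
transaction that spills to temporary storage once a threshold is reached.
It is only an illustration under stated assumptions: ToyTransaction, its
methods, and the threshold value are hypothetical and are not ZODB's
actual implementation.

```python
import pickle
import tempfile


class ToyTransaction:
    """Toy model of a transaction with subtransactions (not ZODB code)."""

    def __init__(self, threshold):
        self.threshold = threshold             # max changes held in memory
        self.pending = []                      # uncommitted changes in memory
        self.spill = tempfile.TemporaryFile()  # stands in for tmp storage
        self.spilled = 0                       # changes written to tmp so far

    def register(self, change):
        self.pending.append(change)
        if len(self.pending) >= self.threshold:
            self._subcommit()

    def _subcommit(self):
        # Move in-memory changes to the temp file so memory stays bounded.
        for change in self.pending:
            pickle.dump(change, self.spill)
        self.spilled += len(self.pending)
        self.pending = []

    def abort(self):
        # Rolling back discards both the in-memory and the spilled changes.
        self.pending = []
        self.spill.close()
        self.spill = tempfile.TemporaryFile()
        self.spilled = 0

    def tmp_bytes(self):
        return self.spill.tell()


txn = ToyTransaction(threshold=100)
for i in range(1000):
    txn.register({"message_id": i, "body": "x" * 50})

in_memory = len(txn.pending)  # never exceeds the threshold
on_disk = txn.spilled         # keeps growing until the transaction ends
```

In this model a lower threshold trades memory for temporary disk space,
which matches the two failure modes described above: out of memory with a
high threshold, out of disk with a low one.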
>
> > If we use CatalogAware to catalog the objects as they are imported, the
> > Catalog explodes to stupid sizes because CatalogAware doesn't support
> > subtransactions.
>
> Subtransactions are a storage thing, and really don't have anything to
> do with CatalogAware. If you have a subtransaction threshold set, then
> subtransactions will be used for any cataloging operation, CatalogAware
> or not.
>
> > We could solve these issues by regularly packing the database during the
> > import, but it isn't a perfect solution.
>
> I'm not sure what you mean with these last two paragraphs, but it seems
> like you have two problems:
>
> 1) you are mass indexing and running out of memory
>
> 2) you are indexing lots of content quickly and your database is growing
>
> The answer to 1 is to not mass index, but to index incrementally over time.
> The answer to 2 is to use a storage that does not store old revisions,
> like Berkeley storage.
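
As a rough sketch of what indexing incrementally could look like: index
in small batches and commit after each one, so no single transaction has
to hold the whole archive. The names below (batches, index_incrementally,
index_one, commit, batch_size) are hypothetical; in Zope, index_one might
wrap the catalog's catalog_object() and commit might wrap a full
transaction commit, but that wiring is an assumption and is not shown.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def index_incrementally(messages, index_one, commit, batch_size=500):
    """Index messages in small batches, committing after each batch.

    `index_one` and `commit` are hypothetical callables supplied by the
    caller; nothing here talks to a real catalog or storage.
    """
    indexed = 0
    for chunk in batches(messages, batch_size):
        for message in chunk:
            index_one(message)
        commit()  # each batch becomes its own small transaction
        indexed += len(chunk)
    return indexed


# Stand-ins so the sketch runs without Zope.
seen = []
commits = []
total = index_incrementally(
    range(1200), seen.append, lambda: commits.append(len(seen)), batch_size=500
)
```

With the stand-ins above, 1200 messages are committed in three
transactions of at most 500 messages each. On a storage that keeps old
revisions, every commit still adds to the database file, which is why the
second answer (a storage that does not keep them) matters as well.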
>
> > Also, as messages arrived over time, the Catalog would once again
> > explode dramatically.
>
> > Basically, we (NIP) would like to know if you (Michel/DC) are planning to
> > improve ZCatalog/CatalogAware, if you are planning a successor to ZCatalog,
> > or any other information that could be useful to us regarding the
> > current development and urgency of ZCatalog/CatalogAware.
>
> There isn't anything wrong with the Catalog (for this particular
> problem), or at least, there isn't anything in the catalog to fix that
> would solve your problem. We've had customers index well over 50,000
> objects; you just have to understand the resource constraints and work
> with them. For example, don't mass index, use storages that scale to
> high-write environments, etc.
>
> > Thanks in advance for your assistance.
>
> NP.
>
> -Michel