On Fri, 4 Aug 2000, Michel Pelletier wrote:
Andy Dawkins wrote:
The problem we have is getting that many objects into the Catalog. If we load the objects into the ZODB, then catalog them, the machine either runs out of memory or, if we lower the subtransaction threshold, it runs out of hard drive space.
Don't lower the subtransaction threshold too much; because of the way BTrees work, you wind up generating a *lot* more disk writes than you would think. I can catalog 61K records (a small amount of data for each record, though) on a machine with 256MB of memory. More memory is the easiest solution...
If we use CatalogAware to catalog the objects as they are imported, the Catalog explodes to stupid sizes because CatalogAware doesn't support subtransactions.
Subtransactions are a storage thing and really don't have anything to do with CatalogAware. If you have a subtransaction threshold set, then subtransactions will be used for any cataloging operation, CatalogAware or not.
I've imported my whole 61K object folder tree, and the resulting Data.fs file was about twice the size of the zexp file. That hardly sounds like "exploded", so maybe there's something odd in the way you are doing the import? You definitely don't want to be committing transactions or subtransactions too often.
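The tradeoff being described — commit subtransactions often enough to bound memory use, but rarely enough to avoid the extra BTree bucket rewrites — amounts to a simple batching loop. A minimal illustrative sketch in plain Python follows; `catalog_one` and `commit_subtxn` are hypothetical stand-ins for whatever cataloging and subtransaction-commit calls your setup actually uses, not Zope API:

```python
def catalog_in_batches(objects, catalog_one, commit_subtxn, batch_size=1000):
    """Catalog objects in batches, flushing a subtransaction every
    batch_size objects so uncommitted state doesn't pile up in memory,
    while keeping commits infrequent enough to avoid rewriting the
    same BTree buckets over and over."""
    pending = 0
    for ob in objects:
        catalog_one(ob)       # stand-in for the actual cataloging call
        pending += 1
        if pending >= batch_size:
            commit_subtxn()   # stand-in for a subtransaction commit
            pending = 0
    if pending:
        commit_subtxn()       # flush the final partial batch
```

Tuning `batch_size` is the whole game: too small and each commit rewrites the modified BTree buckets (the disk-space explosion above); too large and unsaved objects exhaust memory.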
Also, as messages arrived over time, the Catalog would once again explode dramatically.
This is definitely an issue for something like archiving a mailing list. It sounds like, in the current state of things, you really want to move to a non-transactional storage for the catalog.
There isn't anything wrong with the Catalog (for this particular problem), or at least, there isn't anything in the Catalog to fix that would solve your problem. We've had customers index well over 50,000 objects; you just have to understand the resource constraints and work with them: for example, don't mass-index, and use storages that scale in high-write environments.
There has, however, been at least one posting from DC about the technology that underlies the Catalog, the BTree. Apparently there *is* some tuning that can be done to make the BTree generate fewer object updates when modifications take place (something about parent objects getting updated unnecessarily, my hazy memory says). Is any active work being done on BTree? --RDM