I'm not sure whether you have already taken this into account in your work so far or in your future plans, but in case you were unaware: you do not need to persistently store objects in the ZODB in order to index them in a ZCatalog. All that is required is that the object to be cataloged is accessible via a URL path.

ZSQL methods can be set up to be URL-traversable, and to wrap a class around the returned row. To load the items into the catalog, you can use a PythonScript or similar to loop over a multi-row query, passing each object directly to the catalog along with a path that matches the one it will be retrievable from.

This approach would eliminate the need for BTreeFolder altogether, although of course it requires access to the RDBMS for retrievals. It should also reduce the number of writes and allow for bigger subtransactions in a given amount of memory.

At 07:36 PM 12/9/01 -0800, sean.upton@uniontrib.com wrote:
Interesting FYI for those looking to support lots of cataloged objects in ZODB and Zope (Chris W., et al)... I'm working on a project to put ~350k Cataloged objects (customer database) in a single BTreeFolder-derived container; these objects are 'proxy' objects which each expose a single record in a relational dataset, and allow about 8 fields to be indexed (2 of which are TextIndexes).
...
- Also, I want to make it clear that if I had a data access API that needed more than simple information about my datasets (i.e. I was trying to do reporting on patterns, as in CRM-ish types of applications), I would likely wrap a function around indexes done in the RDB, not in Catalog. My project requires no reporting functionality, and thus really needs no indexes other than for finding a record for customer service and account validation purposes. The reason I chose ZCatalog, however, was for full text indexing that I could control/hack/customize easily. My slightly uninformed belief now is that for big datasets or "enterprise" applications (whatever that means), I would use a hybrid set of indexes: the RDB's (faster) indexes where appropriate (heavily queried fields), and ZCatalog for TextIndexes (convenient). I'm sure inevitable improvements to ZCatalog (there seems to be community interest in such) will help here.
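To illustrate the loading loop described at the top of this message, here is a rough sketch in plain Python rather than an actual PythonScript. It assumes a great deal: the `customers` table, the `CustomerRecord` wrapper class, the `/customers/get_customer` base path, and the `StandInCatalog` class are all hypothetical names invented for the example. The stand-in only mimics the `catalog_object(obj, uid)` call that a ZCatalog accepts, so the sketch can run outside Zope; an in-memory sqlite3 database plays the part of the RDBMS behind the ZSQL method.

```python
import sqlite3


class CustomerRecord:
    """Hypothetical class wrapped around one result row."""

    def __init__(self, row):
        self.customer_id = row["customer_id"]
        self.name = row["name"]
        self.notes = row["notes"]


class StandInCatalog:
    """Stand-in for a ZCatalog so the sketch runs outside Zope.

    It only mimics the catalog_object(obj, uid) call and records a few
    attribute values per path; it does not build real indexes.
    """

    def __init__(self, indexed_attrs):
        self.indexed_attrs = indexed_attrs
        self.entries = {}

    def catalog_object(self, obj, uid):
        # Keep only the attributes we pretend to index, keyed by path.
        self.entries[uid] = {a: getattr(obj, a) for a in self.indexed_attrs}


def load_catalog(conn, catalog, base_path):
    """Loop over a multi-row query and catalog each wrapped row.

    The object itself is never stored; only the catalog entry is,
    keyed by the path the record would be traversable at.
    """
    conn.row_factory = sqlite3.Row
    count = 0
    query = "SELECT customer_id, name, notes FROM customers ORDER BY customer_id"
    for row in conn.execute(query):
        record = CustomerRecord(row)
        path = "%s/%s" % (base_path, record.customer_id)
        catalog.catalog_object(record, path)
        count += 1
    return count


def demo():
    """Build a tiny in-memory table and load it into the stand-in catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "prefers email"), (2, "Grace", "billing question")],
    )
    catalog = StandInCatalog(["name", "notes"])
    loaded = load_catalog(conn, catalog, "/customers/get_customer")
    return loaded, catalog
```

In a real setup the loop body would presumably stay about the same, with the query coming from the ZSQL method and the catalog being the actual ZCatalog instance; the important part is that the path passed in matches the URL the record can later be traversed to.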