Re: [Zope-dev] 100k+ objects, or...Improving Performance of BTreeFolder...

10 Dec 2001

At 04:08 PM 12/10/01 +0000, Tony McDonald wrote:
...
On 10/12/01 2:54 pm, "Phillip J. Eby" <pje@telecommunity.com> wrote:
...
I'm not sure if this is taken into consideration in your work so far or in
your future plans, but in case you were unaware, it is not necessary for you to
persistently store objects in the ZODB that you intend to index in a
ZCatalog.  All that is required is that the object to be cataloged is
accessible via a URL path.  ZSQL methods can be set up to be
URL-traversable, and to wrap a class around the returned row.  To load the
items into the catalog, you can use a PythonScript or similar to loop over
a multi-row query, passing the objects directly to the catalog along with a
path that matches the one they'll be retrievable from.  This approach would
eliminate the need for BTreeFolder altogether, although of course it
requires access to the RDBMS for retrievals.  This should reduce the number
of writes and allow for bigger subtransactions in a given quantity of 
memory.
Gad! - are you saying you don't need to store a 1 MB .doc file in the ZODB,
but can still index the thing, store the index information in the ZCatalog
(presumably a lot smaller than 1 MB) and have the actual file accessible from
a file system URL? If so, that's really neat!
Yep.  By "URL path", though, I meant a *Zope* path.  However it would be 
straightforward to create a Zope object that represents a filesystem path 
and does traversal/retrieval, assuming that one of the 'FS'-products out 
there doesn't already do this for you.
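
As a rough illustration of such a filesystem-backed object, here is a
minimal sketch in plain Python. The class names FSFolder and FSFile are
hypothetical, and the only Zope-specific piece is the __bobo_traverse__
hook, which the publisher calls with the request and the next path segment.
A real product would also need security declarations and acquisition
support, which are left out here; raising KeyError for a missing name is an
assumption of this sketch.

```python
import os
import tempfile


class FSFile:
    """Hypothetical leaf object wrapping a single file on disk."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # Return the raw bytes of the wrapped file.
        with open(self.path, "rb") as handle:
            return handle.read()


class FSFolder:
    """Hypothetical object representing a filesystem directory."""

    def __init__(self, root):
        # Resolve symlinks once so later comparisons are reliable.
        self.root = os.path.realpath(root)

    def __bobo_traverse__(self, REQUEST, name):
        # Map one URL path segment onto a direct child of this directory.
        target = os.path.realpath(os.path.join(self.root, name))
        # Refuse anything that does not resolve to a direct child
        # (for example ".." or a symlink pointing elsewhere).
        if os.path.dirname(target) != self.root:
            raise KeyError(name)
        if os.path.isdir(target):
            return FSFolder(target)
        if os.path.isfile(target):
            return FSFile(target)
        raise KeyError(name)


# Small demonstration against a temporary directory.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "report.doc"), "wb") as handle:
    handle.write(b"example bytes")

folder = FSFolder(demo_root)
data = folder.__bobo_traverse__(None, "report.doc").read()
```

Each path segment is resolved against one directory level, so a nested path
is handled by returning another FSFolder for the publisher to traverse.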

Chris Withers has pointed out that technically you don't even need the path
string to be valid; it just has to be unique.  However, IIRC the standard
tools and the method for getting the "real object" referred to by the catalog
record do expect it to be a valid path.  I personally find it most
convenient, therefore, to use a real Zope path.
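
For what it's worth, the loading loop described in the quoted message can be
sketched roughly as below. This is only an illustration under stated
assumptions: StubCatalog is a stand-in that mimics nothing more than
ZCatalog's catalog_object(obj, uid) call so the example runs outside Zope,
and RowRecord, load_rows, the row fields, and the "/docs/by_id" base path
are all hypothetical names. In a real site the rows would come from a ZSQL
method and the catalog would be an actual ZCatalog instance.

```python
class RowRecord:
    """Hypothetical class wrapped around one row of a query result."""

    def __init__(self, row):
        self.id = row["id"]
        self.title = row["title"]


class StubCatalog:
    """Stand-in for a ZCatalog; only mimics catalog_object(obj, uid)."""

    def __init__(self):
        self.paths = {}

    def catalog_object(self, obj, uid):
        # A real catalog would run its indexes over obj; here we just
        # record which attributes were seen under which path.
        self.paths[uid] = {"id": obj.id, "title": obj.title}


def load_rows(catalog, rows, base_path):
    """Catalog each row under the path it will later be retrieved from."""
    count = 0
    for row in rows:
        record = RowRecord(row)
        # The uid is the path at which the record is URL-traversable.
        path = "%s/%s" % (base_path, record.id)
        catalog.catalog_object(record, path)
        count += 1
    return count


# Example rows standing in for a multi-row query result.
rows = [
    {"id": "doc1", "title": "First document"},
    {"id": "doc2", "title": "Second document"},
]
catalog = StubCatalog()
loaded = load_rows(catalog, rows, "/docs/by_id")
```

Nothing is stored persistently apart from the catalog's own index data; the
records themselves are rebuilt from the database whenever their path is
traversed.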