[Zope-dev] ZCatalog, REQUEST, misc.

Chris McDonough chrism@digicool.com
Thu, 17 May 2001 09:17:32 -0400


Erik Enge wrote:
> I've indexed about 410.000 objects now.  A plain query with 'meta_type'
> and 'firstname' to searchResults takes about 3-4 seconds.  Not too bad,
> but not that good either.

I assume meta_type is a field index and 'firstname' is a text index. 
I'd be curious to know how long a query that involves only a single
field index takes, and how long a query that involves only a single text
index takes... does each take a roughly equivalent amount of time?  Or
is one much faster than the other?  If one is not much faster than the
other, it's a Catalog issue.  If one *is* much faster than the other,
it's an Index issue.
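
A rough sketch of how that comparison could be timed. The helper
names (time_query, compare_indexes), the stand-in fake_search, and
the "Person"/"erik" values are made up for illustration; inside Zope
you would pass your catalog's searchResults as the search argument
instead.

```python
import time


def time_query(search, repeats=5, **query):
    """Return the best wall-clock time in seconds of search(**query)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search(**query)
        best = min(best, time.perf_counter() - start)
    return best


def compare_indexes(search, field_query, text_query):
    """Time a field-only query, a text-only query, and both combined."""
    combined = dict(field_query)
    combined.update(text_query)
    return {
        "field only": time_query(search, **field_query),
        "text only": time_query(search, **text_query),
        "combined": time_query(search, **combined),
    }


# Hypothetical stand-in for catalog.searchResults so this runs outside Zope.
def fake_search(**query):
    return [query]


timings = compare_indexes(
    fake_search,
    {"meta_type": "Person"},   # assumed field index value
    {"firstname": "erik"},     # assumed text index value
)
for label, seconds in timings.items():
    print("%-10s %.6f s" % (label, seconds))
```

Taking the best of several runs is just one way to reduce noise from
caching; the first (cold) run may be worth recording separately.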

> I'm sure I can do something to make it faster, but other than index fewer
> objects (which I can't do, since I have more objects that need to be
> indexed) I don't see what else I can do.

What's an acceptable query time for your application?  Are you sorting
the results via sort_on?

> Maybe one could design a framework of scalable Catalogs?  Just like ZEO
> does for ZODB?

There are specific incremental improvements that can be made to the
Catalog, especially via:

1) extending its query language

2) making it less expensive to do incremental indexing -- index
   requests need to be queued up

There is a proposal for item 1 on dev.zope.org named
UnionAndIntersectionOperations.  Item 2 has no proposal behind it.
There are also no proposals to address slow query times (there haven't
been many reports of that, but I can imagine problems, especially in
conjunction with sorting).
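
To illustrate what "queue up index requests" might look like, here is
a minimal sketch, not a proposal. The IndexQueue class and its
threshold parameter are hypothetical; index_func and unindex_func are
stand-ins for whatever actually updates the catalog (for example
catalog_object and uncatalog_object on a ZCatalog). The idea is that
repeated requests for the same uid collapse into one, and the
expensive work happens in batches.

```python
class IndexQueue:
    """Collect index/unindex requests and apply them in batches."""

    def __init__(self, index_func, unindex_func, threshold=100):
        self._index_func = index_func      # called as index_func(uid, obj)
        self._unindex_func = unindex_func  # called as unindex_func(uid)
        self._threshold = threshold        # flush once this many uids wait
        self._pending = {}                 # uid -> (action, obj); last wins

    def index(self, uid, obj):
        self._pending[uid] = ("index", obj)
        self._maybe_flush()

    def unindex(self, uid):
        self._pending[uid] = ("unindex", None)
        self._maybe_flush()

    def _maybe_flush(self):
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        """Apply all pending requests; return how many were applied."""
        pending, self._pending = self._pending, {}
        for uid, (action, obj) in pending.items():
            if action == "index":
                self._index_func(uid, obj)
            else:
                self._unindex_func(uid)
        return len(pending)


# Usage with stand-in functions that only record what was called.
calls = []
queue = IndexQueue(
    index_func=lambda uid, obj: calls.append(("index", uid, obj)),
    unindex_func=lambda uid: calls.append(("unindex", uid)),
    threshold=10,
)
queue.index("/people/erik", "version 1")
queue.index("/people/erik", "version 2")  # replaces the earlier request
queue.unindex("/people/old")
applied = queue.flush()
print(applied, calls)
```

The tradeoff is that queries issued before a flush see stale data, so
a real design would have to decide when flushing is forced.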

2.4 will have "drop in" indexes which will make it possible to write
your own type of index (such as, for example, a DateIndex, that stores
document ids presorted in reverse chronological order, so you don't
always need to sort the entire result set and reverse it for batch-type
operations).  This will have an impact on query speed to the extent that
you'll be able to use an appropriate type of index for your data rather
than stuffing it into one of the default three.
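
A toy sketch of the presorted-DateIndex idea, assuming numeric
timestamps. This is not the 2.4 index interface; the ReverseDateIndex
class and its method names are invented for illustration. Document ids
are kept sorted newest-first at indexing time, so a batch of the most
recent matches can be read off the front without sorting the whole
result set.

```python
import bisect


class ReverseDateIndex:
    """Keep document ids ordered newest-first as they are indexed."""

    def __init__(self):
        self._keys = []    # ascending list of (-timestamp, docid)
        self._by_doc = {}  # docid -> timestamp, needed for unindexing

    def index(self, docid, timestamp):
        self.unindex(docid)  # re-indexing replaces the old entry
        bisect.insort(self._keys, (-timestamp, docid))
        self._by_doc[docid] = timestamp

    def unindex(self, docid):
        timestamp = self._by_doc.pop(docid, None)
        if timestamp is not None:
            pos = bisect.bisect_left(self._keys, (-timestamp, docid))
            del self._keys[pos]

    def newest(self, matching=None, start=0, size=10):
        """Return one batch of docids, newest first.

        matching is an optional set of docids from the other indexes.
        """
        if size <= 0:
            return []
        batch = []
        skipped = 0
        for _, docid in self._keys:
            if matching is not None and docid not in matching:
                continue
            if skipped < start:
                skipped += 1
                continue
            batch.append(docid)
            if len(batch) == size:
                break  # stop early; the rest never needs sorting
        return batch


dates = ReverseDateIndex()
dates.index(1, 100)
dates.index(2, 300)
dates.index(3, 200)
dates.index(4, 50)

first_batch = dates.newest(size=2)
filtered_batch = dates.newest(matching={1, 3, 4}, start=1, size=2)
dates.index(4, 400)  # document 4 gets a newer date
after_update = dates.newest(size=1)
print(first_batch, filtered_batch, after_update)
```

Negating the timestamp keeps the list in ascending order for bisect
while still yielding reverse chronological order on iteration.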

- C