Erik Enge wrote:
I've indexed about 410,000 objects now. A plain query with 'meta_type' and 'firstname' to searchResults takes about 3-4 seconds. Not too bad, but not that good either.
I assume meta_type is a field index and 'firstname' is a text index. I'd be curious to know how long a query that involves only a single field index takes, and how long a query that involves only a single text index takes... does each take a roughly equivalent amount of time? Or is one much faster than the other? If one is not much faster than the other, it's a Catalog issue. If one *is* much faster than the other, it's an Index issue.
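One way to run that comparison is sketched below. It assumes only that the catalog object exposes searchResults accepting keyword arguments; the helper names time_query and compare_indexes, the repeat count, and the example query values are made up for illustration and are not part of Zope.

```python
import time


def time_query(catalog, query, repeats=5):
    """Return the best wall-clock time in seconds for one query.

    `catalog` is assumed to expose searchResults(**query); `repeats`
    is an arbitrary choice to smooth out cache warm-up effects.
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        results = catalog.searchResults(**query)
        len(results)  # touch the result so a lazy sequence is evaluated
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def compare_indexes(catalog, field_query, text_query):
    """Time the field index alone, the text index alone, and both together."""
    combined = dict(field_query)
    combined.update(text_query)
    return {
        "field_only": time_query(catalog, field_query),
        "text_only": time_query(catalog, text_query),
        "combined": time_query(catalog, combined),
    }
```

Called as compare_indexes(catalog, {'meta_type': 'Person'}, {'firstname': 'erik'}) (both values are placeholders), roughly equal field_only and text_only timings would point at the Catalog, while a large gap would point at the slower index.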
I'm sure I can do something to make it faster, but other than indexing fewer objects (which I can't do, since I have more objects that need to be indexed) I don't see what else I can do.
What's an acceptable query time for your application? Are you sorting the results via sort_on?
Maybe one could design a framework of scalable Catalogs? Just like ZEO does for ZODB?
There are specific incremental improvements that can be made to the Catalog, especially:

1) Extending its query language.
2) Making incremental indexing less expensive (index requests need to be queued up).

There is a proposal for 1 on dev.zope.org named UnionAndIntersectionOperations. 2 has no proposal behind it. There are also no proposals to address slow query times (there haven't been many reports of that, but I can imagine problems, especially in conjunction with sorting).

Zope 2.4 will have "drop in" indexes, which will make it possible to write your own type of index (for example, a DateIndex that stores document ids presorted in reverse chronological order, so you don't always need to sort the entire result set and reverse it for batch-type operations). This will help query speed to the extent that you'll be able to use an index type appropriate to your data rather than stuffing it into one of the default three.

- C
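To make the presorted-index idea concrete, here is a toy sketch. It is not the Zope 2.4 index interface: the class name PresortedDateIndex, its methods, and the use of plain numeric timestamps are all assumptions chosen for illustration.

```python
import bisect


class PresortedDateIndex:
    """Toy index that keeps document ids ordered newest first.

    Entries are stored as (-timestamp, doc_id) so that ordinary
    ascending order on the list is reverse chronological order.
    """

    def __init__(self):
        self._entries = []  # always sorted; newest document first

    def add(self, doc_id, timestamp):
        # insort keeps the list sorted at insert time, so queries
        # never have to sort the whole result set.
        bisect.insort(self._entries, (-timestamp, doc_id))

    def newest(self, start=0, size=10):
        # A batch is just a slice of the presorted list.
        return [doc_id for _, doc_id in self._entries[start:start + size]]
```

The sorting cost is paid once at indexing time; fetching a batch of the most recent documents is then a slice, with no sort or reverse over the full result set.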