[Zope-dev] catalog performance: query plan
Laurence Rowe
l at lrowe.co.uk
Mon Nov 10 17:17:49 EST 2008
Lennart Regebro wrote:
> On Sun, Nov 9, 2008 at 19:58, Roché Compaan <roche at upfrontsystems.co.za> wrote:
>> Since I'm in full agreement that we need to fix indexes that are
>> problematic, I started doing some benchmarks on the large data set that
>> gave us so many headaches. It is probably not surprising that the more
>> complex indexes are performing badly. DateRangeIndex, KeywordIndex and
>> Plone's ExtendedPathIndex performed the worst. Below are some stats
>> showing timings around the "apply_index" call in Catalog.py that was
>> done while testing the application with real data:
>
> ExtendedPathIndex doesn't need fixing, but we need to stop using it.
> It's done to support navigation trees from the catalog, but navigation
> should not be done via the same catalog as you do other things, but a
> dedicated tool. That would simplify and speed things up a lot. But OK,
> that's off-topic.
>
I wonder if this could be replaced by zc.relationship / plone.relations?
There is potential for removing the five.intid / zope.app.keyreference
layer of indirection if the actual oid was stored instead, with an index
to a list of database names packed into the first byte. There would even
be room to store a reference to the object's class (using the pickle
protocol 2 registry to convert this to an integer) in the next two or
three bytes if creating ghosts were useful. This would still leave at
least 32 bits of space (about 4 billion values) for the actual object id.
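A rough sketch of the packing described above, assuming an 8-byte key laid out as 1 byte of database index, 3 bytes of class id and 4 bytes of object id. The names pack_ref, unpack_ref and DB_NAMES are made up for illustration and are not part of five.intid or zope.app.keyreference, and it assumes the real oid fits into the low 32 bits:

```python
# Hypothetical list of database names; the first byte of a key indexes into it.
DB_NAMES = ["main", "catalog"]


def pack_ref(db_index, class_id, object_id):
    """Pack the three fields into one 8-byte big-endian key."""
    if not 0 <= db_index < 1 << 8:
        raise ValueError("db_index must fit in 1 byte")
    if not 0 <= class_id < 1 << 24:
        raise ValueError("class_id must fit in 3 bytes")
    if not 0 <= object_id < 1 << 32:
        raise ValueError("object_id must fit in 4 bytes")
    value = (db_index << 56) | (class_id << 32) | object_id
    return value.to_bytes(8, "big")


def unpack_ref(key):
    """Split an 8-byte key back into (db_index, class_id, object_id)."""
    value = int.from_bytes(key, "big")
    return value >> 56, (value >> 32) & 0xFFFFFF, value & 0xFFFFFFFF


key = pack_ref(1, 5, 42)
db_index, class_id, object_id = unpack_ref(key)
db_name = DB_NAMES[db_index]
```

Big-endian order keeps keys from the same database and class adjacent when sorted, which may or may not matter for the index in question.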
Without storing the aq_chain explicitly we would need to ensure that
__parent__ pointers were pickled for all content objects. The objects
themselves could be used instead of metadata rows (without a security
check it would be as simple as loading the object by its oid from the
relevant db connection). So long as all the required metadata was stored
on the object itself, only one load would be required for each object.
If this same keyreference were used in the indexes of the catalog
instead of rowids then result sets could be merged.
The downside is that the set intersections would require double the
memory of the current 32-bit ids, since each key would be 64 bits wide.
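To put a number on that trade-off, here is a small sketch, assuming two indexes that both return sets of the same 64-bit keys. The result sets and the raw_size helper are invented for illustration; the sizes count only the raw id bytes and ignore any BTree or Python object overhead:

```python
import struct


def raw_size(ids, fmt):
    """Bytes needed to store the ids back to back in the given struct format."""
    return struct.calcsize(fmt) * len(ids)


# Hypothetical result sets from two indexes sharing one key space.
results_a = {0x0100000500000001, 0x0100000500000002, 0x0100000500000003}
results_b = {0x0100000500000002, 0x0100000500000003, 0x0100000500000004}

# With shared keys the results can be combined by plain set intersection.
merged = results_a & results_b

size_32 = raw_size(merged, "<I")  # 4 bytes per id, as with the current rowids
size_64 = raw_size(merged, "<Q")  # 8 bytes per id, as with the packed keys
```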
Laurence