[Zope-dev] catalog performance: query plan
Tres Seaver
tseaver at palladion.com
Mon Nov 10 12:39:58 EST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hedley Roos wrote:
>>> object_implements |KeywordIndex |0.2172234| 4.6
>> This is clearly not the same issue as the other KeywordIndexes: in
>> fact, I am astonished that anybody would be using a KeywordIndex for
>> this at all. I would suspect that the real problem here is in the
>> appliation, rather than the index itself.
>
> The app is Plone :)
I don't know how Plone computes the values here (are they actual
interface objects, or their monikers), nor how it is queried. I am
suspicious that there is some query-time bogosity, however.
>>> portal_type |FieldIndex |0.0025984| 384.84
>> This one is surprising: its performance should be pretty similar to the
>> other FieldIndexes (e.g., 'review_state') which map a controlled
>> vocabulary onto the entire corpus. Was the query different than
>> 'review_state' (e.g., multi-valued vs. single-valued)?
>
> portal_type queries are usually multivalued in Plone.
What are the use cases for that?
>>> sourceUID |FieldIndex |0.0004886| 2046.31
>> Probably bogus, but I don't know how it is used.
>
>
> Plone's reference_catalog
I'm betting that using a catalog at all is a flawed choice.
>>> UID |FieldIndex |0.0003070| 3257.1
>> Note that this is the worst-case scenario for a FieldIndex: there is
>> exactly one value for every key. This shouldn't be "indexed" at all, in
>> fact, beyond a simple BTree (UID -> rid).
>
> I've never even thought of that. Perhaps the catalog is used to
> present a familiar API.
>
>>> targetUID |FieldIndex |0.0002287| 4372.12
>> I don't know what this one is used for, but it should probably be
>> scrapped as well.
>
> More reference_catalog.
>
>>> exact_getUserId |FieldIndex |0.0001931| 5177.79
>>> exact_getUserName |FieldIndex |0.0001816| 5504.39
>> I don't know how the application uses either of those indexes, but they
>> are almost certainly bogus in any normal catalog.
>
> Membrane and remember. They're currently tied to Plone but efforts are
> being made to make them work with CMF.
>
>>> relationship |FieldIndex |0.0000822| 12153.1
>>> id |FieldIndex |0.0000822| 12161.81
>>> end |DateIndex |0.0000623| 16027.48
>>> getGroups |FieldIndex |0.0000278| 35973.45
>> This is almost certainly bogus: FieldIndex is not supposed to be used
>> with multi-valued terms.
>
> Plone stuff, but I am intrigued by your statement. Why can FieldIndex
> not be used with multi-valued terms?
Because FieldIndex is designed for either exact-match queries or range
queries: range queries are obviously impossible for a 'getGroups'
method, and exact-match is nearly as dubious. I would have expected
this to be a KeywordIndex, if it needed indexing at all.
>>> Subject |KeywordIndex |0.0000253| 39413.57
>> This is the use-case for which KeywordIndex is designed. Was the query
>> just a single term, by chance?
>
> The simplest term is a list with only a single term (not counting the
> trivial case). It should be worse with more terms right?
I am reasonably confident that multi-valued queries against either
FieldIndexes or KeywordIndexes will perform worse than single-valued
queries against the same index.
>>> Title |ZCTextIndex |0.0000128| 77809.46
>> This should be removed: there is no valid use case for doing a
>> full-text search restricted only to the title.
>
> Plone specific.
No, still bogus. This is an egregiously stupid choice, with *large*
indexing-time downsides.
>>> Description |ZCTextIndex |0.0000116| 86241.39
>> Again, should be removed.
>
> Again, Plone specific :)
Again, egregiously bogus.
>>> getEmail |ZCTextIndex |0.0000113| 87849.05
>> Should *definitely* be removed: how can you do full-text search on an
>> e-mail address?
>
> I think membrane is responsible for this, but you're right.
>
>
>>> SearchableText |TextIndex |0.0000113| 88466.69
>> Where did this one come from? The 'SearchableText' above is a ZCTextIndex.
>
> Membrane!
>
> Kinda pointless for me to continue since this is turning into a
> Plone-specific discussion on zope-dev. But at least the whole exercise
> has forced us to look in detail into how all these indexes affect
> performance with a zodb with many many objects.
>
> Roche investigated Tesdal's queryplan today end it seems to solve
> nearly all our performance problems. He'll have to elaborate.
Ripping out stupid indexes is one of the first things I do to optimize a
client's badly-performing Plone site.
Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFJGHHu+gerLs4ltQ4RAoDgAJ9sjGoOG2KftnqFIbVxFy0sNk8EDwCfaSbH
ouY7+FnSbSJvYSBtI//31ZY=
=3SeB
-----END PGP SIGNATURE-----
More information about the Zope-Dev
mailing list