[Zope-dev] ZCatalog caching with memcached
Roché Compaan
roche at upfrontsystems.co.za
Mon Oct 27 08:08:27 EDT 2008
On Sun, 2008-10-26 at 14:07 -0400, Tres Seaver wrote:
>
> Roché Compaan wrote:
> > On Sat, 2008-10-25 at 09:20 +0200, Hedley Roos wrote:
> >>> Have you measured the time needed for some "standard" ZCatalog queries
> >>> used with a Plone site with the communication overhead with memcached?
> >>> Generally speaking, I think the ZCatalog is fast. Queries using a
> >>> fulltext index are known to be more expensive, as are queries that
> >>> are complex or that return large result sets.
> >>>
> >> No I haven't. Roche Compaan has done extensive benchmarking using
> >> funkload testing plain catalog vs module level cache vs memcached, but
> >> the tests are more about page serving than catalog query time. I'll
> >> ask him to comment more on that.
> >
> > I actually did some profiling as well and catalog searches were just too
> > damn slow. The average execution time for searchResults was 100
> > milliseconds and this is why I told Hedley we should do some caching at
> > query level in the first place. I experimented with this idea a couple
> > of years back but wasn't successful due to inexperience. I was trying to
> > cache brains which obviously leads to persistency bugs. This time around
> > it was obvious to me that we should cache the IISet result sets.
> >
> > I suspect specific indexes are just performing suboptimally and need to
> > be improved. ExtendedPathIndex in Plone seems to be one of them.
> >
> > The effect on performance is really awesome; now we just need to
> > fine-tune the implementation.
>
> Before (or while) we work on caching, can we try to improve the
> underlying indexes, and the way that applications use them? I'm pretty
> sure that there is a lot of room for improvement:
>
> - Plone uses too many indexes, and in particular, uses multiple text
> indexes. Having extra indexes around "just in case" is a sure loss
> at write time, and may even be expensive at query time (depending on
> the query).
>
> - Particular indexes have performance characteristics based on their
> designed purpose: for instance, the stock FieldIndex implementation
> assumes that the number of documents indexed will be >> the number of
> discrete indexable values. Using such an index in an application
> domain with a very large set of indexable values probably loses, and
> in ways which don't show up in early / small-scale testing.
>
> - I'm pretty sure that we haven't yet found the best data structure for
> "hierarchy indexes" (e.g., the Plone EPI index, or the stock Zope2
> PathIndex, etc.). Something like a 'trie' might be optimal for
> pure prefix searching of hierarchies.
>
> - I am confident that the TopicIndex is underutilized: it does *all*
> the work for a given query at write time, and can thus be blindingly
> fast at query time.
>
> - Other special-purpose indexes (e.g., a "recent items" index) would
> be worth a look, especially for applications with large volumes of
> content.
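To make the 'trie' suggestion for hierarchy indexes concrete, here is a
minimal sketch of prefix searching over path segments. It is only an
illustration: PathTrie, insert and search_prefix are made-up names, not
part of PathIndex or ExtendedPathIndex, and plain Python sets of
integers stand in for the record id sets a real index would keep.

```python
class PathTrie:
    """Hypothetical trie keyed on path segments, one node per segment."""

    def __init__(self):
        self.children = {}   # segment name -> child PathTrie
        self.rids = set()    # record ids of documents stored at this path

    @staticmethod
    def _segments(path):
        # "/plone/news/" -> ["plone", "news"]; empty segments are dropped.
        return [part for part in path.split("/") if part]

    def insert(self, path, rid):
        node = self
        for part in self._segments(path):
            node = node.children.setdefault(part, PathTrie())
        node.rids.add(rid)

    def search_prefix(self, prefix):
        """Return the record ids of everything at or below `prefix`."""
        node = self
        for part in self._segments(prefix):
            node = node.children.get(part)
            if node is None:
                return set()
        # Walk the subtree iteratively and collect every record id in it.
        found = set()
        stack = [node]
        while stack:
            current = stack.pop()
            found |= current.rids
            stack.extend(current.children.values())
        return found
```

The lookup cost of reaching the subtree depends only on the depth of the
prefix, not on the number of documents indexed. Collecting the results
still walks the whole subtree; whether that beats the existing indexes
would have to be measured.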
I agree that one should look at improving performance without caching as
well. But this is a lot harder and takes significantly more development
and debugging time than introducing some form of caching. So I'm not
convinced that it needs to happen in a certain order. If caching gives
you lots of performance with little effort now, then why shouldn't you
use it?
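
For illustration only, caching at query level could look roughly like
the sketch below. This is not the implementation discussed in this
thread: DictCache, make_cache_key, cached_search and the "generation"
counter are names invented for the example. A plain list of integers
stands in for the IISet of record ids, a dict stands in for a memcached
client, and invalidation is assumed to work by bumping the generation
whenever the catalog changes.

```python
import hashlib
import pickle


class DictCache:
    """Stand-in for a memcached client; only get and set are assumed."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def make_cache_key(query, generation):
    # Sort the items so that equal queries give the same key regardless
    # of the order in which the query dict was built.
    canonical = repr(sorted(query.items()))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    # The generation is part of the key, so bumping it makes all older
    # entries unreachable (assumed invalidation scheme).
    return "catalog:%d:%s" % (generation, digest)


def cached_search(search, query, cache, generation):
    """Return record ids for `query`, calling `search` only on a miss."""
    key = make_cache_key(query, generation)
    hit = cache.get(key)
    if hit is not None:
        return pickle.loads(hit)
    # Cache plain record ids, never brains or other persistent objects.
    rids = list(search(query))
    cache.set(key, pickle.dumps(rids))
    return rids
```

Only the record ids are stored, so nothing tied to a ZODB connection
ends up in the cache. An empty result is cached too, since its pickle
is a non-empty byte string.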
--
Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za