[Zope-dev] ZCatalog caching with memcached

Mon Oct 27 08:08:27 EDT 2008

On Sun, 2008-10-26 at 14:07 -0400, Tres Seaver wrote:
> Roché Compaan wrote:
> > On Sat, 2008-10-25 at 09:20 +0200, Hedley Roos wrote:
> >>> Have you measured the time needed for some "standard" ZCatalog queries
> >>> used with a Plone site, including the communication overhead with
> >>> memcached? Generally speaking, I think the ZCatalog is fast. Queries
> >>> using a fulltext index are known to be more expensive, as are queries
> >>> with large result sets or complex criteria.
> >>>
> >> No I haven't. Roche Compaan has done extensive benchmarking using
> >> funkload testing plain catalog vs module level cache vs memcached, but
> >> the tests are more about page serving than catalog query time. I'll
> >> ask him to comment more on that.
> > 
> > I actually did some profiling as well and catalog searches were just too
> > damn slow. The average execution time for searchResults was 100
> > milliseconds and this is why I told Hedley we should do some caching at
> > query level in the first place. I experimented with this idea a couple
> > of years back but wasn't successful due to inexperience. I was trying to
> > cache brains which obviously leads to persistency bugs. This time around
> > it was obvious to me that we should cache the IISet result sets.
> > 
> > I suspect specific indexes are just performing suboptimally and need to
> > be improved. ExtendedPathIndex in Plone seems to be one of them.
> > 
> > The effect on performance is really awesome, now we just need to fine
> > tune the implementation.
> 
> Before (or while) we work on caching, can we try to improve the
> underlying indexes, and the way that applications use them?  I'm pretty
> sure that there is a lot of room for improvement:
> 
>  - Plone uses too many indexes, and in particular, uses multiple text
>    indexes.  Having extra indexes around "just in case" is a sure lose
>    at write time, and may even be expensive at query time (depending on
>    the query).
> 
>  - Particular indexes have performance characteristics based on their
>    designed purpose:  for instance, the stock FieldIndex implementation
>    assumes that the number of documents indexed will be >> the number of
>    discrete indexable values.  Using such an index in an application
>    domain with a very large set of indexable values probably loses, and
>    in ways which don't show up in early / small-scale testing.
> 
>  - I'm pretty sure that we haven't yet found the best data structure for
>    "hierarchy indexes" (e.g., the Plone EPI index, or the stock Zope2
>    PathIndex, etc.).  Something like a 'trie' might be optimal for
>    pure prefix searching of hierarchies.
> 
>  - I am confident that the TopicIndex is underutilized:  it does *all*
>    the work for a given query at write time, and can thus be blindingly
>    fast at query time.
> 
>  - Other special-purpose indexes (e.g., a "recent items" index) would
>    be worth a look, especially for applications with large volumes of
>    content.
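
To make the 'trie' suggestion for hierarchy indexes concrete, here is a rough sketch of segment-wise prefix searching over paths. It is not how PathIndex or the Plone EPI index is implemented; the class name PathTrie and its methods are hypothetical, and plain Python sets stand in for the integer sets a real index would use.

```python
def _segments(path):
    """Split '/plone/news/a' into ['plone', 'news', 'a'], ignoring empty parts."""
    return [segment for segment in path.split("/") if segment]


class PathTrie:
    """Toy hierarchy index (hypothetical): one trie node per path segment."""

    def __init__(self):
        self.children = {}   # path segment -> child PathTrie node
        self.docids = set()  # ids of documents whose path ends at this node

    def index(self, docid, path):
        """Record docid under path, creating intermediate nodes as needed."""
        node = self
        for segment in _segments(path):
            node = node.children.setdefault(segment, PathTrie())
        node.docids.add(docid)

    def search(self, prefix):
        """Return the ids of all documents at or below the given path prefix."""
        node = self
        for segment in _segments(prefix):
            node = node.children.get(segment)
            if node is None:
                return set()
        # Collect every id in the subtree rooted at the matched node.
        found = set()
        stack = [node]
        while stack:
            current = stack.pop()
            found |= current.docids
            stack.extend(current.children.values())
        return found
```

Locating the subtree costs one dictionary lookup per path segment, regardless of how many documents are indexed; the remaining cost is proportional to the size of the matched subtree.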

I agree that one should look at improving performance without caching as
well. But this is a lot harder and takes significantly more development
and debugging time than introducing some form of caching. So I'm not
convinced that it needs to happen in a certain order. If caching gives
you lots of performance with little effort now, then why shouldn't you
use it?
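
As a rough illustration of caching at query level (not the actual implementation discussed in this thread), the sketch below caches sets of integer result ids rather than brains. The QueryCache class, its method names, and the generation counter used for invalidation are all assumptions made for the example, and a plain dict stands in for memcached.

```python
class QueryCache:
    """Toy query-level cache (hypothetical): stores result ids, never brains."""

    def __init__(self, search):
        self._search = search   # callable: query dict -> iterable of int ids
        self._store = {}        # stand-in for memcached in this sketch
        self._generation = 0    # bumped whenever the catalog changes
        self.hits = 0
        self.misses = 0

    def _key(self, query):
        # Normalize the query so that argument order does not matter.
        # repr() is an assumption that only suits simple query values.
        normalized = tuple(
            sorted((name, repr(value)) for name, value in query.items())
        )
        return (self._generation, normalized)

    def invalidate(self):
        """Call after any (re)indexing; keys from older generations go stale.

        Stale entries are never read again. A real cache would expire or
        evict them; this sketch simply leaves them in the dict.
        """
        self._generation += 1

    def search(self, query):
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = frozenset(self._search(query))
        self._store[key] = result
        return result
```

Bumping a generation counter on every catalog write is the bluntest possible invalidation scheme: it is easy to get right, but it throws away every cached result at once, so it only pays off when reads heavily outnumber writes.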

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za