[Zope-dev] Re: Caching ZCatalog results

Sun Feb 25 04:48:15 EST 2007

On Sat, 2007-02-24 at 09:48 +0100, Dieter Maurer wrote:
> Roché Compaan wrote at 2007-2-23 22:00 +0200:
> > ...
> >Thanks for that pointer. It's good that way; it should make invalidation
> >easier. It could be as simple as invalidating any cached result that
> >contains the documentId being indexed. Do you see any problem with the
> >following invalidation strategy:
> >
> >If the 'documentId' exists (cataloging an existing object), invalidate all
> >cached result sets that contain the documentId.
> >
> >If the 'documentId' doesn't exist (cataloging a new object), invalidate
> >all result sets where the ids of the indexes applied are contained in the
> >cache key for that result set.
> 
> I see several problems:
> 
>   *  the RAMCacheManager does not provide an API to implement
>      this policy
> 
>   *  a cache manager would need a special data structure
>      to efficiently implement the policy (given a documentId,
>      find all cached results containing the documentId).

Can you elaborate? Would an IISet be efficient?
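An IISet makes it cheap to check whether one cached result contains a given documentId, but finding all cached results that contain it still means visiting every cache entry. One way around that is a reverse mapping from documentId to the cache keys whose result sets contain it. The sketch below assumes such a structure; `ReverseIndexedCache` and its methods are hypothetical (not part of RAMCacheManager or ZCatalog), and plain Python dicts and sets stand in for BTrees. Since an IISet only holds integers, the BTrees equivalent of the reverse mapping would more likely be an IOBTree of OOSets of cache keys.

```python
# Hypothetical sketch: a result cache with a reverse mapping from documentId
# to the cache keys whose result sets contain that documentId.
# Plain dicts and sets stand in for BTrees (e.g. an IOBTree of OOSets).
from collections import defaultdict


class ReverseIndexedCache:
    def __init__(self):
        self._results = {}                  # cache key -> frozenset of docids
        self._by_docid = defaultdict(set)   # docid -> cache keys containing it

    def store(self, key, docids):
        """Cache a result set and record each docid -> key link."""
        self.invalidate_key(key)            # clear bookkeeping for an old entry
        docids = frozenset(docids)
        self._results[key] = docids
        for docid in docids:
            self._by_docid[docid].add(key)

    def get(self, key, default=None):
        return self._results.get(key, default)

    def invalidate_key(self, key):
        """Drop one cached result and remove it from the reverse mapping."""
        docids = self._results.pop(key, None)
        if docids is None:
            return
        for docid in docids:
            keys = self._by_docid.get(docid)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._by_docid[docid]

    def invalidate_docid(self, docid):
        """Drop every cached result set that contains docid."""
        for key in list(self._by_docid.get(docid, ())):
            self.invalidate_key(key)
```

With this layout, `invalidate_docid` touches only the cache keys recorded for that documentId instead of scanning every cached result; the price is extra bookkeeping on every store and invalidation.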

>   *  Apparently, your cache key contains the indexes involved
>      in producing the result.

This is coincidental. I'm building a cache key from all arguments passed
in, both as keyword arguments and on the REQUEST.
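A minimal sketch of such a key, assuming the REQUEST values are available as a plain mapping (for example the request's form) and that keyword arguments override REQUEST values of the same name. `make_cache_key` and `_freeze` are hypothetical helpers. Nested values such as range queries are converted to tuples so the key is hashable, and items are sorted so argument order does not matter.

```python
# Hypothetical sketch: build a hashable, order-independent cache key from
# query keyword arguments and REQUEST form values.


def _freeze(value):
    """Convert dicts/lists (e.g. {'query': [...], 'range': 'min:max'}) to tuples."""
    if isinstance(value, dict):
        return tuple(sorted((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v) for v in value)
    return value


def make_cache_key(request_form, **kw):
    """Hypothetical helper; assumes keyword arguments take precedence."""
    query = dict(request_form)
    query.update(kw)
    # Names are unique strings, so sorting never has to compare the values.
    return tuple(sorted((name, _freeze(value)) for name, value in query.items()))
```

Because every REQUEST entry ends up in the key, form fields unrelated to the catalog query also split the cache; restricting the mapping to index names would avoid that.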

>      The problem with this is that these indexes are known
>      only after the query has been performed:
> 
>         The catalog API allows indexes to respond to subqueries
>         that do not contain their own name.
>
>         I use this feature to allow a "Managable RangeIndex"
>         to transparently replace "effective, expires" queries.
>
>         But otherwise, the feature is probably not used
>         intensively.

If these parameters are on the request or in keywords they will form
part of the cache key.

>      Of course, you can add the information *after*
>      the query has been performed and use it for invalidation -- in
>      a specialized cache manager.
> 
> 
>      On the other hand, new objects are usually indexed with
>      all available (and not only a few) indexes.
> 
>      While some of the indexes may not be able to determine
>      a sensible value for the document, the standard indexes
>      have problems handling this properly ("ManagableIndex"es can)
>      and the API does not propagate the information.

I think it will not be trivial to implement invalidation that doesn't
bite you. I thought of checking for document ids because invalidating
results whenever a whole index changes might cause too many invalidations.
For example, repeatedly querying for the same UID of an object should
yield a cached result most of the time. Indexing a new object's UID
shouldn't invalidate the cached results for existing UID queries.

Let's assume we have a specialised cache manager and a cache that copes
with the subtleties of subqueries. Do you think that invalidating the
cache according to the logic I suggested would work? Can you think of
cases where it could lead to stale results that one should guard against?
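A sketch of the two invalidation rules quoted at the top of this thread, under several assumptions: cache keys are tuples of (index name, value) pairs, the indexes a query actually used can optionally be recorded after the query runs (to capture the subquery case Dieter mentions), and the cache learns about cataloging through a hypothetical `catalog_object(docid, indexes_applied)` hook. `PolicyCache` and all its methods are hypothetical, not ZCatalog or RAMCacheManager APIs, and plain dicts and sets stand in for BTrees.

```python
# Hypothetical sketch of the proposed two-rule invalidation policy.
# Cache keys are tuples of (index_name, value) pairs; plain dicts and sets
# stand in for the BTrees structures a real cache manager would use.
from collections import defaultdict


class PolicyCache:
    def __init__(self):
        self.results = {}                  # cache key -> frozenset of docids
        self.by_docid = defaultdict(set)   # docid -> cache keys containing it
        self.by_index = defaultdict(set)   # index name -> cache keys using it
        self.known_docids = set()          # stands in for the catalog's own records

    def store(self, key, docids, indexes_used=None):
        """Cache a result; indexes_used may be recorded after the query ran."""
        self._drop(key)
        names = indexes_used if indexes_used is not None else [n for n, _ in key]
        docids = frozenset(docids)
        self.results[key] = docids
        for d in docids:
            self.by_docid[d].add(key)
        for n in names:
            self.by_index[n].add(key)

    def _drop(self, key):
        """Remove one cached result and its bookkeeping entries."""
        for d in self.results.pop(key, frozenset()):
            self.by_docid[d].discard(key)
        for keys in self.by_index.values():
            keys.discard(key)

    def catalog_object(self, docid, indexes_applied):
        """Apply the invalidation rules; return the cache keys dropped."""
        if docid in self.known_docids:
            # Rule 1: existing object -> drop result sets that contain it.
            stale = set(self.by_docid.get(docid, ()))
        else:
            # Rule 2: new object -> drop result sets whose key used an applied index.
            stale = set()
            for n in indexes_applied:
                stale |= self.by_index.get(n, set())
            self.known_docids.add(docid)
        for key in stale:
            self._drop(key)
        return stale
```

Two behaviours of this sketch are worth checking against the goal stated above. Rule 2 only knows index names, so cataloging a new object with a UID still drops every cached UID query. And rule 1 only drops result sets that already contain the document, so if reindexing makes an existing object match a query it previously did not match (say, after a changed review state), the cached result for that query is left stale.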

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za