On Sat, 2007-02-24 at 09:48 +0100, Dieter Maurer wrote:
Roché Compaan wrote at 2007-2-23 22:00 +0200:
... Thanks for that pointer. It's good that way; it should make invalidation easier. It could be as simple as invalidating any cached result that contains the documentId being indexed. Do you see any problem with the following invalidation strategy:
If the 'documentId' exists (cataloging an existing object), invalidate all cached result sets that contain the documentId.
If the 'documentId' doesn't exist (cataloging a new object), invalidate all result sets whose cache keys contain the ids of the indexes applied.
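For illustration, the two rules above could be sketched roughly as follows. This is a minimal, hypothetical sketch in plain Python; the class and method names are invented for the example and are not part of any Zope cache manager API:

```python
class ResultCache:
    """Toy cache of catalog result sets, keyed by cache key.

    Each entry records the ids of the indexes involved and the
    documentIds in the result, so both invalidation rules can run.
    """

    def __init__(self):
        # cache_key -> (frozenset of index ids, frozenset of documentIds)
        self._entries = {}

    def store(self, cache_key, index_ids, result_ids):
        self._entries[cache_key] = (frozenset(index_ids), frozenset(result_ids))

    def invalidate_existing(self, document_id):
        # Rule 1: documentId already cataloged -- drop every cached
        # result set that contains it.
        stale = [k for k, (_, rids) in self._entries.items()
                 if document_id in rids]
        for k in stale:
            del self._entries[k]

    def invalidate_new(self, applied_index_ids):
        # Rule 2: new documentId -- drop every result set whose cache
        # key involves any of the indexes applied while cataloging.
        applied = set(applied_index_ids)
        stale = [k for k, (idx, _) in self._entries.items()
                 if idx & applied]
        for k in stale:
            del self._entries[k]
```

A real implementation would of course need the per-entry index ids and result ids to be recorded by the cache manager, which is exactly the point raised below.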
I see several problems:
* the RAMCacheManager does not provide an API to implement this policy
* a cache manager would need a special data structure to efficiently implement the policy (given a documentId, find all cached results containing the documentId).
Can you elaborate? Would an IISet be efficient?
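One way to picture the data structure in question is a reverse map from documentId to the cache keys of the result sets containing it. Here is a sketch using plain Python sets as a stand-in for persistent structures such as BTrees' IISet; the class name is hypothetical:

```python
from collections import defaultdict

class ReverseMap:
    """Maps documentId -> set of cache keys whose results contain it.

    In a real Zope cache manager one would likely use BTrees
    (e.g. an IOBTree of IISets / OOSets) for persistence and
    memory efficiency instead of plain dicts and sets.
    """

    def __init__(self):
        self._by_doc = defaultdict(set)

    def record(self, cache_key, result_ids):
        # Called when a result set is cached.
        for docid in result_ids:
            self._by_doc[docid].add(cache_key)

    def keys_for(self, docid):
        # Given a documentId, find all cached results containing it
        # without scanning every cached entry.
        return self._by_doc.get(docid, set())
```

The lookup is then a single dictionary access per documentId rather than a scan over all cached result sets, which is the efficiency concern raised above.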
* Apparently, your cache key contains the indexes involved in producing the result.
This is coincidental. I'm building a cache key from all arguments passed in as keyword arguments and on the REQUEST.
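Such a cache key might be built along these lines. This is only an illustrative sketch (the function name and the `request_form` parameter are assumptions, not the actual implementation); the key must be hashable and independent of argument order:

```python
def make_cache_key(request_form, **kw):
    """Build a hashable cache key from REQUEST form values plus
    explicit keyword arguments; keywords override request values."""
    merged = dict(request_form)
    merged.update(kw)
    # Sort items so equivalent queries produce identical keys;
    # repr() keeps unhashable values (e.g. lists) usable in the key.
    return tuple(sorted((k, repr(v)) for k, v in merged.items()))
```

Note that, as pointed out below, only parameters visible in the request or keywords end up in such a key; indexes that join in transparently during the query do not.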
The problem with this is that these indexes are known only after the query has been performed:
The catalog API allows indexes to respond to subqueries that do not contain their own name.
I use this feature to allow a "Managable RangeIndex" to transparently replace "effective, expires" queries.
But otherwise, the feature is probably not used intensively.
If these parameters are on the request or in the keywords, they will form part of the cache key.
Of course, you can add the information *after* the query has been performed and use it for invalidation -- in a specialized cache manager.
On the other hand, new objects are usually indexed with all available (and not only a few) indexes.
While some of the indexes may not be able to determine a sensible value for the document, the standard indexes have trouble handling this properly ("ManagableIndex"es can), and the API does not propagate the information.
I think it will not be trivial to implement invalidation that doesn't bite you. I thought of checking for document ids because invalidating results whenever a whole index changes might cause too many invalidations. For example, querying for the same UID of an object should yield a cached result most of the time; indexing a new object's UID shouldn't invalidate the cached results for existing UID queries. Let's assume we have a specialised cache manager and a cache that copes with the subtleties of subqueries: do you think that invalidating the cache according to the logic I suggested would work? Can you think of cases where it could lead to stale results that one should guard against?

--
Roché Compaan
Upfront Systems
http://www.upfrontsystems.co.za