On Sat, 2007-02-24 at 09:48 +0100, Dieter Maurer wrote:
Roché Compaan wrote at 2007-2-23 22:00 +0200:
... Thanks for that pointer. It's good that way; it should make invalidation easier. It could be as simple as invalidating any cached result that contains the documentId being indexed. Do you see any problem with the following invalidation strategy:
If the 'documentId' exists (cataloging an existing object), invalidate all cached result sets that contain the documentId.
If the 'documentId' doesn't exist (cataloging a new object), invalidate all result sets whose cache keys contain the ids of the indexes applied.
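For illustration, the two rules above could be sketched roughly as follows. This is a minimal, hypothetical sketch in plain Python; the class and method names are invented for the example and are not part of any Zope cache manager API:

```python
class ResultCache:
    """Toy cache of catalog result sets, keyed by cache key.

    Each entry records the ids of the indexes involved and the
    documentIds in the result, so both invalidation rules can run.
    """

    def __init__(self):
        # cache_key -> (frozenset of index ids, frozenset of documentIds)
        self._entries = {}

    def store(self, cache_key, index_ids, result_ids):
        self._entries[cache_key] = (frozenset(index_ids), frozenset(result_ids))

    def invalidate_existing(self, document_id):
        # Rule 1: documentId already cataloged -- drop every cached
        # result set that contains it.
        stale = [k for k, (_, rids) in self._entries.items()
                 if document_id in rids]
        for k in stale:
            del self._entries[k]

    def invalidate_new(self, applied_index_ids):
        # Rule 2: new documentId -- drop every result set whose cache
        # key involves any of the indexes applied while cataloging.
        applied = set(applied_index_ids)
        stale = [k for k, (idx, _) in self._entries.items()
                 if idx & applied]
        for k in stale:
            del self._entries[k]
```

A real implementation would of course need the per-entry index ids and result ids to be recorded by the cache manager, which is exactly the point raised below.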
I see several problems:
* the RAMCacheManager does not provide an API to implement this policy
* a cache manager would need a special data structure to efficiently implement the policy (given a documentId, find all cached results containing the documentId).
Can you elaborate? Would an IISet be efficient?
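One way to picture the data structure in question is a reverse map from documentId to the cache keys of the result sets containing it. Here is a sketch using plain Python sets as a stand-in for persistent structures such as BTrees' IISet; the class name is hypothetical:

```python
from collections import defaultdict

class ReverseMap:
    """Maps documentId -> set of cache keys whose results contain it.

    In a real Zope cache manager one would likely use BTrees
    (e.g. an IOBTree of IISets / OOSets) for persistence and
    memory efficiency instead of plain dicts and sets.
    """

    def __init__(self):
        self._by_doc = defaultdict(set)

    def record(self, cache_key, result_ids):
        # Called when a result set is cached.
        for docid in result_ids:
            self._by_doc[docid].add(cache_key)

    def keys_for(self, docid):
        # Given a documentId, find all cached results containing it
        # without scanning every cached entry.
        return self._by_doc.get(docid, set())
```

The lookup is then a single dictionary access per documentId rather than a scan over all cached result sets, which is the efficiency concern raised above.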
* Apparently, your cache key contains the indexes involved in producing the result.
This is coincidental. I'm building a cache key from all arguments passed in as keyword arguments and on the REQUEST.
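Such a cache key might be built along these lines. This is only an illustrative sketch (the function name and the `request_form` parameter are assumptions, not the actual implementation); the key must be hashable and independent of argument order:

```python
def make_cache_key(request_form, **kw):
    """Build a hashable cache key from REQUEST form values plus
    explicit keyword arguments; keywords override request values."""
    merged = dict(request_form)
    merged.update(kw)
    # Sort items so equivalent queries produce identical keys;
    # repr() keeps unhashable values (e.g. lists) usable in the key.
    return tuple(sorted((k, repr(v)) for k, v in merged.items()))
```

Note that, as pointed out below, only parameters visible in the request or keywords end up in such a key; indexes that join in transparently during the query do not.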
The problem with this is that these indexes are known only after the query has been performed:
The catalog API allows indexes to respond to subqueries that do not contain their own name.
I use this feature to allow a "Managable RangeIndex" to transparently replace "effective, expires" queries.
But otherwise, the feature is probably not used intensively.
If these parameters are on the request or in the keywords, they will form part of the cache key.
Of course, you can add the information *after* the query has been performed and use it for invalidation -- in a specialized cache manager.
On the other hand, new objects are usually indexed with all available (and not only a few) indexes.
While some of the indexes may not be able to determine a sensible value for the document, the standard indexes have trouble handling this properly ("ManagableIndex"es can), and the API does not propagate the information.
I think it will not be trivial to implement invalidation that doesn't bite you. I thought of checking for document ids because invalidating results whenever a whole index changes might cause too many invalidations. For example, querying for the same UID of an object should yield a cached result most of the time; indexing a new object's UID shouldn't invalidate the cached results for existing UID queries. Let's assume we have a specialised cache manager and a cache that copes with the subtleties of subqueries: do you think that invalidating the cache according to the logic I suggested would work? Can you think of cases where it could lead to stale results that one should guard against?

--
Roché Compaan
Upfront Systems
http://www.upfrontsystems.co.za