How To Improve Cache Coherency for RAM/Disk Cache Manager...?
Hello: I am getting ready to release the next version of XMLTransform, and in revisiting the caching strategy for the product, I realized there are larger issues that probably deserve a discussion here.

The bottom line is that transforming XML to something else via XSLT is a potentially expensive operation, so caching the results is often worthwhile. As I thought about the problem, I realized that this probably holds true for any sufficiently dynamic site where you employ caching, because:

- the cost of processing exceeds the cost of retrieval from cache by at least an order of magnitude
- there are many more readers than writers

Question: how can we ensure cache coherency? For example, you might have a ZPT that includes the results of several long-running PythonScripts, whose rendered result is cached. What happens when the code for those PythonScripts changes? Worse, what happens when the SQL data retrieved by the Z SQL Method that the PythonScripts operate on changes? Correct me if I am wrong, but today the Zope Cache Management facility takes into account changes in cached objects, but not changes in the objects on which they depend.

One strategy for dealing with this problem is to invalidate objects in the cache at a fixed interval; that way objects are out of date for at most the length of the interval. This could be called the "pull" or reactive model. Alternatively, cacheable objects might somehow be aware of the objects on which they depend, and invalidate themselves in the cache when one of those objects changes. This could be called the "push" or proactive model. Depending on some parameters, they might even recalculate their results proactively so they could be re-cached immediately. Why make the unlucky user pay the price?

The latter alternative is not infeasible. DTML and ZPT scripts must parse their contents in order to render, so the information is available somewhere. The question is: can/should this be addressed for Zope 2? What about Zope 3? Is one model more appropriate for a development setting (equal numbers of writers and readers) vs. a production setting (many more readers, few or no writers)?

Any thoughts or sage advice on this topic would be much appreciated!

Regards,

--Craeg
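The two models described above can be sketched in a few lines of Python. This is a minimal illustration, not Zope code: the class and method names (`PullCache`, `PushCache`, `notify_changed`) are hypothetical, and a real Zope cache manager would integrate with the Cache Management facility rather than use a plain dict.

```python
import time


class PullCache:
    """'Pull'/reactive model: entries expire after a fixed interval,
    so a cached result is stale for at most `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.time())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.time() - stamp > self.ttl:
            del self._store[key]  # interval elapsed: treat as a miss
            return None
        return value


class PushCache:
    """'Push'/proactive model: each cached result declares the objects
    it depends on, and is invalidated the moment one of them changes."""

    def __init__(self):
        self._store = {}       # key -> value
        self._dependents = {}  # dependency -> set of cache keys

    def set(self, key, value, depends_on=()):
        self._store[key] = value
        for dep in depends_on:
            self._dependents.setdefault(dep, set()).add(key)

    def get(self, key):
        return self._store.get(key)

    def notify_changed(self, dep):
        """Called when a dependency (a PythonScript, a Z SQL Method's
        data, ...) changes; drops every result built from it."""
        for key in self._dependents.pop(dep, set()):
            self._store.pop(key, None)
```

The proactive re-rendering mentioned above would be a third step: after `notify_changed` drops a key, immediately re-run the renderer and `set` the fresh result, so no end user pays the price.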
Jamie Heilman wrote:
Why make the unlucky user pay the price?
Because the unlucky user (which I read as: author) is the only one who knows the required behavior of their code.
Allow me to clarify -- I meant the end user browsing the website. I hate it when I surf to a less-highly-used portion of a website and have to wait 30 seconds for the page to render.

Anyway, after talking this over with my colleague, I realize that the problem of *deriving* dependencies is fundamentally undecidable. We might be able to figure it out in the case of simple acquisition, like

<span tal:replace="here/aObject/aMethod"/>

But it is hopeless for pure Python:

<span tal:replace="python:I-can-do-anything-and-you-cant-stop-me(REQUEST)"/> :)

One possibility is to add the ability to *declare* dependencies. I thought of doing that for an XPath-aware "XML Composite Document" object. It is an object that produces a valid XML document by grabbing parts of other Zope objects via XPath/XPointer/XInclude. Such an object could have an explicit listing of the objects on which it depends, and invalidate the cache appropriately whenever any of those objects changed.

http://www.zope.org/Members/faassen/SimpleCache is a nice beginning along these lines.

Thoughts?

--Craeg
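The derivable/undecidable split can be made concrete with a toy static analyzer. This is only a sketch under simplifying assumptions (a crude regex over `tal:replace` attributes, nothing like a real TAL parser): simple path expressions name their dependencies directly, while anything prefixed `python:` is opaque and must be declared by the author instead.

```python
import re

# Naive: grabs the expression text out of tal:replace attributes.
# A real implementation would use Zope's TAL parser, not a regex.
TAL_EXPR = re.compile(r'tal:replace="([^"]+)"')


def derivable_dependencies(tal_source):
    """Split TAL expressions into statically derivable dependency
    paths and opaque python: expressions.

    A path like here/aObject/aMethod names exactly what it touches;
    a python: expression can call anything, so deriving its
    dependencies is hopeless in general.
    """
    derived, opaque = [], []
    for expr in TAL_EXPR.findall(tal_source):
        if expr.startswith("python:"):
            opaque.append(expr)   # author must *declare* these
        else:
            derived.append(expr)  # cache can invalidate on this path
    return derived, opaque
```

A declared-dependencies API would then just merge the author's explicit list with whatever the `derived` side recovers automatically.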
Craeg K Strong wrote:
I hate it when I surf to a less-highly-used portion of a website and have to wait 30 seconds for the page to render.
Sure, we all do.
One possibility is to add the ability to *declare* dependencies. [snip] Such an object could have an explicit listing of the objects on which it depends, and invalidate the cache appropriately whenever any of the dependent objects changed.
Unfortunately, enumerating the dependencies of a document won't help with the aforementioned first-time-rendering delay. Prefetching can -- not that I'm advocating it; I offer that only as an observation.

That said, I don't think a dependency-based caching strategy is a bad idea. It could obviate the need for time-based cache expiration in some circumstances. In the long run, whether it pays off depends on your usage patterns.

-- Jamie Heilman http://audible.transient.net/~jamie/
"Most people wouldn't know music if it came up and bit them on the ass." -Frank Zappa
Anyway, after talking this over with my colleague, I realize that the problem of *deriving* dependencies is fundamentally undecidable. We might be able to figure it out in the case of simple acquisition, like <span tal:replace="here/aObject/aMethod"/> But it is hopeless for pure python:
<span tal:replace="python:I-can-do-anything-and-you-cant-stop-me(REQUEST)"/> :)
Well, you could, in theory, hook every object as CallProfiler does, and then you would know, for each request, which objects were called, and cache them. You could even do something really clever, like using CallProfiler to automatically cache objects that took longer than a certain amount of time...

But there are more issues with that than there are days in a year, and you could be writing that code forever; letting the user figure it out manually is an easier choice.

Cheers.
-- Andy McKay
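The "cache anything that was slow" policy is easy to prototype as a decorator. This is a sketch of the idea only -- CallProfiler actually works by monkey-patching Zope object classes, and the name `cache_if_slow` and the per-function dict cache are my own assumptions for illustration.

```python
import functools
import time


def cache_if_slow(threshold_seconds):
    """Decorator: time each call, and memoize the result only when
    the call took longer than the threshold -- i.e. only expensive
    renders earn a cache entry."""
    def decorator(func):
        cache = {}  # (args) -> result, per decorated function

        @functools.wraps(func)
        def wrapper(*args):
            if args in cache:
                return cache[args]
            start = time.time()
            result = func(*args)
            if time.time() - start > threshold_seconds:
                cache[args] = result  # slow enough to be worth keeping
            return result

        return wrapper
    return decorator
```

As Andy notes, the hard part isn't this policy but everything around it: keyword arguments, request state, invalidation, and all the other issues that make "figure it out manually" the easier choice.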
Andy McKay wrote:
Anyway, after talking this over with my colleague, I realize that the problem of *deriving* dependencies is fundamentally undecidable. We might be able to figure it out in the case of simple acquisition, like <span tal:replace="here/aObject/aMethod"/> But it is hopeless for pure python:
<span tal:replace="python:I-can-do-anything-and-you-cant-stop-me(REQUEST)"/> :)
Well, you could, in theory, hook every object as CallProfiler does, and then you would know, for each request, which objects were called, and cache them. You could even do something really clever, like using CallProfiler to automatically cache objects that took longer than a certain amount of time...
But there are more issues with that than there are days in a year, and you could be writing that code forever; letting the user figure it out manually is an easier choice.
Ah, but you might have something there. What if there were a cache manager that simply dropped its contents whenever anything changed in the ZODB? You could associate nearly all scripts and templates with that cache manager without any fear of stale cache entries. For many sites, it could be an instant win.

Shane
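Shane's clear-on-any-write manager is about the simplest coherency policy possible, and it can be sketched in a dozen lines. The class name and the `on_write` hook are hypothetical; a real implementation would subscribe to ZODB transaction commits rather than be called by hand.

```python
class ClearOnWriteCacheManager:
    """Coarsest possible coherency policy: any write anywhere in the
    database invalidates the entire cache. No dependency tracking
    needed, and stale entries are impossible by construction."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def on_write(self, changed_path):
        # We don't know (or care) what changed_path affects --
        # drop everything and let renders repopulate the cache.
        self._store.clear()
```

The trade-off Paul raises below follows directly: on a write-heavy site `on_write` fires constantly and the cache never stays warm, but on a read-mostly site it is close to free.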
On Tue, Mar 04, 2003 at 02:32:31PM -0500, Shane Hathaway wrote:
Ah, but you might have something there. What if there were a cache manager that simply dropped its contents whenever anything changes in ZODB? You could associate nearly all scripts and templates with that cache manager without any fear of stale cache entries. For many sites, it could be an instant win.
Interesting idea. There are certainly plenty of sites, though, where the cache would get invalidated so often that it would be of limited value -- e.g. a busy Squishdot-type site, or many CMF sites. And on those kinds of sites, the busiest times are when you most need the caching... but the simplicity is certainly appealing.

In my case, I have a CMF site where this would likely be quite useful, since the public never logs in -- only our content management team does -- and we tend to make changes on dev servers and push them to production in a big bunch.

-- Paul Winkler http://www.slinkp.com
participants (5)
- Andy McKay
- Craeg K Strong
- Jamie Heilman
- Paul Winkler
- Shane Hathaway