Caching Catalog results for performance
Hi all,

I am working on a portal where we want to cache some catalog queries per person for a while, from a few seconds to a few minutes. The data does not need to be refreshed on every page load; it can be cached for seconds, sometimes even for several minutes, per user.

What we have been thinking so far is this:

- Cached data will be stored in RAM, in a Temporary Folder, in a simple object that keeps the data in its attribute. Memory is not an issue for now; if it becomes one, the cached data will be stored in the ZODB rather than kept in memory all the time.
- Since brains are unpickleable, we will create a dictionary from each brain by getting the schema from the Catalog and then reading each metadata field from the brain.

Has anyone else done something similar or thought about doing it?

-- -huima
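The brain-to-dictionary step in the second bullet could be sketched roughly as follows. This is only an illustration in modern Python: `brain_to_dict` and `FakeBrain` are hypothetical names, and the list of metadata column names is assumed to be passed in by the caller rather than read from a real Catalog.

```python
def brain_to_dict(brain, column_names):
    """Copy the named metadata columns of a brain into a plain dict."""
    # getattr with a default keeps the sketch tolerant of missing columns.
    return {name: getattr(brain, name, None) for name in column_names}


class FakeBrain:
    """Stand-in for a catalog brain, used only for this illustration."""

    def __init__(self, **metadata):
        self.__dict__.update(metadata)


brain = FakeBrain(Title="Front page", getId="index_html")
row = brain_to_dict(brain, ["Title", "getId", "Description"])
```

The resulting `row` holds only plain values, so it can be pickled or kept in a cache without holding on to the brain itself.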
ZCatalog is pretty quick with the built-in caching. Have you encountered performance problems? Have you tried tweaking the cache settings? The default cache size is pretty small; if RAM is not a problem, increase the 'Target number of objects in memory per cache' setting, which is located in Control Panel - Database Management - Cache Parameters. For our system we use a setting of 10000, which gives us pretty good performance for a ZCatalog containing about 700,000 items.

Jonathan

----- Original Message -----
From: "Heimo Laukkanen" <huima@iki.fi>
To: <zope@zope.org>
Sent: February 24, 2004 9:30 AM
Subject: [Zope] Caching Catalog results for performance
Heimo Laukkanen wrote:
I am working on a portal where we want to cache some catalog queries per person for some time, from few seconds to few minutes - since the data is something that does not need to be updated every single pageload, but can be cached for seconds - sometimes even for multiple minutes per user.
How about storing the "data to be cached" in the SESSION? -mj
Heimo Laukkanen wrote at 2004-2-24 16:30 +0200:
... caching for ZCatalog searches ... Has anyone else done similar or thought about doing it?
I have implemented caching. I am using a dictionary (keyed by paths) of dictionaries (keyed by textual query representations; values are triples [searchTime, lastAccessTime, result]) in a module namespace. "searchTime" is used to ignore outdated results, "lastAccessTime" is used to flush old cache entries. Drawback: the cache cannot be (easily) controlled across ZEO clients. -- Dieter
On Tue, 24 Feb 2004 21:21:21 +0100, Dieter Maurer <dieter@handshake.de> wrote:
I have implemented caching. I am using a dictionary (keyed by paths) of dictionaries (keyed by textual query representations; values are triples [searchTime, lastAccessTime, result]) in a module namespace. "searchTime" is used to ignore outdated results, "lastAccessTime" is used to flush old cache entries.
Thanks for the tip. I haven't used a module namespace before, though I read in the ZODB docs about the possibility of using either volatile attributes or a module namespace. Are there, besides ZEO, any issues that I should be aware of before doing it? And is there actually anything more to it than creating a module with functions that get and set data and flush cached data?
Drawback: the cache cannot be (easily) controlled across ZEO clients.
One reason I first thought about using cache objects instead of a module namespace was the possibility of sharing the cache between ZEO clients, if the temporary folder is changed into a persistent folder and mapped with dbtab according to the recipe found at zopelabs. http://www.zopelabs.com/cookbook/1061234337 Naturally this would make it slower than using a module namespace, but shouldn't it still be faster than running new queries against the catalog? Any thoughts on that? -- -huima
Heimo Laukkanen wrote at 2004-2-26 00:12 +0200:
... Thanks for the tip. I haven't used module namespace before, though I read in ZODB docs about possibility to use either volatile attributes or module namespace. Are there - besides ZEO - any issues that I should be aware of before doing it?
You must never put something in such variables that contains references to persistent objects. This means you would not store the result of "searchResults" but the raw result of "search" (a sequence of "rid"s) and wrap them only when used. (This is similar to the way Cache Controlled Z SQL Methods (http://www.dieter.handshake.de/pyprojects/zope) manage their cache.)
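The idea of caching only raw record ids and wrapping them on use might look roughly like this. It is a sketch under stated assumptions: `cached_rids`, `LazyResults`, `run_query`, and `wrap` are hypothetical names, and how a real ZCatalog turns a rid into a brain is deliberately left out.

```python
_rid_cache = {}  # {query_key: tuple of plain record ids}


def cached_rids(query_key, run_query):
    """Return the record ids for a query, running it only on a cache miss."""
    if query_key not in _rid_cache:
        # Store plain ids only, never objects that reference persistent data.
        _rid_cache[query_key] = tuple(run_query())
    return _rid_cache[query_key]


class LazyResults:
    """Holds plain record ids and wraps each one only when it is accessed."""

    def __init__(self, rids, wrap):
        self._rids = list(rids)
        self._wrap = wrap  # callable supplied per request, never cached

    def __len__(self):
        return len(self._rids)

    def __getitem__(self, index):
        # Integer indexes only; an out-of-range index raises IndexError,
        # which also lets a plain for loop iterate over the results.
        return self._wrap(self._rids[index])
```

The `wrap` callable belongs to the current request, so nothing that refers to a persistent object outlives it in the module-level cache.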
And is there actually anything else in it than creating a module with functions that get and set data, and flush cached data.
You have 2 level structures. You may get race conditions but this is (almost surely) not an issue for a cache implementation.
Drawback: the cache cannot be (easily) controlled across ZEO clients.
One thing why I thought first about using cache objects instead of module namespace was the possibility to share cache between ZEO clients if temporary folder will be changed into persistent folder and mapped with dbtab according to the recipe found at zopelabs.
Yes, that is easier with ZODB based objects.
Naturally this would make it slower than using module namespace, but it shouldn't it still be faster than doing new queries to catalog.
Any thoughts on that.
It depends. Many queries are extremely fast. There is no need to cache them. Sorting large result sets is very expensive (this costs by far the most resources in our application). Caching is valuable for us for this reason. Therefore, we allow the query to specify whether it wants to get cached or not (and for how long). -- Dieter
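Letting each query decide whether it is cached, and for how long, could be sketched as below. This is an illustration only: the `search` function, its `cache_seconds` parameter, and the `run_query` callable are assumptions, not the interface used in Dieter's application.

```python
import time

_results = {}  # {query_key: (stored_at, result)}


def search(query_key, run_query, cache_seconds=0, now=None):
    """Run a query, reusing a stored result only if the caller opted in."""
    now = time.time() if now is None else now
    if cache_seconds > 0:
        hit = _results.get(query_key)
        if hit is not None and now - hit[0] <= cache_seconds:
            return hit[1]
    result = run_query()
    if cache_seconds > 0:
        # Only queries that asked for caching leave anything behind.
        _results[query_key] = (now, result)
    return result
```

Fast queries simply leave `cache_seconds` at 0 and always run, while an expensive sorted query can pass something like `cache_seconds=60`.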
On Wed, 25 Feb 2004 20:46:36 +0100, Dieter Maurer <dieter@handshake.de> wrote:

As usual your replies are extremely informative, so thank you Dieter.
This means, you would not store the result of "searchResults" but the raw result of "search" (a sequences of "rid"s) and wrap them only when used (This is similar to the way Cache Controlled Z SQL Methods (http://www.dieter.handshake.de/pyprojects/zope) manage their cache).
I have to take a look at that. At the moment my thinking was to store, from the search result brains, the metadata that was needed, including the URL of the object. With that cached data I would be able to create the needed views and would not need any object references or special wrapping. Today I looked more closely at how Zope handles session data in the Temporary Folder, in the Transient Object Container and its transient objects, and it looks like I could use the same technique and objects without reinventing the wheel. I'll only be creating a portal tool that offers an API for managing the cache and hides the actual implementation, so that it can be changed to something else if needed. -- -huima
participants (4)
- Dieter Maurer
- Heimo Laukkanen
- Maik Jablonski
- Small Business Services