Brainstorm: Zope behind proxying cache?
Hi,
Been brainstorming a little while about modifying HTTP requests combined with a high speed cache in front of a dynamic source of data, such as Zope.
Idea: Put a proxying cache with content negotiation, rewriting of requests etc. in front of Zope. The first time a page is requested (the client requests it from the proxy, which knows where to get it because its configuration files tell it how to handle a request) it is cached. The next request is served directly from the cache. As soon as a page changes it is fetched from Zope and cached again.
It would also allow you to use any mix of static and dynamic content, because pages that stay the same are served directly from the cache and Zope is never hit to serve that page. It allows you to use Zope as a content management tool too.
So what's your view? I know that a request can be modified behind the scenes based on the data the client supplies - request, filetype, language, browser type, IP address etc. It's also possible to cache the files requested, but I am not sure how the cache would communicate with Zope and vice versa.
The HTTP protocol offers some hooks to cache a file for a limited time only, if I am not mistaken, but the ideal solution in my view would be a button in Zope called 'Publish' which would send a message to the cache resulting in the removal of that particular page...
Just brainstorming here :) Anyone with suggestions, comments?
Thnx Jonathan
-- UR Communications - Solutions for a wired world Who, what & where @ http://www.ur.nl/
Jonathan wrote:
Been brainstorming a little while about modifying HTTP requests combined with a high speed cache in front of a dynamic source of data, such as Zope.
Idea: Put a proxying cache with content negotiation, rewriting of requests etc. in front of Zope. First time a page is requested (client requests it from the proxy which knows where to get it because its configuration files tell it how to handle a request) it is cached. The next request is served directly from the cache. As soon as a page changes it is fetched from Zope and cached again.
The problem is: what is a page? For instance, if you go to http://www.zope.org/, you'll get a different page than I do. Why? Because my page is personalized. However, the HTTP ETag header could help by giving each version of a page a unique id.
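To make the ETag remark concrete, here is a minimal sketch, assuming the tag is simply a hash of the rendered bytes. The names make_etag and respond are hypothetical and are not part of Zope; a real server would also have to handle lists of tags and the `*` form in If-None-Match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from the rendered page bytes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET of one rendered page.

    if_none_match is the single tag a client or cache sent back, if any.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The copy held by the client or cache is still this exact version.
        return 304, headers, b""
    return 200, headers, body
```

Two personalized renderings of the same URL get different tags, so a cache can at least tell them apart; whether it may share them between users is a separate question that the tag alone does not answer.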
It would also allow you to use any mix of static and dynamic content, because pages that stay the same are served directly from the cache and Zope is never hit to serve that page. Allows to use Zope as a content management tool too.
The win these days is in smarter caching which is more finely-grained than at the page level.
So what's your view? I know that a request can be modified behind the scenes based on the data the client supplies - request, filetype, language, browser type, IP address etc. It's also possible to cache the files requested, but I am not sure how the cache would communicate with Zope and vice versa.
The HTTP protocol offers some hooks to cache a file for a limited time only if I am not mistaken, but the ideal solution in my view would be a button in Zope called 'Publish' which would send a message to the cache resulting in the removal of that particular page...
I _think_ that Squid has a protocol like this to allow caches to send messages to each other. --Paul
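The "cache a file for a limited time only" hooks mentioned in the quoted text are the Cache-Control and Expires response headers. A minimal sketch of setting them, assuming a hypothetical helper cache_headers and a caller-supplied timestamp (neither is from the thread):

```python
from email.utils import formatdate


def cache_headers(max_age_seconds: int, now: float) -> dict:
    """Build headers letting a shared cache keep a page for a limited time.

    now is a Unix timestamp; it is passed in so the result is reproducible.
    """
    return {
        # HTTP/1.1 caches read max-age; "public" allows shared caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Older HTTP/1.0 caches only understand an absolute expiry date.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

This only bounds how stale a cached page can get; it does not give the origin a way to withdraw a page early, which is what the 'Publish' button would need.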
Paul Everitt wrote:
Jonathan wrote:
Idea: Put a proxying cache with content negotiation, rewriting of requests etc. in front of Zope.
[snip]
The problem is: what is a page?
[snip]
The win these days is in smarter caching which is more finely-grained than at the page level.
Exactly. If you're accustomed to thinking of a web site as a collection of HTML documents in a file system, squid/apache/roll-your-own caching sounds attractive. Your pages change only when you tell them to, which will be sporadically and infrequently (relative to page hits, if not absolutely).
Once you really absorb the possibilities and practices of a tool like Zope or PHP, though, you begin to see your site as a collection of templates and interfaces, combining static snippets with data queries and views. It makes no more sense to cache many of the pages from such a site than it would to cache windows from an accounting program or word processor. You're moving into the land of the web application.
As you take more and more advantage of the leverage that Zope gives you, you will realize that there *are* things you would like to cache, but they aren't pages any more. They're dribs and drabs of data, such as a menu generated by walking an object tree, or a database-driven bit of output which rarely changes but is relatively expensive to run. Typically, each such cacheable bit will be the value returned by a single method/object, possibly varying by parameters/context.
Sometimes, as with the entry page of an active message board, you may want to provide customized views to each user, yet recompute the message summary only every minute or so, rather than with every hit. Other times, you may want to flush an item from the cache only if some 'upstream' object is modified. Then again, there are situations where only manual flushing will do, as it is impractical to try to automatically discover when the cache is stale.
One way to handle this is to tag an object whose output you wish to cache with a set of rules, such as minimum or maximum cache lifetime, and to provide a 'flush from cache' method. Trying to automatically track dependencies is probably not workable, since acquisition and the request environment provide so many sources for variable data.
On the other hand, with careful design, it may be possible to specify a set of values or a formula which can be used as a cache key for particular objects. This key could be computed by a user-defined method, or provided as an expression list if it's simple enough.
Often several or many objects share common cache characteristics. They may depend on the same inputs, or simply have the same 'freshness' requirements. Rather than attach cache settings to particular objects, it might be a good idea to attach them to Cache Policy objects, and simply assign cacheable objects a Policy. This is roughly similar to the ZSQL Method/Database Connector division of labor. A single call to a Policy method could clear the cache of all objects with that Policy, and a cache key method/formula might only need to be calculated once.
I'm not sure how Policies should best be assigned to objects. One way would be to provide Cache containers which encapsulate the objects to be cached. Another is to make some classes cache-aware, just as they can be ZCatalog-aware. Yet another is to provide Cache Manager objects, which can control cache Policy assignments for sibling objects.
One of these days, I may care enough about performance to set this down in code, but not yet. Python underlies Zope, and its philosophy on the subject is a good one: solve the problem now with clear, well-chosen algorithms and only worry about 'optimizing' if performance measurably suffers. If you try to guess where to optimize in advance, you'll probably waste your time and produce gnarly, bug-enhanced code.
Cheers, Evan @ 4-am
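The Cache Policy idea could look roughly like the sketch below. It is only an illustration under stated assumptions: CachePolicy, default_key, cached and flush are invented names, not Zope APIs, and the policy is a plain in-memory dictionary shared by every object assigned to it.

```python
import time


def default_key(*args, **kwargs):
    """Hypothetical default cache key: the call arguments themselves."""
    return (args, tuple(sorted(kwargs.items())))


class CachePolicy:
    """One policy object shared by many cacheable methods (illustrative only)."""

    def __init__(self, max_age, key_func=default_key, clock=time.monotonic):
        self.max_age = max_age      # maximum cache lifetime, in seconds
        self.key_func = key_func    # the "cache key formula"
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # (name, key) -> (expires_at, value)

    def cached(self, name, compute, *args, **kwargs):
        """Return the cached value for name/key, recomputing once it is stale."""
        key = (name, self.key_func(*args, **kwargs))
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute(*args, **kwargs)
        self._store[key] = (now + self.max_age, value)
        return value

    def flush(self, name=None):
        """Manual flush: one named object's entries, or everything on this policy."""
        if name is None:
            self._store.clear()
        else:
            for key in [k for k in self._store if k[0] == name]:
                del self._store[key]
```

With this shape, the message-board summary would be wrapped as policy.cached("summary", compute_summary, board_id) under a 60-second policy, while each user's customized view is still rendered on every hit.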
Once you really absorb the possibilities and practices of a tool like Zope or PHP, though, you begin to see your site as a collection of templates and interfaces, combining static snippets with data queries and views. It makes no more sense to cache many of the pages from such a site than it would to cache windows from an accounting program or word processor. You're moving into the land of the web application.
The client still gets a page, so it depends on the kind of content you have and the functionality you want to offer to visitors.
If the main reason you need a dynamic tool is that a number of different people are working on the same site, you'll need advanced content management tools, but as soon as a page is finished it can be static; no problem there. When it changes, republish it.
If you need a dynamic tool because you've got a lot of data that is poured into a template when a client requests a page or queries the database through a form, then it makes sense to generate a page on the fly every time someone requests it.
The major difference between a web application and one that runs on the local machine is speed. The code necessary for drawing windows on a screen is cached in RAM on the local machine and is therefore very fast. A page is not.
Cya Jonathan
-- UR Communications - Solutions for a wired world Who, what & where @ http://www.ur.nl/
I _think_ that Squid has a protocol like this to allow caches to send messages to each other.
Squid speaks ICP and cache digests, both of which are used for requesting objects from other caches when the cache does not have the requested object. I suppose you could hack Zope to speak ICP and pretend that it is a cache. Every time the cache gets a request for an object it doesn't have, it would request it from the Zope server. However, there is no way to explicitly expire an object out of a cache using ICP or cache digests.
Additionally, they are both fairly resource-intensive protocols. Each ICP request is a UDP message sent to the cache (Zope), which responds positively or negatively (always positively in Zope's case). The cache then sends an HTTP request for the object. This means that Zope has to look for the object (and traverse the acquisition hierarchy, etc.) twice for every object. Cache digests don't send so many requests, but each cache keeps a digest of EVERY object stored in EVERY cache (although in a highly compressed form). It really isn't very appropriate.
You could set up Zope as a parent cache to the real cache. This means that every cache miss would get forwarded to Zope, without a preceding ICP request. This would cause the cache to service all static requests, but there is still no way to tell a cache to 'throw an object away right now.' All you can do is tell it when to throw the object away the very first time that you serve it. HTTP is a client-driven protocol. There is no standard way for a server to contact a client preemptively.
See my previous email about Akamai for a method using checksums that would cause an object to have a different URL every time you change the object. If you could teach Zope to ignore the checksum segment of the URL, you could easily set up akamaization locally. A pretty good idea, if you ask me.
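Teaching the server to ignore the checksum segment might look like the following sketch. The URL layout (a `/_v/<hex checksum>/` prefix) and the name strip_checksum are assumptions made for illustration; nothing in the thread specifies the actual format.

```python
import re

# Hypothetical layout: /_v/<hex checksum>/<real path>
_VERSIONED = re.compile(r"^/_v/[0-9a-f]+(/.*)$")


def strip_checksum(path: str) -> str:
    """Return the real object path, dropping a leading checksum segment if any."""
    match = _VERSIONED.match(path)
    return match.group(1) if match else path
```

Because the checksum changes whenever the object does, the cache never needs to be told to drop anything: the old URL simply stops being requested and ages out.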
--Paul
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
I am in the process of working on an Akamaization product for Zope. If you are unaware, Akamai is a company that has a very high speed worldwide network. They offer (for a fee) the ability to store all of your static objects on their network. You modify the URL in the HTML documents that reference those objects, so that the URL is in a special format. The browser is then able to get the object from the closest available server, which is usually no farther away than your ISP's local point of presence.
Generally, people only store images, MIDI files, WAV files, etc. on the Akamai network, but there is no reason not to put HTML files there as well. There is no actual upload process. You just change the URL, and the first time somebody requests it from the Akamai network, it is retrieved from your web server and distributed across their network. Every time an object is modified, the special URL changes (the URL includes the checksum). This means that every time you modify an object, the Akamai network will pick up the change.
My intent was to just create some simple subclasses of Image and File in Zope. Each would have, in addition to the url method, an akamai url. When you include the image or file in your document, you would just use a <dtml-var image.akamaiurl> tag to get the akamai url. Additionally, it will be possible to set a property (that only needs to be available via acquisition) that says whether to use the akamai url or the normal url. That way, when you are developing the site, you can use the akamaiurl tag, but still retrieve the images from the local Zope server. When you publish the site, you disable debug mode, and all akamai urls will be sent correctly.
I should have time to actually do the work by the end of the month (since I have a client that needs it by Feb 17). However, I have no idea how much Akamai charges for the service.
-sam
Jonathan wrote:
Hi,
Been brainstorming a little while about modifying HTTP requests combined with a high speed cache in front of a dynamic source of data, such as Zope.
Idea: Put a proxying cache with content negotiation, rewriting of requests etc. in front of Zope. First time a page is requested (client requests it from the proxy which knows where to get it because its configuration files tell it how to handle a request) it is cached. The next request is served directly from the cache. As soon as a page changes it is fetched from Zope and cached again.
It would also allow you to use any mix of static and dynamic content, because pages that stay the same are served directly from the cache and Zope is never hit to serve that page. Allows to use Zope as a content management tool too.
So what's your view? I know that a request can be modified behind the scenes based on the data the client supplies - request, filetype, language, browser type, IP address etc. It's also possible to cache the files requested, but I am not sure how the cache would communicate with Zope and vice versa.
The HTTP protocol offers some hooks to cache a file for a limited time only if I am not mistaken, but the ideal solution in my view would be a button in Zope called 'Publish' which would send a message to the cache resulting in the removal of that particular page...
Just brainstorming here :) Anyone with suggestions, comments?
Thnx Jonathan
-- UR Communications - Solutions for a wired world Who, what & where @ http://www.ur.nl/
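The akamaiurl idea from Sam's message above can be sketched as follows. This is a guess at the shape only: the host name, the URL layout and the function akamai_url are invented for illustration, and Akamai's real URL format is not described in the thread.

```python
import hashlib

# Hypothetical edge host; a real deployment would use whatever Akamai assigns.
AKAMAI_HOST = "http://cdn.example.invalid"


def akamai_url(path: str, data: bytes, use_akamai: bool) -> str:
    """Return the URL to embed in a page for one image or file.

    use_akamai stands in for the acquired property that switches between
    local development (plain path) and the published site (checksum URL).
    """
    if not use_akamai:
        return path
    # The checksum changes whenever the object changes, so the URL does too.
    checksum = hashlib.sha1(data).hexdigest()
    return f"{AKAMAI_HOST}/{checksum}{path}"
```

During development the flag stays off and images come from the local server; flipping it on at publish time rewrites every reference without touching the templates.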
participants (4)
- Evan Simpson
- Jonathan
- Paul Everitt
- Sam Gendler