On Fri, Dec 05, 2003 at 10:31:08AM +0100, Sebastian Krollmann wrote:
Hi zopistas,
I need to access large text files (~120 MB) from zope. I know about Python's large file support, and that it is better to keep large files out of the ZODB. I have to do a full-text search on these files, which reside in a folder hierarchy on the server, show their content around the location of the found string, and allow scrolling through the file's source from zope.
Has anybody done something similar with files this large, and would you share your experiences? Are there any dos and don'ts, or a best way to do it?
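The search-and-context part of the question can be handled without ever loading a 120 MB file into memory. Below is a minimal sketch (not code from this thread), assuming plain case-sensitive byte matching and using hypothetical helper names `find_offsets` and `read_window`. It is a linear scan of each file, not a full-text index, so every search rereads the whole file.

```python
def find_offsets(path, needle, chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of `needle` (bytes) in the
    file at `path`, reading `chunk_size` bytes at a time so the whole file is
    never held in memory."""
    if not needle:
        raise ValueError("needle must be non-empty")
    overlap = len(needle) - 1
    with open(path, "rb") as f:
        base = 0   # file offset corresponding to buf[0]
        buf = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            i = buf.find(needle)
            while i != -1:
                yield base + i
                i = buf.find(needle, i + 1)
            # Keep the last len(needle) - 1 bytes: too short to hold a complete
            # match (so nothing is reported twice), but long enough to catch a
            # match that straddles the boundary with the next chunk.
            keep = min(overlap, len(buf))
            base += len(buf) - keep
            buf = buf[len(buf) - keep:]


def read_window(path, offset, before=200, after=200):
    """Return up to `before` bytes preceding `offset` plus up to `after` bytes
    starting at `offset`, seeking straight there instead of reading from the start."""
    start = max(0, offset - before)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(offset - start + after)
```

Scrolling then amounts to calling `read_window` again with a larger or smaller offset. Because it uses `seek()`, each call reads only the requested window, however big the file is. Offsets are byte positions, so decoding the returned bytes to text (and handling a multibyte character cut at the window edge) is left to the caller.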
I think you will find that serving a 120 MB object through zope will cripple your performance. Zope is reeeeallly slow with large chunks of data: a couple of concurrent downloads of 100 MB files can cause your site to crawl for all users.

However, there are a couple of ways you could store and index the text files in zope but avoid having the users hit zope to download them.

I'm experimenting with FSCacheManager (downloadable from CVS on collective.sf.net), which does "funky caching" in conjunction with an Apache rule. Apache tries to serve the file directly from the filesystem; if it doesn't exist, Apache forwards the request to zope. FSCacheManager causes the file to be stored to the filesystem each time it's hit in zope, so once a file is on the filesystem, zope won't see further requests for it. This works fine and it's very easy to set up. The big limitation is that once the file is on the filesystem, it's available to everyone ... zope authorization is never checked again. Also, you can't really control the lifetime of the cache, but that may not be an issue.

You could do something similar with Squid filesystem caching, which IIRC can be configured to request authorization from zope each time someone downloads the file, and to clean out the cache according to some policy.

Of course, you'll need a lot of disk space either way, but who cares? In either case, the first download will still be slow, but you can prevent that by using wget or similar to "prime" the cache during off-hours.

-- 
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's FLYING ACTION HERO!
(random hero from isometric.spaceninja.com)
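The "funky caching" flow described above (serve the file from disk if it is there, otherwise fetch it from zope once and store it) can be sketched in modern Python. This illustrates the idea only. It is not FSCacheManager's code or Apache's actual behavior. `serve_path`, `prime`, `cache_root` and `backend_base` are hypothetical names, and `backend_base` is assumed to be the URL prefix of the zope site. Like the setup described, it never re-checks authorization on a cache hit.

```python
import os
import shutil
import urllib.request


def serve_path(url_path, cache_root, backend_base):
    """Return a local file path for `url_path`. On a miss, fetch it once from
    the backend (assumed: the zope site at `backend_base`) and store it under
    `cache_root`; later requests are served from disk and never reach the
    backend, so no authorization check happens on a hit."""
    root = os.path.realpath(cache_root)
    local = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if not local.startswith(root + os.sep):
        raise ValueError("path escapes cache root: %r" % url_path)
    if os.path.isfile(local):
        return local  # cache hit
    os.makedirs(os.path.dirname(local), exist_ok=True)
    tmp = local + ".part"
    try:
        with urllib.request.urlopen(backend_base + url_path) as resp, \
                open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream to disk, don't buffer 100 MB in RAM
        os.replace(tmp, local)  # atomic rename: readers never see a partial file
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return local


def prime(paths, cache_root, backend_base):
    """Fetch each path ahead of time (the 'prime with wget during off-hours'
    idea) so the first real visitor gets a cache hit."""
    for p in paths:
        serve_path(p, cache_root, backend_base)
```

Writing to a temporary `.part` file and renaming it means a concurrent request sees either no file (and falls through to the backend) or the complete file, never a half-written one.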