Using Zope to manage and serve large files
Often, once my customers have got used to Zope, and especially Plone, as the UI for managing their website, the question turns to using Zope and Plone for document management on an intranet. It is an old story that Zope performs poorly when serving large files compared to Apache. Serving large files from the filesystem with Apache, however, defeats the whole point of using Zope: the security machinery, among other things.

Has anyone managed to combine Zope and Apache/PHP to serve and manage large files in a way that lets the metadata and security of the files be managed in Zope, while the actual serving happens from Apache, so that ZServer does not hog memory and CPU?

I thought about integrating Apache/PHP and Zope via a shared database and tokens passed from Zope to Apache. The scenario could be the following:

1) Apache and Zope serve data from the same domain. Zope could be www.myfoobar.com and Apache could serve download.myfoobar.com.

2) Within the portal there would be a portal_external_storage_tool that defines the configuration and methods for storing and serving data outside Zope, including the shared database.

3) The content types are custom content types based on Archetypes, and include the logic for handing the download over to Apache.

4) When a user within the portal wants to download a file, the view/download function of the content type (which is protected by the Zope security machinery) writes into the database a token based on user, time, file, or similar.

5) The user is transparently redirected to the application on the download host to fetch the file. The token would be passed in the URL.

6) The download application checks that the token is valid for that file and that time, and starts serving the data to the client.

Has anyone done anything similar, or do you have other ideas on how to implement this?

-huima
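The token flow in steps 4-6 could be sketched roughly as below. This is only a minimal sketch in modern Python, not part of the original proposal: the secret, function names, and token format are all assumptions, and a real deployment would store the token in the shared database rather than make it self-verifying.

```python
import hashlib
import hmac
import time

# Hypothetical secret shared between the Zope side and the download host.
SHARED_SECRET = b"site-secret"


def make_token(user, path, now=None):
    """Create a short-lived download token bound to (user, path, time)."""
    ts = int(now if now is not None else time.time())
    msg = "%s|%s|%d" % (user, path, ts)
    sig = hmac.new(SHARED_SECRET, msg.encode(), hashlib.sha1).hexdigest()
    return "%d.%s" % (ts, sig)


def check_token(token, user, path, max_age=300, now=None):
    """Verify a token produced by make_token, rejecting stale or forged ones."""
    ts_now = int(now if now is not None else time.time())
    try:
        ts_str, sig = token.split(".", 1)
        ts = int(ts_str)
    except ValueError:
        return False
    if ts_now - ts > max_age:
        return False
    msg = "%s|%s|%d" % (user, path, ts)
    expected = hmac.new(SHARED_SECRET, msg.encode(), hashlib.sha1).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Zope would call make_token and redirect the user to the download host with the token in the URL; the download application would call check_token before streaming the file.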
Heimo Laukkanen wrote:
| Has anyone achieved to combine Zope and Apache/php to serve and manage
| large files in a way that allows metadata and security of the files to
| be managed in Zope, but actual serving happens from Apache so that
| ZServer does not hog memory and processor.

I was just reading this: http://www.zope.org/Members/guy_davis/install_routes

You don't really have to use ZServer to use Zope. Maybe this is what you want: using PCGI / mod_pcgi.

--
Robin Y. Millette (aka Lord D. Nattor)
http://rym.waglo.com
On Fri, 31 Oct 2003 23:47:36 +0200 Heimo Laukkanen <huima@iki.fi> wrote:
> Has anyone achieved to combine Zope and Apache/php to serve and manage
> large files in a way that allows metadata and security of the files to
> be managed in Zope, but actual serving happens from Apache so that
> ZServer does not hog memory and processor.
Well, the main problem is simply making sure that the customer cannot possibly guess the filename. So, use the Secure Hash Algorithm (SHA) and your own site-based secrets. That is, given a filename, calculate SHA(secret_1 + file_name + customer_name + secret_2). Save the file in a customer-specific (Apache-accessible) directory, using the SHA as the filename. Then put a dummy index.html in that folder, something like:

<html><head></head><body>No Peeking!</body></html>

Now the customer has something like a 1 in 2^160 chance of finding the file by probing, and you have something that is fast and fairly cheap to calculate. The only practical way to get to a file is via something that knows the secrets and can do the right calculation. Make this calculation a part of Zope, and you have your security bottleneck. That is, use Zope to authenticate and authorize, calculate the SHA, and present either a direct link or a redirect to the actual file.

You do have to worry about the site secrets. If they are ever exposed, you would have to rehash the names of all files, but that is not too big a deal to do periodically anyway. It does mean that you need to keep a database of customer file names. Do not put the SHA associated with the file name in the database; calculate it from scratch every time.

Jim Penny
On Oct 31, 2003, at 4:21 PM, Jim Penny wrote:
On Fri, 31 Oct 2003 23:47:36 +0200 Heimo Laukkanen <huima@iki.fi> wrote:

> Well, the main problem is simply making sure that the customer cannot
> possibly guess the filename. So, use the Secure Hash Algorithm (SHA)
> and your own site-based secrets. That is, given a filename, calculate
> SHA(secret_1 + file_name + customer_name + secret_2). Save the file in
> a customer specific (apache accessible) directory, using the SHA as
> the filename.
You could also use this signing process to allow a CGI script to download the file. You redirect the user to a URL like:

/downloader.cgi?customer=custname&filename=somefile.html&time=103943&auth=934a975f

where time is the result of int(time.time()), and auth is SHA(customer_name + filename + str(int(time.time())) + secret). The CGI script then confirms the SHA, makes sure the timestamp isn't too old, and lets the user download the file. The actual files are kept outside of Apache's docroot, but readable by the CGI script. There is somewhat higher overhead because of CGI vs. pure Apache, but CGI scales fine for large files.

Unlike using filenames, users won't be able to bookmark or link to file locations, and they won't get any caching. This could be good or bad. If you really wanted caching, you'd put the timestamp and signature in a cookie, then call something like /downloader.cgi/custname/filename. You'd still have to handle caching in your script (I think setting Last-Modified and checking If-Modified-Since would be enough). The result would be kind of weird, though: browsers could cache files, but they couldn't bookmark them, and the reliability of passing authentication information like that in cookies isn't very good. Using URL parameters is definitely more reliable.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
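The validation step the CGI script would perform could be sketched as follows. A minimal sketch in modern Python, assuming a dict of parsed query parameters; the secret, the max_age cutoff, and the function names are assumptions, but the signature formula matches the one given above.

```python
import hashlib
import hmac
import time

# Hypothetical secret shared by Zope (which signs) and the CGI script (which checks).
SECRET = "cgi-shared-secret"


def sign(customer, filename, ts):
    """auth = SHA(customer_name + filename + str(timestamp) + secret)."""
    return hashlib.sha1((customer + filename + str(ts) + SECRET).encode()).hexdigest()


def authorize(params, max_age=600, now=None):
    """Validate downloader.cgi query parameters.

    The signature must match and the timestamp must be no older than
    max_age seconds; otherwise the download is refused.
    """
    ts_now = int(now if now is not None else time.time())
    try:
        ts = int(params["time"])
    except (KeyError, ValueError):
        return False
    if ts_now - ts > max_age:
        return False
    expected = sign(params.get("customer", ""), params.get("filename", ""), ts)
    return hmac.compare_digest(params.get("auth", ""), expected)
```

Only if authorize returns True would the script open the file (stored outside the docroot) and stream it to the client.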
participants (4)

- Heimo Laukkanen
- Ian Bicking
- Jim Penny
- Robin Y. Millette