speeding up multiple file upload
Our recently finished project includes a domain-specific, repository-like component that has to handle uploads of many files (100-1000) at once. In a new project we will implement a more general version of this repository as a standalone product. The old one works quite well, but there is a lot of room for improving performance, and that is what I am asking your advice on.

The relevant part of the old system is a client-server application. The following steps are performed in one transaction:

- the client generates a file list and sends it to the server [xml-rpc]
- the server determines which files should be uploaded (the changed files) and sends back the result [xml-rpc]
- the client collects all the files into one large zip (they are more or less regular web content, so they compress well) and sends it to the server [http multipart/form-data]
- the server unzips the archive and stores the files in the ZODB

All communication goes over SSL, so connecting and disconnecting is quite expensive. The two longest operations are the upload and the unzip/store, and in theory they could run concurrently in an asynchronous producer-consumer setup. I'm looking for something like a network copy of multiple files via tar, e.g.:

tar -czf- . | ssh remoteserver "tar -xzf-"

The aim is to send all the files over the same channel, one by one, and to start server-side processing as soon as the first file has arrived. This is completely doable in pure Python; the question is how to do it when Zope is the receiver. Both the client and the server side are under our control, so any wild idea would work. The only restriction is that all communication must go over port 443, because both the clients and the server are firewalled.

Any hints or pointers would be appreciated.

Regards, Sandor
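For illustration, the tar-style pipeline described above could be sketched in pure Python with the standard tarfile module in its streaming modes ("w|gz" and "r|gz"). This is only a sketch under assumptions: an OS pipe stands in for the SSL connection, the function names and the store callback are hypothetical, and nothing here addresses how Zope itself would act as the receiver.

```python
import io
import os
import tarfile
import threading


def produce(files, sink):
    """Write (name, bytes) pairs into a gzip-compressed tar stream."""
    # "w|gz" is tarfile's non-seekable stream mode, so the sink may be
    # a pipe or a socket file object rather than a regular file.
    with tarfile.open(fileobj=sink, mode="w|gz") as archive:
        for name, payload in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))


def consume(source, store):
    """Read members one at a time and pass each to store(name, bytes)."""
    with tarfile.open(fileobj=source, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                # In stream mode each member must be read in order,
                # before moving on to the next one.
                store(member.name, archive.extractfile(member).read())


def transfer(files):
    """Run producer and consumer concurrently over a pipe (network stand-in)."""
    read_fd, write_fd = os.pipe()
    received = {}

    def sender():
        with os.fdopen(write_fd, "wb") as sink:
            produce(files, sink)

    thread = threading.Thread(target=sender)
    thread.start()
    with os.fdopen(read_fd, "rb") as source:
        consume(source, received.__setitem__)
        # Drain any trailing bytes so the sender never hits a closed pipe.
        while source.read(65536):
            pass
    thread.join()
    return received
```

The consumer handles each file as soon as its bytes have arrived, while the producer may still be compressing and sending later files, which is the overlap between upload and store that the post asks for.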
If you can do all the sending of files with ssh and Python, then do that. Your server should then accept the files over ssh (not https) and from there call its Zope on localhost. Have you looked at the load_site.py script that comes with Zope? My suggestion is that no files should be sent to or from Zope over the internet. Zope should talk to the client about which filenames to upload, and then the server uploads to Zope locally over http. Basically, you don't want to send big files over https from a client to Zope.
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
-- Peter Bengtsson, http://www.peterbe.com
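As a rough sketch of the "server uploads to Zope locally over http" step suggested above, a small helper on the server could push each received file to a Zope instance on localhost. The host, port, base path, and the use of HTTP PUT are all assumptions made for illustration; the thread does not say which Zope endpoint or method would be used, and this is not how load_site.py is known to work.

```python
import http.client


def put_to_local_zope(name, payload, host="127.0.0.1", port=8080,
                      base_path="/repository"):
    """PUT one file to a Zope assumed to listen on localhost over plain HTTP.

    The endpoint layout and the PUT method are hypothetical; adapt them
    to whatever the local Zope actually accepts.
    """
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request(
            "PUT",
            f"{base_path}/{name}",
            body=payload,
            headers={"Content-Type": "application/octet-stream"},
        )
        response = conn.getresponse()
        response.read()  # consume the response body before closing
        return response.status
    finally:
        conn.close()
```

Opening a fresh connection per file is cheap here because it is a plain HTTP connection to localhost, with no SSL handshake involved.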
zope@netchan.cotse.net wrote at 2004-3-9 11:26 -0600:
... All communication is done via ssl, so the connect-disconnect operation is quite expensive.
You can make several (many) requests via the same HTTP connection. -- Dieter
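A minimal sketch of that idea with Python's standard http.client: one connection object is opened once and reused for every request, so with HTTPS the SSL handshake is paid only once. The function name, the request paths, and the use_tls switch are hypothetical, and reuse only happens if the server keeps the connection open (HTTP/1.1 persistent connections); otherwise http.client reconnects for the next request.

```python
import http.client


def post_many(host, port, requests, use_tls=True):
    """Send several POST requests over one connection; return status codes."""
    factory = (http.client.HTTPSConnection if use_tls
               else http.client.HTTPConnection)
    conn = factory(host, port, timeout=30)
    statuses = []
    try:
        for path, body in requests:
            conn.request("POST", path, body=body)
            response = conn.getresponse()
            # The body must be read completely before the same
            # connection can carry the next request.
            response.read()
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

For example, post_many("example.invalid", 443, [("/upload", b"...")]) would send its requests over port 443; the host name and path are placeholders.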
participants (3): Dieter Maurer, Peter Bengtsson, zope@netchan.cotse.net