opinion: speeding up large PUT uploads
Zope's ZPublisher.HTTPRequest.HTTPRequest class has a method named "processInputs". This method is responsible for parsing the body of all requests. It parses all upload bodies regardless of method: PUT, POST, GET, HEAD, etc. In doing so, it uses the FieldStorage class from Python's cgi module to potentially break apart multipart/* bodies into their respective parts. Every invocation of FieldStorage creates a tempfile that is a copy of the entire upload body.

So in the common case, when a large file is uploaded via HTTP PUT (both DAV and External Editor use PUT exclusively), here's what happens:

- ZServer creates a tempfile T1 to hold the file body as it gets pulled in.
- When the request makes it to the publisher, processInputs is called and it hands off tempfile T1 to FieldStorage.
- FieldStorage reads the entire body and creates another tempfile T2 (an exact copy of T1*, in the case of a PUT request).
- T2 is eventually put into REQUEST['BODYFILE'].

(*) At least I can't imagine a case where it's not an exact copy.

This is costly on large uploads. I'd like to change the top of the processInputs method to do this:

    if method == 'PUT':
        # we don't need to do any real input processing if we are
        # handling a PUT request.
        self._file = self.stdin
        return

Can anyone think of a reason I shouldn't do this?

- C
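A minimal sketch of the proposed shortcut, using a hypothetical, heavily simplified request class. SketchRequest and process_inputs are illustration names, not Zope's API. The non-PUT branch only mimics the extra whole-body copy (T2) described in the message; real FieldStorage form parsing is omitted.

```python
import shutil
import tempfile


class SketchRequest:
    """Hypothetical, heavily simplified stand-in for
    ZPublisher.HTTPRequest.HTTPRequest (not Zope's real code)."""

    def __init__(self, method, stdin):
        self.method = method
        self.stdin = stdin  # T1: the tempfile the server spooled the body into
        self._file = None   # the object that would become REQUEST['BODYFILE']

    def process_inputs(self):
        if self.method == 'PUT':
            # Proposed shortcut: reuse the already-spooled body as-is
            # instead of copying it into a second tempfile.
            self._file = self.stdin
            return
        # Mimics only the cost described in the mail: the whole body is
        # copied into a new tempfile (T2). Real form parsing is omitted.
        copy = tempfile.TemporaryFile()
        self.stdin.seek(0)
        shutil.copyfileobj(self.stdin, copy)
        copy.seek(0)
        self._file = copy
```

With the shortcut, a PUT body is written to disk once by the server and never copied again; the trade-off is that any multipart/* handling for PUT bodies is skipped.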
Chris McDonough <chrism@plope.com> wrote:
...
    if method == 'PUT':
        # we don't need to do any real input processing if we are
        # handling a PUT request.
        self._file = self.stdin
        return
Can anyone think of a reason I shouldn't do this?
Is stdin the Medusa stream or T1 at this point? Because for ConflictError retry we need an input that is seekable (HTTPRequest.retry does self.stdin.seek(0)).

Florent

--
Florent Guillaume, Nuxeo (Paris, France)   CTO, Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg@nuxeo.com
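Florent's constraint can be illustrated with a small sketch: whatever object the PUT shortcut stores must support seek(0) so that a retried request can re-read the body. The helper names ensure_seekable and publish_with_retry, and the local ConflictError class, are hypothetical stand-ins, not ZODB or ZPublisher APIs; the 1 MiB spool threshold is an arbitrary assumption.

```python
import io
import shutil
import tempfile


class ConflictError(Exception):
    """Local stand-in for ZODB's ConflictError (illustration only)."""


def ensure_seekable(stream):
    """Return stream unchanged if it can seek; otherwise spool it into a
    seekable temporary file (assumed 1 MiB in-memory threshold)."""
    seekable = getattr(stream, "seekable", None)
    if seekable is not None and seekable():
        return stream
    spooled = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
    shutil.copyfileobj(stream, spooled)
    spooled.seek(0)
    return spooled


def publish_with_retry(body, handler, max_retries=3):
    """Call handler(body), rewinding the body before every attempt, the
    way HTTPRequest.retry rewinds stdin with seek(0)."""
    body = ensure_seekable(body)
    for attempt in range(max_retries + 1):
        body.seek(0)
        try:
            return handler(body)
        except ConflictError:
            if attempt == max_retries:
                raise
```

If stdin is already T1 (a real tempfile), ensure_seekable hands it back untouched, so the shortcut stays free; only a raw, non-rewindable network stream would need the extra spooling step.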
Chris McDonough wrote at 2005-4-3 18:14 -0400:
...
This is costly on large uploads. I'd like to change the top of the processInputs method to do this:
    if method == 'PUT':
        # we don't need to do any real input processing if we are
        # handling a PUT request.
        self._file = self.stdin
        return
Can anyone think of a reason I shouldn't do this?
Even a "PUT" may get a multipart entity. At least, the HTTP specification does not tell anything to the contrary.

Otherwise, (working) optimizations are of course welcome...

--
Dieter
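One possible way to reconcile Dieter's objection with the proposal (a sketch, not something proposed in the thread): take the shortcut for PUT only when the Content-Type is not multipart/*. should_skip_parsing is a hypothetical helper name.

```python
def should_skip_parsing(method, content_type):
    """Hypothetical guard: bypass body parsing for PUT only when the
    entity is not multipart/*; every other case is parsed as before."""
    if method.upper() != 'PUT':
        return False
    # Drop parameters such as "; boundary=..." and compare case-insensitively.
    media_type = (content_type or '').split(';', 1)[0].strip().lower()
    return not media_type.startswith('multipart/')
```

Whether a multipart PUT body should be split at all is the open question in this thread; the guard merely keeps the old parsing path available for that case.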
On Mon, 2005-04-04 at 14:27, Dieter Maurer wrote:
Even a "PUT" may get a multipart entity.
But it never actually does in practice. Or if it does, I've never seen it. And if it did, would an implementation just store the multipart-encoded body? I suppose it could do anything, but it seems like it would be rather general and useless to allow multipart PUT bodies, especially given that no one has seemed to need it in the last six years. That's what POST is for.
At least, the HTTP specification does not tell anything to the contrary.
No, it doesn't.
Otherwise, (working) optimizations are of course welcome...
This one works. ;-)

- C
Hello,

If you look above, I had problems with Zope creating temp files, as I am using Mac OS X with a WebDAV mount of Zope on the same machine. There is some race condition on locks in the Mach kernel, and sometimes Zope dies, as the open() system call never returns.

I had two choices: one to fix the Darwin kernel, and one to make TempFile use StringIO, which worked wonderfully. You can override the make_file method of FieldStorage to return a StringIO.StringIO() or cStringIO.StringIO() instance. Now there are no problems with locked-out files, for me.

laters,
pavel
On Wed, 2005-04-06 at 00:45, Pavel Zaitsev wrote:
Hello,

If you look above, I had problems with Zope creating temp files, as I am using Mac OS X with a WebDAV mount of Zope on the same machine. There is some race condition on locks in the Mach kernel, and sometimes Zope dies, as the open() system call never returns.
That sounds bad. I'm surprised you've had so much trouble with this. I thought OS X was just BSD, and BSD works fine?
I had two choices: one to fix the Darwin kernel, and one to make TempFile use StringIO, which worked wonderfully. You can override the make_file method of FieldStorage to return a StringIO.StringIO() or cStringIO.StringIO() instance. Now there are no problems with locked-out files, for me.
Well, that's good, but large uploads require a tempfile.

- C
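A middle ground between Pavel's in-memory buffers and Chris's point is tempfile.SpooledTemporaryFile (added in Python 2.6, so not available to Zope at the time of this thread): it keeps data in memory until more than max_size bytes are written, then rolls over to a real temporary file. The sketch below shows the kind of object a make_file override could return; make_body_file, spool_body and the 1 MiB threshold are hypothetical choices. Large bodies still land on disk, so this would not avoid Pavel's kernel problem for them, only for small uploads.

```python
import shutil
import tempfile

# Assumed threshold (1 MiB): bodies up to this size stay in RAM.
SPOOL_THRESHOLD = 1024 * 1024


def make_body_file(max_in_memory=SPOOL_THRESHOLD):
    """Return a binary file object that buffers in memory and rolls over
    to a real temporary file once it grows past max_in_memory bytes."""
    return tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode='w+b')


def spool_body(source, max_in_memory=SPOOL_THRESHOLD):
    """Copy an upload body into a spooled file and rewind it for reading."""
    body = make_body_file(max_in_memory)
    shutil.copyfileobj(source, body)
    body.seek(0)
    return body
```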
Chris McDonough <chrism@plope.com> wrote:
On Wed, 2005-04-06 at 00:45, Pavel Zaitsev wrote:
If you look above, I had problems with Zope creating temp files, as I am using Mac OS X with a WebDAV mount of Zope on the same machine. There is some race condition on locks in the Mach kernel, and sometimes Zope dies, as the open() system call never returns.
That sounds bad. I'm surprised you've had so much trouble with this. I thought OS X was just BSD, and BSD works fine?
The WebDAV kernel drivers are known to have quite a number of problems in Mac OS X 10.3 (don't know about the upcoming Tiger).

Florent

--
Florent Guillaume, Nuxeo (Paris, France)   CTO, Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg@nuxeo.com
participants (4)

- Chris McDonough
- Dieter Maurer
- Florent Guillaume
- Pavel Zaitsev