Chris McDonough <chrism@plope.com> wrote:
Zope's ZPublisher.HTTPRequest.HTTPRequest class has a method named "processInputs". This method is responsible for parsing the body of all requests. It parses all upload bodies regardless of method: PUT, POST, GET, HEAD, etc. In doing so, it uses the FieldStorage class from Python's cgi module to potentially break apart multipart/* bodies into their respective parts. Every invocation of FieldStorage creates a tempfile that is a copy of the entire upload body.
So in the common case, when a large file is uploaded via HTTP PUT (both DAV and external editor use PUT exclusively), here's what happens:
- ZServer creates a tempfile T1 to hold the file body as it gets pulled in.
- When the request makes it to the publisher, processInputs is called and it hands off tempfile T1 to FieldStorage.
- FieldStorage reads the entire body and creates another tempfile T2 (an exact copy of T1*, in the case of a PUT request).
- T2 is eventually put into REQUEST['BODYFILE'].
(*) At least I can't imagine a case where it's not an exact copy.
This is costly on large uploads. I'd like to change the top of the processInputs method to do this:
    if method == 'PUT':
        # we don't need to do any real input processing if we are
        # handling a PUT request.
        self._file = self.stdin
        return
Can anyone think of a reason I shouldn't do this?
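To make the cost concrete, here is a minimal sketch of the difference between the copying path and the proposed short circuit. It does not use Zope or FieldStorage at all: copy_body and body_file are hypothetical helpers that only imitate the T1 to T2 copy described above, and plain tempfiles stand in for the real request body.

```python
import shutil
import tempfile


def copy_body(t1):
    """Imitate the copying path: read all of T1 into a new tempfile T2.

    Hypothetical stand-in for what the body parser does; not Zope code.
    """
    t2 = tempfile.TemporaryFile()
    t1.seek(0)
    shutil.copyfileobj(t1, t2)
    t2.seek(0)
    return t2


def body_file(method, t1):
    """Return the file object to expose as the request body.

    Hypothetical helper mirroring the proposed change: PUT reuses T1,
    every other method still goes through the copying path.
    """
    if method == 'PUT':
        t1.seek(0)
        return t1
    return copy_body(t1)
```

With this sketch, body_file('PUT', t1) hands back t1 itself, while body_file('POST', t1) returns a second tempfile with identical contents, which is the extra disk and I/O cost the proposal tries to avoid for large uploads.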
Is stdin the medusa stream or T1 at this point? I ask because for ConflictError retry we need an input that is seekable (HTTPRequest.retry does self.stdin.seek(0)).

Florent

-- 
Florent Guillaume, Nuxeo (Paris, France)
CTO, Director of R&D
+33 1 40 33 71 59  http://nuxeo.com  fg@nuxeo.com
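The seekability concern can be illustrated with a small sketch, under the assumption that the input might be a raw, non-rewindable stream rather than a tempfile. NonSeekableStream and ensure_seekable are hypothetical names invented for this example; neither exists in Zope or medusa. The idea is simply to check seekable() before relying on seek(0), and to spool the stream into a tempfile once if it cannot rewind.

```python
import io
import shutil
import tempfile


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a socket-like stream that cannot rewind."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        # Fill the caller's buffer from the underlying data.
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def ensure_seekable(stream):
    """Return a stream on which seek(0) works.

    A seekable stream is returned unchanged; anything else is copied
    into a tempfile once so that a later retry can rewind it.
    """
    if stream.seekable():
        return stream
    spool = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, spool)
    spool.seek(0)
    return spool
```

If stdin is already T1, a guard like this is a no-op and the PUT short circuit stays copy-free; if it were the raw stream, one copy would still be needed for retry to work.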