Hi all,

ZODB.blob.Blob.consumeFile() needs a real, existing filesystem location passed into it to work, which is why plone.app.blob[1] patches ZPublisher.HTTPRequest.FieldStorage to generate a tempfile.NamedTemporaryFile() instead of a tempfile.TemporaryFile() for large uploaded files. However, if one tries to consume a NamedTemporaryFile and then open its generated blob before closing the tempfile, Windows complains:

    import ZODB.blob, tempfile
    f = tempfile.NamedTemporaryFile()
    b = ZODB.blob.Blob()
    b.consumeFile(f.name)
    b.open('r')
    Traceback (innermost last):
    ...
    IOError: [Errno 13] Permission denied: 'c:\\buildout\\var\\tmp\\tmpsuykkc'
Closing the NamedTemporaryFile after consuming it and before opening the blob makes matters worse, since Windows removes the file from under Blob *after* it has been consumed. So we'll have to think of a different strategy for consuming request files on Windows, but whichever strategy ends up being used, the problem remains that we cannot keep the FileUpload instance from the request open in 'w+' mode (readable and writable) while allowing the blob instance to be opened in the same request.

So I'd like to ask some policy questions:

1. Is it ok to close a FileUpload instance from the request?

2. Alternatively, is it ok to replace it on the request with a read-only file-like object with the same contents, or maybe even force it to be read-only to begin with?

If none of the above are ok, then we'll be forced to copy the whole file on Windows when consuming an uploaded file from the request into a blob, unless we use some non-portable win32 code to allow writing and reading to the same file simultaneously.

[1] http://dev.plone.org/plone/browser/plone.app.blob

--
View this message in context: http://www.nabble.com/FileUpload-and-blob-on-Windows-tp15129190p15129190.htm...
Sent from the Zope - Dev mailing list archive at Nabble.com.
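A minimal sketch of the "copy the whole file" fallback Leo mentions, written for Python 3 rather than the Python 2 of the thread. It assumes the upload is a readable, seekable binary file object; the helper name `copy_upload_to_closed_file` is hypothetical. The copy lives in a separate, already-closed file, so renaming it (which consumeFile() does) never touches the request's still-open file. The `consumeFile()` call is only shown as a comment because ZODB is not assumed to be installed.

```python
import io
import os
import shutil
import tempfile


def copy_upload_to_closed_file(upload, directory=None):
    """Copy an upload into a new, closed file and return that file's path.

    Hypothetical helper: the caller owns the returned path and must rename
    or delete it. The upload itself stays open and is rewound afterwards.
    """
    upload.seek(0)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(upload, out)
    except BaseException:
        os.unlink(path)  # the copy failed: drop the partial file
        raise
    upload.seek(0)  # leave the request's file usable for later readers
    return path


upload = io.BytesIO(b"uploaded bytes")
copy_path = copy_upload_to_closed_file(upload)
# blob.consumeFile(copy_path)  # hypothetical: ZODB would rename the copy
with open(copy_path, "rb") as fh:
    print(fh.read())
```

The price is one extra full read and write of the upload, which is exactly the cost the thread is trying to avoid.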
Hi Leo,

I'm not familiar with the Zope side of the world at all, so this may be completely useless...

> However, if one tries to consume a NamedTemporaryFile and then open its generated blob before closing the tempfile, Windows complains:
>
>     import ZODB.blob, tempfile
>     f = tempfile.NamedTemporaryFile()
>     b = ZODB.blob.Blob()
>     b.consumeFile(f.name)
>     b.open('r')
>     Traceback (innermost last):
>     ...
>     IOError: [Errno 13] Permission denied: 'c:\\buildout\\var\\tmp\\tmpsuykkc'
>
> ...
>
> unless we use some non-portable win32 code to allow writing and reading to the same file simultaneously.
I'm not sure exactly what you mean by "simultaneously", but assuming you just need the ability to read and write to a file, you already can. Does this behaviour help?
    import tempfile
    f = tempfile.NamedTemporaryFile()
    f.write("hello")
    f.seek(0,0)
    f.read()
    'hello'

I can easily see it might not - as soon as the file is closed you have (obviously) lost it.

Mark
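The same session in Python 3 terms, extended with the close that Mark warns about. A NamedTemporaryFile opens in "w+b" mode by default, so one handle can both write and read back; with the default delete=True the file is removed as soon as it is closed.

```python
import os
import tempfile

f = tempfile.NamedTemporaryFile()  # mode "w+b", delete=True by default
f.write(b"hello")
f.seek(0)
data = f.read()  # the same handle reads back what it wrote
path = f.name
f.close()  # delete=True: the file is removed here
print(data, os.path.exists(path))  # prints: b'hello' False
```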
Hi Mark,

Mark Hammond-3 wrote:
> [...]
> > unless we use some non-portable win32 code to allow writing and reading to the same file simultaneously.
>
> I'm not sure exactly what you mean by "simultaneously", but assuming you just need the ability to read and write to a file, you already can. Does this behaviour help?
>
>     import tempfile
>     f = tempfile.NamedTemporaryFile()
>     f.write("hello")
>     f.seek(0,0)
>     f.read()
>     'hello'
>
> I can easily see it might not - as soon as the file is closed you have (obviously) lost it.
I should've been clearer: I meant reading and writing at the same time from two different file handles.

NamedTemporaryFile has the added complication of removing the file from under 'blob' when it's closed, so even if I don't try to open the blob after consuming the file, the file disappears after the request is gone, and the transaction subsequently fails when trying to rename the consumed file to its final location.

I also tried win32file.CreateHardLink(), but if a file is open through one hard link, renaming the other hard link fails, so we're stuck with copying files wholesale on Windows, or closing the FileUpload object and letting subsequent uses of it fail.
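The "two different file handles" case can be probed with a small Python 3 check; the function name is hypothetical. On POSIX systems a second open() of a NamedTemporaryFile's name succeeds, while the Python documentation notes that on Windows, with the default settings, the name cannot be opened a second time while the file is still open, so there the open is expected to fail with PermissionError.

```python
import os
import tempfile


def second_handle_sees_data():
    """Return True if a second handle can read what the first one wrote."""
    with tempfile.NamedTemporaryFile() as first:
        first.write(b"data")
        first.flush()  # push the bytes to the OS so other handles see them
        try:
            with open(first.name, "rb") as second:
                return second.read() == b"data"
        except PermissionError:  # expected on Windows with the defaults
            return False


print(os.name, second_handle_sees_data())
```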
Hi
> Subject: RE: [Zope-dev] FileUpload and blob on Windows
>
> Hi Mark,
>
> Mark Hammond-3 wrote:
> > [...]
> > > unless we use some non-portable win32 code to allow writing and reading to the same file simultaneously.
> >
> > I'm not sure exactly what you mean by "simultaneously", but assuming you just need the ability to read and write to a file, you already can. Does this behaviour help?
> >
> >     import tempfile
> >     f = tempfile.NamedTemporaryFile()
> >     f.write("hello")
> >     f.seek(0,0)
> >     f.read()
> >     'hello'
Why are you using a NamedTemporaryFile? If I'm right, the goal is to store the file stream from the upload directly in this file and copy this file over to the real directory location. This means you can cut down the amount of file data that is read and written, right?

Why not use your own file class like:

    import os

    class TMPFile(file):
        """Temporary file.

        This temporary file can remove a file in request.close() from the file system.
        """

        def release(self):
            """Release the object in the request's (_held) list."""
            if self.name is not None and os.path.exists(self.name):
                os.unlink(self.name)

You can move such a file with shutil.move(self.tmpPath, targetPath).

I implemented such a file upload accelerator. With this beast I was able to upload an Ubuntu VMware image of more than 950 MB to Zope3 in about 75 seconds, with memory usage below 50 MB. It really rocks.

Note: I implemented this on Windows and it works well.

Regards
Roger Ineichen
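Roger's class subclasses the Python 2 `file` builtin, which no longer exists in Python 3. A rough Python 3 equivalent can subclass `io.FileIO` instead. This is a sketch, not Roger's actual implementation: the `new_tmpfile` factory is hypothetical, and the close() inside release() is an added assumption based on Windows typically refusing to delete a file that is still open. The usage part writes some data, closes the file, and moves it with shutil.move() as Roger suggests.

```python
import io
import os
import shutil
import tempfile


class TMPFile(io.FileIO):
    """Temporary upload file that release() removes from the file system."""

    def release(self):
        """Meant to be called from request.close() (hypothetical hook)."""
        if not self.closed:
            self.close()  # Windows typically refuses to delete an open file
        if self.name is not None and os.path.exists(self.name):
            os.unlink(self.name)


def new_tmpfile(directory=None):
    """Create an empty file on disk and reopen it read/write as a TMPFile."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    return TMPFile(path, "r+")


upload = new_tmpfile()
upload.write(b"streamed upload data")
upload.close()  # close before moving; Windows will not rename an open file
target = upload.name + ".stored"
shutil.move(upload.name, target)
upload.release()  # no-op here: the file already lives at target
print(os.path.exists(target))
```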
Roger Ineichen wrote:
> Why are you using a NamedTemporaryFile? If I'm right the goal is to store the file stream from the upload directly in this file and copy this file over to the real directory location. This means you can cut down the amount of read and write file data, right?

Yes, and plone.app.blob was using a NamedTemporaryFile exactly because it has a filesystem-visible name that Blob.consumeFile() can os.rename() over to the final location.

Roger Ineichen wrote:
> Why not use your own file class like:
>
>     class TMPFile(file):
>         """Temporary file.
>
>         This temporary file can remove a file in request.close() from the file system.
>         """
>
>         def release(self):
>             """Release the object in the request's (_held) list."""
>             if self.name is not None and os.path.exists(self.name):
>                 os.unlink(self.name)
I'm already using something like that, because NamedTemporaryFiles on Windows disappear when they're closed, *even if* they've been renamed away from the original file name. But then I hit the other snag: open files can't be renamed, and even win32file hard links to open files can't be renamed while the original is open.

I also tried replacing the request file with a blob file opened for reading, but the request outlives the transaction, so when the commit happens, the blob file is still open and BlobStorage complains. I even tried surreptitiously opening the filesystem file from under the blob and placing that in the request, but at commit time BlobStorage tries to rename the file to its final location, and Windows doesn't like it.

Now I'm wondering if I'll have to implement a transaction manager to close blob files before the transaction commits.

Roger Ineichen wrote:
> You can move such a file with shutil.move(self.tmpPath, targetPath).
>
> I implemented such a file upload accelerator. With this beast I was able to upload an Ubuntu VMware image of more than 950 MB in about 75 seconds to Zope3. And this with a memory usage below 50 MB. It really rocks.
>
> Note: I implemented this on Windows and it works well.
I got it to work quite far in that direction, but the snag I hit is when I try to open the blob in the same transaction, and *that* fails.

Interestingly, the filesystem file created by NamedTemporaryFile on Windows *can* be renamed and even removed while still open. I suppose this is due to the O_TEMPORARY flag, as mentioned in this code from NamedTemporaryFile:
    # Setting O_TEMPORARY in the flags causes the OS to delete
    # the file when it is closed.  This is only supported by Windows.
    if _os.name == 'nt':
        flags |= _os.O_TEMPORARY
If only there was a flag so that Windows allowed renaming/removing an open file, but didn't try to remove the file itself after it was closed...

Cheers,
Leo
Hi,

Leonardo Rochael wrote:
> If only there was a flag so that Windows allowed renaming/removing an open file, but didn't try to remove the file itself after it was closed...
Actually, looking at the restrictions that hard links have on Windows, I guess that Windows removes the equivalent of the `inode` on POSIX instead of the directory entry that was associated with the temporary file. Sounds like bad luck and hard work.

Christian

--
gocept gmbh & co. kg - forsterstrasse 29 - 06112 halle (saale) - germany
www.gocept.com - ct@gocept.com - phone +49 345 122 9889 7 - fax +49 345 122 9889 1
zope and plone consulting and development
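The `inode` distinction Christian draws can be seen on a POSIX system: a hard link is a second directory entry for the same inode, so removing one name leaves the data reachable through the other. The Python 3 sketch below (hypothetical function name, closed files only) illustrates only that POSIX behaviour; it does not reproduce the Windows delete-on-close semantics discussed in the thread.

```python
import os
import tempfile


def data_after_unlinking_original():
    """Hard-link a file, delete the original name, then read via the link."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        link = os.path.join(d, "link")
        with open(original, "wb") as fh:
            fh.write(b"payload")
        os.link(original, link)  # second directory entry, same inode
        links = os.stat(link).st_nlink  # link count of the shared inode
        os.unlink(original)  # removes one directory entry only
        with open(link, "rb") as fh:
            return links, fh.read()


print(data_after_unlinking_original())
```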
Hi Leo,

> Subject: Re: AW: [Zope-dev] FileUpload and blob on Windows
>
> [...]
>
> I'm already using something like that, because NamedTemporaryFiles on Windows disappear when they're closed, *even if* they've been renamed away from the original file name, but then I hit the other snag in that open files can't be renamed, and even win32file hard links to open files can't be renamed while the original is open.
>
> And I also tried replacing the request file with a blob file opened for reading, but the request outlives the transaction, so when the commit happens, the blob file is still open and BlobStorage complains. I even tried surreptitiously opening the filesystem file from under blob and placing that in the request, but at commit time, BlobStorage tries to rename the file to its final location, and Windows doesn't like it.
>
> Now I'm wondering if I'll have to implement a transaction manager to close blob files before the transaction.
Probably the question is: when do you try to move the file? I only move the file on transaction commit, after everything else is done (e.g. widget validation). Is this not how blob handles the move, or does blob move/touch the file again too early?

I only read from the file stream, and after that I read the file size from the file system metadata, which is much faster. I never touch that file again until the transaction is committed. What's the reason you need to read the file again after it is created?

Regards
Roger Ineichen
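Roger's "touch nothing until commit, then move" approach and Leo's idea of closing open files before the commit can be combined in a small sketch. `CommitTimeMover` is hypothetical and framework-free: the real `transaction` package has its own data-manager interface for taking part in a commit, which this class does not implement. It closes every held file before renaming, since Windows will not rename a file that is still open, and reads each moved file's size from the file system as Roger describes.

```python
import os
import tempfile


class CommitTimeMover:
    """Defer file moves until commit, closing held handles first (hypothetical)."""

    def __init__(self):
        self._held = []   # open file objects that must be closed before renaming
        self._moves = []  # (source, target) path pairs

    def hold(self, fileobj):
        self._held.append(fileobj)

    def schedule_move(self, source, target):
        self._moves.append((source, target))

    def _close_held(self):
        for fileobj in self._held:
            if not fileobj.closed:
                fileobj.close()
        self._held.clear()

    def commit(self):
        """Close held files, then rename; return the size of each moved file."""
        self._close_held()
        sizes = []
        for source, target in self._moves:
            os.replace(source, target)
            sizes.append(os.path.getsize(target))  # metadata only, no reading
        self._moves.clear()
        return sizes

    def abort(self):
        """Close held files and delete sources that were never moved."""
        self._close_held()
        for source, _ in self._moves:
            if os.path.exists(source):
                os.unlink(source)
        self._moves.clear()


# Usage: one upload file, held open by the "request" until commit.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "upload.tmp")
target = os.path.join(work_dir, "blob.bin")
request_file = open(source, "w+b")
request_file.write(b"12345")
request_file.flush()

mover = CommitTimeMover()
mover.hold(request_file)
mover.schedule_move(source, target)
sizes = mover.commit()
print(sizes)
```

Note that closing the held file is precisely what would break later users of the request's FileUpload, which is the policy question Leo raised; the sketch only shows where such a close would have to happen.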
Hi,

Leonardo Rochael wrote:
> I should've been clearer, I meant reading and writing at the same time from 2 different file handles.
>
> NamedTemporaryFile has the added complication of removing the file from under 'blob' when it's closed, so even if I don't try to open the blob after consuming the file, the file disappears after the request is gone, and the transaction subsequently fails when trying to rename the consumed file to its final location.
>
> I also tried win32file.CreateHardLink() but if a file is open by one hard-link, renaming the other hard-link fails, so we're stuck with copying files wholesale on Windows, or closing the FileUpload object and letting subsequent uses of it fail.
Hmm. The Python docs already mention this problem for Windows. To avoid copying, we'd have to adjust the publisher not to use a NamedTemporaryFile, but actually use a regular temporary file that gets deleted when the publisher decides to.

Christian
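In current Python, Christian's suggestion maps onto a NamedTemporaryFile created with delete=False: there is no delete-on-close (and, on Windows, the O_TEMPORARY flag is not set), so the file survives close() and can be renamed once closed. Deleting it becomes the caller's job; `discard_upload_file` is a hypothetical stand-in for whatever the publisher would call when the request ends.

```python
import os
import tempfile


def make_upload_file(directory=None):
    """Named temporary file that is NOT removed when it is closed."""
    return tempfile.NamedTemporaryFile(mode="w+b", dir=directory, delete=False)


def discard_upload_file(fileobj):
    """Remove the upload file unless it was already moved elsewhere."""
    if not fileobj.closed:
        fileobj.close()
    if os.path.exists(fileobj.name):
        os.unlink(fileobj.name)


upload = make_upload_file()
upload.write(b"request body")
upload.close()
print(os.path.exists(upload.name))  # True: closing did not delete it
discard_upload_file(upload)
print(os.path.exists(upload.name))  # False: the "publisher" removed it
```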
Leonardo Rochael wrote at 2008-1-27 22:29 -0800:
> ... Closing the NamedTemporaryFile after consuming it and before opening the blob makes matters worse, since Windows removes the file from under Blob *after* it has been consumed, so we'll have to think of a different strategy for consuming request files on Windows, but whichever strategy ends up being used, the problem remains that we cannot keep the FileUpload instance from the request open in 'w+' mode (readable and writable) while allowing the blob instance to be opened in the same request. So I'd like to ask some policy questions:
>
> 1. Is it ok to close a FileUpload instance from the request?
I think this would be okay for "FileUpload"s the sole purpose of which is to be consumed by a blob. Of course, this must not happen for other "FileUpload" objects -- at least not before the request is closed.
> 2. Alternatively, is it ok to replace it on the request with a read-only file-like object with the same contents, or maybe even force it to be read-only to begin with?

I expect this to be safe for all "FileUpload" instances.

--
Dieter
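One way to act on Dieter's answer to question 2 is a thin read-only proxy placed on the request instead of the writable upload. The class below is a hypothetical Python 3 sketch, not part of ZPublisher: it forwards reads and seeks to the wrapped file and rejects writes with io.UnsupportedOperation. It does not by itself solve the commit-time rename problem Leo describes, because the underlying file is still open.

```python
import io


class ReadOnlyUpload:
    """Read-only view over a readable, seekable binary file object."""

    def __init__(self, fileobj):
        self._file = fileobj

    def read(self, size=-1):
        return self._file.read(size)

    def readline(self, size=-1):
        return self._file.readline(size)

    def seek(self, offset, whence=io.SEEK_SET):
        return self._file.seek(offset, whence)

    def tell(self):
        return self._file.tell()

    def readable(self):
        return True

    def writable(self):
        return False

    def write(self, data):
        raise io.UnsupportedOperation("upload is read-only")

    @property
    def closed(self):
        return self._file.closed

    def close(self):
        self._file.close()

    def __iter__(self):
        # Call readline() until it returns the empty-bytes sentinel.
        return iter(self.readline, b"")


view = ReadOnlyUpload(io.BytesIO(b"line 1\nline 2\n"))
print(list(view))  # [b'line 1\n', b'line 2\n']
```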
participants (5)

Christian Theune
Dieter Maurer
Leonardo Rochael
Mark Hammond
Roger Ineichen