Big improvement for load_site.py, patch included
Hi,

I'm pleased to send the attached patch for load_site.py (tested against load_site.py from Zope 2.3.2); just tell me if it's useful.

The original load_site.py determined the content type of each file to upload to the ZODB from the file's extension, and set a wrong content type (text/plain, the default one) for unrecognized extensions. For example, PDF files were uploaded as text/plain.

The modified version uses urllib to get the real content type. urllib is able to load local content, so why not use it? This allowed me to delete the methods that handled images, which are now handled automagically, and to upload a bunch of documents to my ZODB while preserving their original content type.

urllib would also be useful for letting load_site load sites from the web instead of from the local filesystem; however, more work is needed for this to work with directories.

Some sanitization is done to documents' ids before uploading them to the ZODB, because I encountered problems trying to upload files whose names contained spaces or accented characters: each invalid character is replaced with an underscore. However this may produce invalid ids too, so perhaps a better solution is needed.

<dtml-rant mode="LOUD">

load_site is wonderful, however I really think it could be even better if my idea of putting the ZODB object's meta-type in a single comment line at the top of DTML Documents and DTML Methods was finally implemented (see zope@zope.org archives). This comment (why not a # comment instead of a <!-- --> or <dtml-comment ...>?) would be set on EXPORT/ViewSource only, and stripped by Zope on IMPORT/CREATION.

This would allow people to easily work with both DTML Documents and DTML Methods in external tools and have them uploaded with the correct meta-type.

Given that PythonScripts already contain such a stupid comment, which means nothing to external tools but only to Zope, I really can't see why DTML Documents and Methods wouldn't have the same thing!

Of course on import/creation, if the said comment is absent, then the current behavior would be preserved, i.e. the 50% chance of being wrong would still be there for those who don't want to understand how much this would simplify some people's work.

I'd be glad with something like:

# DTML Method
<dtml-something ...>
...

or:

<!-- DTML Document -->
<dtml-something ...>
...

I'd be just as glad to counter any objection, but please send them privately and I'll summarize.

</dtml-rant>

hope-this-helps-and-won't-be-lost-into-the-Collector'sly yours.

Jerome Alet - alet@unice.fr

--- load_site.py.orig Tue Jul 24 14:21:05 2001
+++ load_site.py Tue Jul 24 14:22:34 2001
@@ -119,7 +119,7 @@
      Use *old* zope method names.
 """

-import sys, getopt, os, string
+import sys, getopt, os, string, urllib
 ServerError=''
 verbose=0
 old=0
@@ -173,6 +173,17 @@

     for f in files: upload_file(object, f)

+def sanitize(id) :
+    # sanitize the id in case it contains special characters
+    # more clean sanitization should be done, of course...
+    valid = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_."
+    for i in range(len(id)) :
+        # if character is invalid
+        if id[i] not in valid :
+            # replace with an underscore
+            id= id[:i] + '_' + id[i+1:]
+    return id
+
 def call(f, *args, **kw):
     # Call a function ignoring redirect bci errors.
     try: apply(f,args, kw)
@@ -181,6 +192,7 @@
             raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]

 def upload_file(object, f):
+    f = os.path.expanduser(f) # takes care of ~username/... and ~/...
     if os.path.isdir(f): return upload_dir(object, f)
     dir, name = os.path.split(f)
     root, ext = os.path.splitext(name)
@@ -193,11 +205,37 @@
         return globals()['upload_'+ext](object, f)

     if verbose: print 'upload_file', f, ext
-    call(object.manage_addFile, id=name, file=open(f,'rb'))
+
+    # we now use urllib to get the real content-type.
+    name = sanitize(name)
+    try :
+        # WARNING: both urlopen and object.read may
+        # raise an IOError, in the latter case that's when
+        # object is a filesystem directory, but this one shouldn't occur
+        # because directories are uploaded by upload_dir()
+        fsfile = urllib.urlopen(f)
+        info = fsfile.info()
+        ctype = info.gettype()
+        mtype = info.getmaintype()
+        realurl = fsfile.geturl()
+        if mtype == "image" :
+            # Image
+            call(object.manage_addImage, id = name, file = fsfile, title = realurl, precondition = '', content_type = ctype)
+        elif ctype == 'text/html' :
+            # DTML Document, just in case it was not seen by upload_html/upload_htm
+            call(object.manage_addDTMLDocument, id = name, title = realurl, file = fsfile)
+        else :
+            # normal File
+            call(object.manage_addFile, id = name, file = fsfile, title = realurl, precondition = '', content_type = ctype)
+        fsfile.close()
+        del fsfile
+    except IOError,msg :
+        sys.stderr.write('Error %s, occured while retrieving %s\n' % (msg, f))

 def upload_dir(object, f):
     if verbose: print 'upload_dir', f
     dir, name = os.path.split(f)
+    name = sanitize(name)
     call(object.manage_addFolder, id=name)
     object=object.__class__(object.url+'/'+name,
                             username=object.username,
@@ -309,6 +347,7 @@

 def upload_html(object, f):
     dir, name = os.path.split(f)
+    name = sanitize(name)
     f=open(f)

     if doctor:
@@ -355,6 +394,7 @@

 def upload_dtml(object, f):
     dir, name = os.path.split(f)
+    name = sanitize(name)
     f=open(f)

     if old:
@@ -363,13 +403,6 @@
     else:
         call(object.manage_addDTMLMethod, id=name, file=f)

-
-def upload_gif(object, f):
-    dir, name = os.path.split(f)
-    call(object.manage_addImage, id=name, file=open(f,'rb'))
-
-upload_jpg=upload_gif
-upload_png=upload_gif

 if __name__=='__main__': main()
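For readers on a current Python, the content-type lookup that the patch relies on can be sketched roughly as follows. This is only an illustration and not part of the patch: it assumes Python 3, where urllib.urlopen has become urllib.request.urlopen and a local file has to be addressed through a file:// URL. The helper names local_content_type and demo are hypothetical. As far as I can tell, for local files urllib derives the type from the file name (through the mimetypes module) and not from the file's contents.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def local_content_type(path):
    """Return (content_type, main_type) for a local file as urllib reports it."""
    # urlopen needs a URL, so turn the filesystem path into a file:// URL.
    url = Path(path).resolve().as_uri()
    with urllib.request.urlopen(url) as response:
        headers = response.headers
        return headers.get_content_type(), headers.get_content_maintype()


def demo(suffix):
    """Create a throwaway empty file with the given suffix and look up its type."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    try:
        return local_content_type(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo(".pdf"))
    print(demo(".png"))
```

The main type is what the patch branches on to choose between manage_addImage and manage_addFile, so an uploader built on this sketch would need only one code path for all image formats.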
Some sanitization is done to document's ids before uploading them to the ZODB, because I encountered problems trying to upload files which names contained spaces or accented characters: each invalid character is replaced with an underscore. However this may produce invalid ids too, so perhaps a better solution is needed.
"space" isn't an illegal character. What will happen in those cases where M$ is involved and you download a site for offline viewing and then decide to upload it into Zope? You might save the web page as "peter bengtsson.html" and with it comes a folder full of images and stylesheets, and they are all referred to by names containing a space.

Cheers, Peter
On Tue, 24 Jul 2001, Peter Bengtsson wrote:
"space" isn't an illegal character.
What will happen in those cases where M$ is involved and you download a site for offline viewing and then decide to upload it into Zope? You might save the web page as "peter bengtsson.html" and with it comes a folder full of images and stylesheets, and they are all referred to by names containing a space.
It's strange, because load_site.py received a 400 HTTP error whenever I tried to upload a .htm file whose name contained spaces, and when I replaced the spaces with underscores all went fine. Could there be another reason I missed?

bye, Jerome Alet
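The replacement rule under discussion, and the way it can go wrong, can be sketched like this. It is a Python 3 restatement of the sanitize function from the patch, written for illustration only; find_collisions is a hypothetical helper that is not in the patch. It shows one concrete risk: two different file names can end up with the same id. Another likely problem, as far as I know, is that Zope rejects ids that begin with an underscore, which this rule produces for any name starting with a replaced character.

```python
import string

# Characters the patch accepts in an id: ASCII letters, digits, "_" and ".".
VALID_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_.")


def sanitize(name):
    """Replace every character outside VALID_ID_CHARS with an underscore."""
    return "".join(ch if ch in VALID_ID_CHARS else "_" for ch in name)


def find_collisions(names):
    """Group original names that end up with the same sanitized id."""
    by_id = {}
    for name in names:
        by_id.setdefault(sanitize(name), []).append(name)
    # Keep only the ids claimed by more than one original name.
    return {new_id: originals
            for new_id, originals in by_id.items()
            if len(originals) > 1}


if __name__ == "__main__":
    print(sanitize("peter bengtsson.html"))
    print(find_collisions(["a b.html", "a_b.html", "c.html"]))
```

Running a check like find_collisions over a directory listing before uploading would at least flag the files that would overwrite each other.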
I had worked on load_site too; it's up at my members page: http://www.zope.org/Members/bowerymarc I fixed a bug dealing with names containing spaces, but kept the spaces since Zope has no problem with them. I also added other stuff to ease iterative reloading of a site.
From: "Peter Bengtsson" <mail@peterbe.com>
Date: Tue, 24 Jul 2001 15:38:24 +0200
To: "Jerome Alet" <alet@unice.fr>, <zope@zope.org>
Subject: Re: [Zope] Big improvement for load_site.py, patch included
"space" isn't an illegal character.
What will happen in those cases where M$ is involved and you download a site for offline viewing and then decide to upload it into Zope? You might save the web page as "peter bengtsson.html" and with it comes a folder full of images and stylesheets, and they are all referred to by names containing a space.
Cheers, Peter
Dieter Maurer wrote:
Peter Bengtsson writes:
"space" isn't an illegal character.

Space is illegal in URLs (per the URL spec), but it is allowed by Zope, and most browsers and servers do not complain.

It isn't allowed by Zope: you get 400 Bad Request barfage, which is irritating, 'cos although it's against the spec, everyone else doesn't seem to mind...

cheers, Chris
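The point about the URL spec can be made concrete with percent-encoding. The sketch below assumes Python 3's urllib.parse, and the helper name id_to_url_segment is hypothetical. It shows how an id containing a space can be turned into a legal URL path segment without renaming the object. Whether an unencoded space is what actually triggered the 400 responses reported here is an assumption; the thread does not settle it.

```python
from urllib.parse import quote, unquote


def id_to_url_segment(object_id):
    """Percent-encode an id so it can be used as one URL path segment."""
    # safe="" also encodes "/", which must not appear inside a single segment.
    return quote(object_id, safe="")


if __name__ == "__main__":
    segment = id_to_url_segment("peter bengtsson.html")
    print(segment)
    # Decoding gives back the original id, so no information is lost.
    print(unquote(segment))
```

Unlike replacing characters with underscores, this mapping is reversible, so two distinct file names never collapse into the same id.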
participants (5): Chris Withers, Dieter Maurer, Jerome Alet, marc lindahl, Peter Bengtsson