New subject: [Zope] Big improvement for load_site.py, patch included

24 Jul 2001

Hi,

I'm pleased to send the attached patch for load_site.py (tested against
load_site.py from Zope 2.3.2). Just tell me if it's useful.

The original load_site.py determined the content-type of the file to
upload to the ZODB from the file's extension, and a wrong content type
(text/plain, the default one) was set for unrecognized extensions. For
example, PDF files were uploaded as text/plain.

The modified version uses urllib to get the real content-type. urllib is
able to load local content, so why not use it?
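
As a rough illustration of the lookup described above, here is a minimal
sketch in present-day Python 3 (the patch itself targets the Python 2 urllib
of the time). The helper name local_content_type is hypothetical and is not
part of load_site.py. Note that for local files urllib derives the type from
the file name through the mimetypes module, so the practical gain over the
old per-extension upload_* functions is a much larger table of known
extensions, with text/plain still the fallback.

```python
import urllib.request
from pathlib import Path


def local_content_type(path):
    """Return (content_type, main_type) for a local file, as reported by urllib.

    Hypothetical helper for illustration only; not part of load_site.py.
    """
    # Python 3's urlopen() needs a file:// URL rather than a bare path.
    url = Path(path).resolve().as_uri()
    with urllib.request.urlopen(url) as response:
        headers = response.info()
        # e.g. "image/png" and "image" for a PNG file
        return headers.get_content_type(), headers.get_content_maintype()
```

The main type is what the patch uses to choose between manage_addImage and
manage_addFile; the full content type is passed along as content_type.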

This allowed me to delete the methods to handle images, which are now
handled automagically, as well as upload a bunch of documents to my ZODB
while preserving their original content-type. 

urllib could also let load_site load sites from the web instead of from the
local filesystem; however, more work is needed for this to work with
directories.

Document ids are sanitized before being uploaded to the ZODB, because I ran
into problems uploading files whose names contained spaces or accented
characters: each invalid character is replaced with an underscore. However,
this may itself produce invalid ids, so perhaps a better solution is needed.
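
The replacement rule above can be sketched as follows in present-day
Python 3. This is an illustrative rewrite, not the sanitize() function from
the patch, and the constant name VALID_ID_CHARS is an assumption; the set of
accepted characters is the same one the patch uses (ASCII letters, digits,
underscore and dot).

```python
import string

# Same characters the patch accepts: ASCII letters, digits, "_" and ".".
VALID_ID_CHARS = set(string.ascii_letters + string.digits + "_.")


def sanitize(name):
    """Replace every character outside VALID_ID_CHARS with an underscore."""
    return "".join(ch if ch in VALID_ID_CHARS else "_" for ch in name)
```

For instance, sanitize("my file é.pdf") gives "my_file__.pdf". One weakness
of such a simple rule is that distinct names can collapse to the same id:
"a b.txt" and "a_b.txt" both become "a_b.txt".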

<dtml-rant mode="LOUD">

load_site is wonderful, however I really think it could be even better
if my idea of putting the ZODB object's meta-type in a single comment line
at the top of DTML Documents and DTML Methods was finally implemented (see
zope@zope.org archives). 

This comment (why not a # comment instead of an HTML comment or a
<dtml-comment ...>?) would be set on EXPORT/ViewSource only, and stripped
by Zope on IMPORT/CREATION. This would allow people to easily work with
both DTML Documents and DTML Methods in external tools and have them
uploaded with the correct meta-type. Given that PythonScripts already
contain such a stupid comment, which means nothing to external tools but
only to Zope, I really can't see why DTML Documents and Methods shouldn't
have the same thing! Of course, if the comment is absent on
import/creation, the current behavior would be preserved, i.e. the 50%
chance of being wrong would still be there for those who don't want to
understand how much this would simplify some people's work.

I'd be happy with something like:
	
	# DTML Method
	<dtml-something ...>
	...

or:

	
	<dtml-something ...>
	...
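
To make the proposal a little more concrete, here is a hypothetical sketch
(Python 3) of what the import side could do with a leading # comment. Nothing
here is existing Zope or load_site.py code: the names KNOWN_META_TYPES and
split_meta_type are assumptions made for illustration, and only the
"# DTML Method" form shown above is handled.

```python
# Meta-types the sketch recognises in the first-line comment (assumed list).
KNOWN_META_TYPES = ("DTML Method", "DTML Document")


def split_meta_type(source, default=None):
    """Return (meta_type, body) for a DTML source string.

    If the first line is "# <meta-type>" with a known meta-type, that line
    is stripped from the body. Otherwise the source is returned unchanged
    together with `default`, so the caller keeps its current behavior.
    """
    first_line, _, rest = source.partition("\n")
    if first_line.startswith("#"):
        candidate = first_line[1:].strip()
        if candidate in KNOWN_META_TYPES:
            return candidate, rest
    return default, source
```

An uploader could then call manage_addDTMLMethod or manage_addDTMLDocument
depending on the returned meta-type, and fall back to its usual guess when
the result is the default.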

I'd be just as glad to counter any objection, but please send them
privately and I'll summarize. 

</dtml-rant>

hope-this-helps-and-won't-be-lost-into-the-Collector'sly yours.

Jerome Alet - alet@unice.fr

--- load_site.py.orig	Tue Jul 24 14:21:05 2001
+++ load_site.py	Tue Jul 24 14:22:34 2001
@@ -119,7 +119,7 @@
          Use *old* zope method names.
 """
 
-import sys, getopt, os, string
+import sys, getopt, os, string, urllib
 ServerError=''
 verbose=0
 old=0
@@ -173,6 +173,17 @@
 
     for f in files: upload_file(object, f)
 
+def sanitize(id) :
+    # sanitize the id in case it contains special characters
+    # more clean sanitization should be done, of course...
+    valid = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_."
+    for i in range(len(id)) :
+	# if character is invalid
+	if id[i] not in valid :
+	    # replace with an underscore
+	    id= id[:i] + '_' + id[i+1:]
+    return id
+
 def call(f, *args, **kw):
     # Call a function ignoring redirect bci errors.
     try: apply(f,args, kw)
@@ -181,6 +192,7 @@
             raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]
 
 def upload_file(object, f):
+    f = os.path.expanduser(f)	# takes care of ~username/... and ~/...
     if os.path.isdir(f): return upload_dir(object, f)
     dir, name = os.path.split(f)
     root, ext = os.path.splitext(name)
@@ -193,11 +205,37 @@
         return globals()['upload_'+ext](object, f)
 
     if verbose: print 'upload_file', f, ext
-    call(object.manage_addFile, id=name, file=open(f,'rb'))
+
+    # we now use urllib to get the real content-type.
+    name = sanitize(name)
+    try :
+	# WARNING: both urlopen and object.read may
+	# raise an IOError, in the latter case that's when
+	# object is a filesystem directory, but this one shouldn't occur
+	# because directories are uploaded by upload_dir()
+	fsfile = urllib.urlopen(f)
+	info = fsfile.info()
+	ctype = info.gettype()
+	mtype = info.getmaintype()
+	realurl = fsfile.geturl()
+	if mtype == "image" :
+	    # Image
+	    call(object.manage_addImage, id = name, file = fsfile, title = realurl, precondition = '', content_type = ctype)
+	elif ctype == 'text/html' :
+	    # DTML Document, just in case it was not seen by upload_html/upload_htm
+	    call(object.manage_addDTMLDocument, id = name, title = realurl, file = fsfile)
+	else :
+	    # normal File
+	    call(object.manage_addFile, id = name, file = fsfile, title = realurl, precondition = '', content_type = ctype)
+	fsfile.close()
+	del fsfile
+    except IOError,msg :
+	sys.stderr.write('Error %s, occured while retrieving %s\n' % (msg, f))
 
 def upload_dir(object, f):
     if verbose: print 'upload_dir', f
     dir, name = os.path.split(f)
+    name = sanitize(name)
     call(object.manage_addFolder, id=name)
     object=object.__class__(object.url+'/'+name,
                             username=object.username,
@@ -309,6 +347,7 @@
 
 def upload_html(object, f):
     dir, name = os.path.split(f)
+    name = sanitize(name)
     f=open(f)
 
     if doctor:
@@ -355,6 +394,7 @@
 
 def upload_dtml(object, f):
     dir, name = os.path.split(f)
+    name = sanitize(name)
     f=open(f)
 
     if old:
@@ -363,13 +403,6 @@
     else:
         call(object.manage_addDTMLMethod, id=name, file=f)
         
-
-def upload_gif(object, f):
-    dir, name = os.path.split(f)
-    call(object.manage_addImage, id=name, file=open(f,'rb'))
-
-upload_jpg=upload_gif
-upload_png=upload_gif
 
 if __name__=='__main__': main()

    
