[Zope] converting bytestreams from iso-8859-1 to utf-8

Giuseppe Bonelli giuseppe.bonelli at tiscali.it
Sat May 6 14:07:16 EDT 2006


Hi all,
sorry if this is not Zope-specific, but can someone please explain 
to me the following behaviour when trying to convert an iso-8859-1 string 
read from a file to a utf-8 encoded one?

s='\x93test\x94' #an iso-8859-1 string
                  #\x93 and \x94 are left and right
                  #double quotation marks,
                  #as seen in a browser set to iso-8859-1
ss=unicode(s,'iso-8859-1').encode('utf-8')
gives
ss='\xc2\x93test\xc2\x94'
which is wrong (as seen in a browser set to utf-8)!

but:
u=unicode(s,'iso-8859-1')
u=u.replace(u'\x93',u'\u201C') #U+201C is the unicode left double quotation mark
u=u.replace(u'\x94',u'\u201D') #U+201D is the unicode right double quotation mark
ss=u.encode('utf-8')
gives
ss='\xe2\x80\x9ctest\xe2\x80\x9d'
which is right (as seen in a browser set to utf-8)!

Do I have to explicitly replace all characters above \x7F?

TIA
__peppo
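
A possible explanation, offered as a sketch rather than a definitive answer:
in ISO-8859-1 the bytes \x93 and \x94 are C1 control characters, while in
windows-1252 (Python codec name 'cp1252') they are the curly double
quotation marks. Browsers commonly treat pages labelled iso-8859-1 as
windows-1252, which would account for the quotes seen there. Assuming the
file was really written as windows-1252, decoding with 'cp1252' gives the
expected UTF-8 bytes without any manual replace. The variable names below
are illustrative only, and the snippet uses Python 3 bytes literals; in
Python 2 the equivalent call would be unicode(s,'cp1252').

```python
# Bytes taken from the message above.
raw = b'\x93test\x94'

# ISO-8859-1 maps 0x93/0x94 to the C1 control characters U+0093/U+0094.
as_latin1 = raw.decode('iso-8859-1')

# windows-1252 maps the same bytes to the curly quotes U+201C/U+201D.
# Assumption: the source file was actually written as windows-1252.
as_cp1252 = raw.decode('cp1252')

utf8_from_latin1 = as_latin1.encode('utf-8')
utf8_from_cp1252 = as_cp1252.encode('utf-8')

print(repr(utf8_from_latin1))  # b'\xc2\x93test\xc2\x94'
print(repr(utf8_from_cp1252))  # b'\xe2\x80\x9ctest\xe2\x80\x9d'
```

If the data may contain bytes that windows-1252 leaves undefined (for
example \x81), the decode call raises UnicodeDecodeError, so an errors=
argument such as 'replace' may be needed.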