[Zope] converting bytestreams from iso-8859-1 to utf-8
Giuseppe Bonelli
giuseppe.bonelli at tiscali.it
Sat May 6 14:07:16 EDT 2006
Hi all,
sorry if this is not Zope specific, but can someone please explain
the following behaviour when converting an iso-8859-1 string
read from a file to a utf-8 encoded one?
s='\x93test\x94' # an iso-8859-1 string
                 # \x93 and \x94 are left and right double quotation marks,
                 # as seen in a browser set to iso-8859-1
ss=unicode(s,'iso-8859-1').encode('utf-8')
gives
ss='\xc2\x93test\xc2\x94'
which is wrong (as seen in a browser set to utf-8)!
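For reference, a small sketch of what this conversion does at the code-point level. It assumes a Python that accepts b'...' byte literals (2.6 or later, including Python 3); the variable names are only illustrative.

```python
import unicodedata

raw = b'\x93test\x94'        # the bytes as read from the file

# ISO-8859-1 maps every byte 0xNN straight to code point U+00NN.
text = raw.decode('iso-8859-1')
first = text[0]

print(hex(ord(first)))               # code point of the first character
print(unicodedata.category(first))   # 'Cc' means control character

# UTF-8 encodes U+0093 as the two bytes C2 93, hence the result above.
encoded = text.encode('utf-8')
print(repr(encoded))
```

So in strict ISO-8859-1 the byte \x93 is a C1 control character, not a quotation mark, and the utf-8 output is a faithful encoding of that control character.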
but:
u=unicode(s,'iso-8859-1')
u=u.replace(u'\x93',u'\u201C') #U+201C is the Unicode left double quotation mark
u=u.replace(u'\x94',u'\u201D') #U+201D is the Unicode right double quotation mark
ss=u.encode('utf-8')
gives
ss='\xe2\x80\x9ctest\xe2\x80\x9d'
which is right (as seen in a browser set to utf-8)!
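The same bytes can be produced without manual replacement if the file is assumed to be Windows-1252 (cp1252) rather than true ISO-8859-1. That is an assumption about the file, not something confirmed here. In cp1252 the bytes 0x93 and 0x94 are the curly double quotes. A minimal sketch, again assuming b'...' literals are available:

```python
raw = b'\x93test\x94'

# Assumption: the file is really Windows-1252, where 0x93/0x94 are curly quotes.
text = raw.decode('cp1252')
converted = text.encode('utf-8')

# Compare with the manual replacement approach.
manual = raw.decode('iso-8859-1')
manual = manual.replace(u'\x93', u'\u201c').replace(u'\x94', u'\u201d')
assert converted == manual.encode('utf-8')

print(repr(converted))
```

Note that Python's cp1252 codec leaves a few bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so decoding raises UnicodeDecodeError if the file contains any of them.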
Do I have to explicitly replace all characters above \x7F?
TIA
__peppo