[Zope3-dev] Re: zope.tal.xmlparser.XMLParser() dislikes unicode
Martijn Faassen
faassen at startifact.com
Mon Jan 15 08:52:42 EST 2007
Hey,
Gmane isn't updating, so I can't reply directly to the message I want to
(it isn't visible in Gmane yet), but I saw the following solution proposed:
def ourparse(text):
    if isinstance(text, unicode):
        text = text.encode('UTF-8')
    xml_parser.parse(text)
Now consider what will happen if you do the following:
text = (u'<?xml version="1.0" encoding="ISO-8859-1" ?>'
        u'<foo>Some non-ascii characters here</foo>')
ourparse(text)
What will happen is that text is encoded into a UTF-8 byte string (a
plain 8-bit string, not ASCII). It's then passed to a hopefully compliant
XML parser. This XML parser sees a byte string, and checks the encoding
declaration for more information on how the bytes are encoded. It will
therefore assume the string is in Latin-1, while the bytes are really
UTF-8. The parse will break with an obscure error, or quietly return
garbled characters, and the developer doing this is probably very confused.
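To make the mismatch concrete, here is a small sketch. It assumes Python 3 and the standard library's xml.etree.ElementTree rather than zope.tal's parser, and parse_after_blind_utf8_encode is a hypothetical stand-in for the proposed workaround. With Latin-1 declared, this particular parser raises no error and simply returns the wrong characters; other declared encodings or other parsers may fail outright instead.

```python
import xml.etree.ElementTree as ET


def parse_after_blind_utf8_encode(text):
    """Hypothetical helper: encode to UTF-8 regardless of the declaration, then parse."""
    data = text.encode("utf-8")  # the bytes are UTF-8 ...
    # ... but the parser trusts the encoding named in the XML declaration.
    return ET.fromstring(data).text


# The declaration claims ISO-8859-1; the content holds one non-ASCII character.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'

garbled = parse_after_blind_utf8_encode(doc)
print(repr(garbled))  # the two UTF-8 bytes of the accented e come back as two Latin-1 characters
```

The original one-character "é" is encoded as two bytes, and each byte is then read as a separate Latin-1 character.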
This is why it's better to refuse to guess.
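One way "refusing to guess" could look, again as a sketch under assumptions (Python 3, xml.etree.ElementTree, and a hypothetical strict_parse wrapper that is not part of zope.tal): accept only encoded bytes, and leave it to the caller to encode in a way that matches the declaration.

```python
import xml.etree.ElementTree as ET


def strict_parse(data):
    """Hypothetical wrapper: parse encoded bytes only, never re-encode on the caller's behalf."""
    if not isinstance(data, bytes):
        raise TypeError(
            "expected encoded bytes, got %s; encode the text explicitly "
            "so that it matches the XML declaration" % type(data).__name__
        )
    return ET.fromstring(data)


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><foo>caf\u00e9</foo>'

# The caller knows which encoding the declaration names and encodes accordingly.
element = strict_parse(doc.encode("iso-8859-1"))

# Passing text directly is rejected instead of being silently encoded.
try:
    strict_parse(doc)
    rejected = False
except TypeError:
    rejected = True
```

The error moves to the place where the wrong assumption is made, which is easier to debug than a parse failure or garbled output further down the line.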
Regards,
Martijn