[Zope-dev] TALParser barfing on byte-order marked utf8 XML files.
Romain Slootmaekers
romain at zzict.nl
Fri Jul 9 09:21:16 EDT 2004
Yo,
We are using TAL for things other than ZPT, but are having problems with
files that include a BOM preamble.
The problem is that although the underlying XML parser is capable of
parsing these kinds of files, TALParser initialises its parent without an
encoding (XMLParser.__init__(self) in TALParser.py, line 27).
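To illustrate the claim that the underlying parser copes with a BOM, here is a minimal sketch that calls expat directly through the standard library and bypasses TAL. The sample document is made up for illustration and is not the attached test.xml. The sketch is written to run on a current Python, not the Python 2 that ships with Zope 2.7.1.

```python
import codecs
import xml.parsers.expat

# Hypothetical sample: a UTF-8 BOM followed by XML that contains "ä".
body = u'<?xml version="1.0" encoding="utf-8"?><p>M\u00e4dchen</p>'
document = codecs.BOM_UTF8 + body.encode("utf-8")

collected = []

# expat is handed raw bytes, so it can detect the BOM and decode by itself.
parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = collected.append
parser.Parse(document, True)

# Character data may arrive in several pieces, so join them.
text = u"".join(collected)
```

The point is that the parser receives bytes, not an already decoded string, so no implicit ASCII encoding step takes place.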
Anyway,
I have attached a small example (test.py + test.xml) that illustrates
the problem with Zope 2.7.1.
Running the test gives:
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
position 0: ordinal not in range(128)
which is perfectly logical: U+FEFF (the BOM preamble, once decoded) is not
ASCII.
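The failure can be reproduced without TAL. The sketch below assumes only that the decoded text still begins with the BOM character and that something later encodes it with the ASCII codec, as the traceback suggests. The sample string is hypothetical.

```python
# Decoded text that still starts with the BOM character U+FEFF.
text = u"\ufeff<doc/>"

try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    position = exc.start                      # index of the offending character
    bad_char = exc.object[exc.start:exc.end]  # the character itself
```

Here `position` is 0 and `bad_char` is the BOM, matching the position reported in the error above.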
Chipping away the preamble (data=data[4:]) gives problems further on in
the file, as the test example has some German characters (ä):
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
position 50: ordinal not in range(128)
which is also perfectly logical: ä has code 228 (0xE4).
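One possible workaround, sketched here as an assumption and not as a confirmed fix for TALParser, is to drop only the single BOM character and hand the parser UTF-8 bytes in place of a unicode string. The helper name `to_parser_input` and the sample text are hypothetical, and the sketch uses expat directly; whether TALParser.parseString accepts the same bytes unchanged has not been verified here.

```python
import xml.parsers.expat


def to_parser_input(text):
    """Drop one leading BOM character, then encode the text as UTF-8 bytes."""
    if text.startswith(u"\ufeff"):
        text = text[1:]  # the decoded BOM is a single character, not four
    return text.encode("utf-8")


# Hypothetical decoded input: BOM, then markup containing "ä" (U+00E4).
decoded = u"\ufeff<p>M\u00e4dchen</p>"
payload = to_parser_input(decoded)

chunks = []
parser = xml.parsers.expat.ParserCreate("UTF-8")  # state the encoding explicitly
parser.CharacterDataHandler = chunks.append
parser.Parse(payload, True)
result = u"".join(chunks)
```

Because the non-ASCII characters travel as UTF-8 bytes, nothing has to pass through the ASCII codec on the way in.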
My question is simply: why is TALParser not taking the encoding into
account? Is this deliberate, or is it an oversight?
Romain Slootmaekers.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xml
Type: text/xml
Size: 93 bytes
Desc: not available
Url : http://mail.zope.org/pipermail/zope-dev/attachments/20040709/cd31fbf0/test.xml
-------------- next part --------------
#
#
#
from xml.dom.minidom import parseString
import sys
from TAL.TALParser import TALParser
from TAL.TALInterpreter import TALInterpreter
from TAL.DummyEngine import DummyEngine
import StringIO
import codecs

# Show the interpreter's default encoding.
print sys.getdefaultencoding()

def readData():
    # Decode test.xml as UTF-8; the BOM survives as a leading u'\ufeff'.
    f = open('test.xml','r')
    readerClass = codecs.getreader('utf8')
    print readerClass
    reader = readerClass(f)
    data = reader.read()
    f.close()
    print "size = %s" % len(data)
    return data

def expand(xml):
    parser = TALParser()
    # Workaround described in the message: chip away the preamble.
    xml = xml[4:]
    parser.parseString(xml)
    program, macros = parser.getCode()
    engine = DummyEngine(0)
    out = StringIO.StringIO()
    interpreter = TALInterpreter(program,macros,engine,stream=out)
    interpreter()
    result = out.getvalue()
    return result

data = readData()
expanded = expand(data)
document = parseString(expanded)
print "ok"