Hi all,

Due to project demands, I had to convert the HTML entities in all TEXT fields in my database to the real international characters such as "á", "ç" and so on.

The problem is that all my dynamically generated XML started to fail with UnicodeDecodeError. The XML encoding is set to utf-8 and the file itself is correctly encoded.

If I inject the international characters from a Python Script with something like u'çáé' it works, but passing the TEXT field from the database raises the error.

I'm using Zope 2.9.2 and MySQL. The XML is generated using ZPT.

The XML template invokes a script to filter the text retrieved from the database and add links to it, e.g.:

<?xml version='1.0' encoding='utf-8'?>
... xml stuff ...
            <txt tal:content="structure python: container.scripts.montarCDATA(texto=obra.Conteudo, links=obra.links)"></txt>
            <rdp tal:content="structure python:container.scripts.montarCDATA(texto=obra.Rodape)"></rdp>
... xml stuff ...

obra is the database row and Conteudo is the TEXT field.

montarCDATA is as simple as (I removed the code to build the links):
return '<![CDATA[%s]]>' % texto

If instead of the TEXT field I put something like:
return '<![CDATA[%s]]>' % u'çáéà'

it works, which leads me to believe the problem is in how the database field is combined with the XML. The strange thing is that with regular HTML output everything works as expected.

Traceback (innermost last):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 50: ordinal not in range(128)
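
One observation on the traceback: byte 0xe1 is "á" in Latin-1 (in UTF-8 the same character would be the two bytes 0xc3 0xa1), so a possible reading is that the database hands back a Latin-1 byte string, which Python then tries to decode as ASCII when it is mixed with unicode text. Below is a minimal sketch of a workaround under that assumption. The name montar_cdata, the DB_ENCODING constant and the 'latin-1' value are hypothetical, and this is plain Python that has not been tried inside a restricted Script (Python), which may need adjusting:

```python
# -*- coding: utf-8 -*-

# Assumption: the database connection returns Latin-1 encoded byte strings.
# Change this if the connection actually uses another encoding.
DB_ENCODING = 'latin-1'

# The unicode text type (unicode on Python 2, str on Python 3).
text_type = type(u'')


def montar_cdata(texto, encoding=DB_ENCODING):
    """Hypothetical variant of montarCDATA that decodes before formatting."""
    # Decode byte strings explicitly so that Python never falls back to
    # the implicit ASCII decode that raises UnicodeDecodeError.
    if not isinstance(texto, text_type):
        texto = texto.decode(encoding)
    # Build the CDATA section from unicode text only.
    return u'<![CDATA[%s]]>' % texto
```

It would be called from the template the same way as before, for example with texto=obra.Conteudo. Values that are already unicode pass through unchanged, so the u'...' test case keeps working.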

Thanks in advance,

--
Luiz Fernando B. Ribeiro