Hi all,

Due to project demands I had to convert the HTML entities in all TEXT fields in my database to the real international characters like "á", "ç" and so on.

The problem is that all my dynamically generated XML started to fail with UnicodeDecodeError. The XML encoding is set to utf-8 and the file itself is correctly encoded.

If I inject the international characters using a Python Script with something like u'çáé' it works, but passing the TEXT field from the database generates the error.

I'm using Zope 2.9.2 and MySQL. The XML is generated using ZPT. The XML invokes a script to filter and add links to the text retrieved from the database, e.g.:

    <?xml version='1.0' encoding='utf-8'?>
    ... xml stuff ...
    <txt tal:content="structure python:container.scripts.montarCDATA(texto=obra.Conteudo, links=obra.links)"></txt>
    <rdp tal:content="structure python:container.scripts.montarCDATA(texto=obra.Rodape)"></rdp>
    ... xml stuff ...

obra is the database row and Conteudo is the TEXT field. montarCDATA is as simple as this (I removed the code that builds the links):

    return '<![CDATA[%s]]>' % texto

If instead of the TEXT field I put something like:

    return '<![CDATA[%s]]>' % u'çáéà'

it works, which leads me to believe the problem is with the database field and XML. The strange thing is that with common HTML everything works as expected.

    Traceback (innermost last):
    - Module ZPublisher.Publish, line 115, in publish
    - Module ZPublisher.mapply, line 88, in mapply
    - Module ZPublisher.Publish, line 41, in call_object
    - Module Shared.DC.Scripts.Bindings, line 311, in __call__
    - Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
    - Module Products.PageTemplates.ZopePageTemplate, line 256, in _exec
    - Module Products.PageTemplates.PageTemplate, line 105, in pt_render
      <ZopePageTemplate at /path/to/file/ano.xml>
    - Module StringIO, line 271, in getvalue
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 50: ordinal not in range(128)

Thanks in advance,

--
Luiz Fernando B. Ribeiro
Luiz Fernando Bernardes Ribeiro wrote at 2006-8-14 11:03 -0300:
> Due to project demands I had to convert the html entities in all TEXT fields in my database to the real international characters like "á", "ç" and so on.

Good!

> The problem is that all my dynamically generated XML started to fail with UnicodeDecodeError. The XML encoding is set to utf-8 and the file itself is correctly encoded.
> If I inject the international characters using a Python Script with something like u'çáé' it works, but passing the TEXT field from the database generates the error.

This means that the value from the database is not a unicode object but an encoded byte string. Convert it to unicode with "value.decode(the_database_encoding)".

--
Dieter
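
A minimal sketch of the conversion suggested above, in plain Python. The helper names to_unicode and montar_cdata are hypothetical, and 'latin-1' is only an assumed database encoding: the byte 0xe1 in the traceback is "á" in Latin-1, but the thread does not say which encoding the database actually uses. A restricted Script (Python) in Zope may not allow every builtin used here, so this is an illustration of the idea rather than a drop-in script.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_unicode and montar_cdata are hypothetical names, and
# 'latin-1' is an assumed database encoding, not one stated in the thread.

text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_unicode(value, encoding='latin-1'):
    """Decode a byte string from the database; pass unicode through unchanged."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


def montar_cdata(texto, encoding='latin-1'):
    """Wrap the text in a CDATA section, always returning unicode."""
    return u'<![CDATA[%s]]>' % to_unicode(texto, encoding)


# Decoding with the 'ascii' codec, as named in the traceback, rejects any
# byte above 0x7f, so Latin-1 bytes such as 0xe1 raise UnicodeDecodeError.
raw = u'Jos\xe9 \xe1'.encode('latin-1')  # bytes as a Latin-1 database might return them
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly with the (assumed) database encoding succeeds.
cdata = montar_cdata(raw)
```

Because the helper passes unicode values through unchanged, it can be applied to every field without checking first whether a given value was already decoded.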
Participants (2): Dieter Maurer, Luiz Fernando Bernardes Ribeiro