Using tal:attributes in XML with non-ASCII characters
I'm trying to find a way of using tal:attributes with non-ASCII characters in the XML parser.

At first, I just used a plain tal:attributes with the encoding of the template set to UTF-8, but that threw an expat error.

My second attempt was to write a function which replaced all the non-ASCII characters with entities such as &#256;. Unfortunately this also doesn't work because tal:attributes escapes it into &amp;#256;, which then fails.

Finally, I tried tal:attributes with my function and the structure keyword, but 'structure' isn't supported with tal:attributes.

I've tried this on a variety of Zope versions from 2.7.5 to 2.8.6.

Any ideas would be appreciated.

A
--
Logicalware Ltd
Stuart House, Eskmills, Musselburgh, EH21 7PQ, UK
Tel: +44(0)131 273 5130
http://www.logicalware.com
Andrew Veitch wrote at 2006-3-18 15:56 +0000:
I'm just trying to think of a way of using tal:attributes with non-ASCII characters using the XML parser.
At first, I just used a straight tal:attributes with the encoding of the template set to UTF-8 but that threw an expat error.
This means that almost surely your "non-ascii" was not encoded in UTF-8. Encode them this way and it will work.
My second attempt was to write a function which replaced all the non-ASCII characters with entities such as &#256;
Unfortunately this also doesn't work because tal:attributes escapes it into &amp;#256; which then fails.
I have extended our local Zope version to support "mtext" (for markup text) in addition to "text" and "structure". It does not escape entity references but still escapes other markup.
Finally, I tried tal:attributes with my function and the structure keyword but 'structure' isn't supported with tal:attributes.
I have extended our local Zope to support "structure" for attributes as well. I could provide patches, if useful.
... Any ideas would be appreciated.
The correct (recommended) way is to encode your non-ascii in the encoding you claim to be using. Then the "expat" error will go away. -- Dieter
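The point about the declared encoding can be checked outside Zope with the expat binding in the Python standard library. The following is only a sketch in present-day Python 3; the helper name is_well_formed and the two sample documents are made up for illustration.

```python
from xml.parsers import expat


def is_well_formed(document):
    """Return True if expat accepts the given bytes as well-formed XML."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


HEADER = b'<?xml version="1.0" encoding="utf-8"?>'

# Same character (U+00C8), two different byte encodings, and in both
# cases the document claims to be UTF-8.
latin1_doc = HEADER + b'<input value="' + "\u00c8".encode("latin-1") + b'"/>'
utf8_doc = HEADER + b'<input value="' + "\u00c8".encode("utf-8") + b'"/>'

print(is_well_formed(latin1_doc))
print(is_well_formed(utf8_doc))
```

Only the document whose bytes really are UTF-8 parses; the Latin-1 byte is rejected because it does not form a valid UTF-8 sequence.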
On 19 Mar 2006, at 19:11, Dieter Maurer wrote:
This means that almost surely your "non-ascii" was not encoded in UTF-8. Encode them this way and it will work.
Here's a test template that I created through the ZMI:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns:tal="http://xml.zope.org/namespaces/tal"
      tal:define="dummy python:request.RESPONSE.setHeader('content-type', 'text/html;;charset=UTF-8')">
  <body>
    <form>
      <input name="blah" type="text"
             tal:attributes="value python:chr(200).encode('utf-8')" />
    </form>
  </body>
</html>

This gives:

Error Type: UnicodeDecodeError
Error Value: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

If I change the input line to:

<input name="blah" type="text" tal:attributes="value python:chr(200)" />

then this will work in HTML mode but will fail in XML mode.
Finally, I tried tal:attributes with my function and the structure keyword but 'structure' isn't supported with tal:attributes.
I have extended our local Zope to support "structure" for attributes as well.
I had a look in TAL, and at the bottom of TALDefs.py is a function called attrEscape(s) which correctly escapes attributes but unfortunately this function doesn't seem to be used.
I could provide patches, if useful.
I would be very interested to see your patches. Thanks in advance, Andrew
Andrew Veitch wrote at 2006-3-20 01:53 +0000:
... <input name="blah" type="text" tal:attributes="value python:chr(200).encode('utf-8')" /> This gives:
Error Type: UnicodeDecodeError Error Value: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
Sure, you are using "str.encode" in the wrong way: "str.encode('utf-8')" is equivalent to "unicode(str, getdefaultencoding()).encode('utf-8')". What encoding should your "200" use? Convert it to unicode using this encoding (and let the ZPublisher convert the unicode to "utf-8"). By the way, your exception must come from somewhere else, as "chr(200)" cannot lead to a "byte 0x80". It is always worth looking at the traceback. It tells you where the exception really comes from...
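The implicit decode step can be reproduced in present-day Python 3, where it has to be written out explicitly. This is a sketch only: bytes([200]) stands in for what chr(200) returned in Python 2, and Latin-1 is an assumed answer to the question of which encoding the byte is in.

```python
raw = bytes([200])  # stand-in for Python 2's chr(200): one byte, 0xC8

# Python 2's str.encode('utf-8') first decoded the byte string with the
# default codec (ASCII). That hidden step is the one that fails.
failing_byte = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failing_byte = exc.object[exc.start]

# Stating the source encoding explicitly (assumed here to be Latin-1)
# gives a text string that can then be encoded as UTF-8.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")

print(hex(failing_byte))
print(utf8_bytes)
```

The byte reported by this failure is 0xc8, not 0x80, which is consistent with the remark that the exception in the original report must come from somewhere else.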
... <input name="blah" type="text" tal:attributes="value python:chr(200)" />
Then this will work in HTML mode but will fail in XML mode.
You should use Unicode in XML mode...
...
I could provide patches, if useful.
I would be very interested to see your patches.
Attached. -- Dieter
Andrew Veitch wrote:
Error Type: UnicodeDecodeError Error Value: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
If I change the input line to:
<input name="blah" type="text" tal:attributes="value python:chr(200)" />
Then this will work in HTML mode but will fail in XML mode.
This all sounds familiar. I remember having loads of fun with MailTemplates, which work predominantly in XML mode. The rules I have from working with them are as follows:

If content_type is set to text/html, then:
- any unicodes should be encoded using the character set you intend to use for the final mail encoding
- you are responsible for ensuring that any strings are encoded with the correct character set, which should be that used for the final mail encoding.

If content_type is set to anything else:
- all string-like data inserted into the Mail Template during rendering must be in the form of unicode objects.

So, in your case, make sure everything you insert with any tal is a unicode object.

hth,
Chris
--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
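One way to follow the "everything is a unicode object" rule is to pass every value through a small normalising helper before handing it to the template. The sketch below uses present-day Python 3, where str is the unicode type; the helper name to_text and the UTF-8 default are assumptions, not part of Zope or MailTemplates.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding bytes if needed."""
    if isinstance(value, bytes):
        # Byte strings must be decoded with the encoding they really use.
        return value.decode(encoding)
    return str(value)


print(to_text(b"\xc3\x88") == "\u00c8")
print(to_text(bytes([200]), "latin-1") == "\u00c8")
```

The encoding argument matters: the helper cannot guess how a byte string was encoded, so the caller has to supply it.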
participants (3)
- Andrew Veitch
- Chris Withers
- Dieter Maurer