RE: [Zope] Unicode ZPT and XML woes
Doh.

FWIW, I bypassed the problem by removing 'goofy' characters (accented 'e' in this case) from the item/head field, but that doesn't guarantee the problem won't come back in the future.

Is there any way I can enforce a certain charset in my forms, so the non-supported characters don't enter the system in the first place? And is there a way to see what character set a given string is in?

-jim

-----Original Message-----
From: J Cameron Cooper [mailto:jccooper@jcameroncooper.com]
Sent: Thursday, November 13, 2003 3:59 PM
To: Jim Kutter
Cc: zope@zope.org
Subject: Re: [Zope] Unicode ZPT and XML woes

Jim Kutter wrote:
So I'm trying to write an XML RSS feed using Page Templates.
I'm having problems where I pull metadata out of a catalog, and try to use it as the item title for the feed
--
...
<item tal:repeat="item python: here.search.news_catalog({
        'post_date': [here.ZopeTime()-1, here.ZopeTime()],
        'post_date_usage': 'range:min:max',
        'sort_on': 'post_date',
        'sort_order': 'reverse'})">
  <title tal:content="item/head"> Title Here </title>
  ...
</item>
--
I get Error Type: UnicodeError Error Value: ASCII decoding error: ordinal not in range(128)
It seems that I'm getting trapped in unicode hell, as that item/head is in ascii, and the ZPT is in unicode?
I've tried every combination of encoding the head string (utf8, utf16, ascii) with either replace or ignore, with no luck.
Aside from re-implementing the stream as a DTML or Python script, what can I do?
I've also already tried the "LOCALIZER_USE_ZOPE_UNICODE=yes" suggestion from the list archives.
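For context on the error quoted above: a likely cause is that the template output mixes Unicode text with an 8-bit byte string containing the accented character, and Python 2 (which Zope ran on at the time) converts such a byte string implicitly with the ASCII codec. The sketch below reproduces that decode step explicitly, in Python 3 syntax. `head` and `to_text` are hypothetical names, and latin-1 is only an assumed storage encoding for the catalog value.

```python
# Python 3 sketch. On the Python 2 of this thread, the ASCII decode shown
# here happened implicitly whenever str and unicode values were combined.

# Hypothetical catalog value: "Cafe" with an accented e, stored as 8-bit bytes.
head = "Caf\u00e9 du Monde".encode("latin-1")


def to_text(value, encoding="latin-1"):
    """Decode byte strings explicitly; pass text through unchanged.

    The default encoding is an assumption about how the data was stored.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


try:
    head.decode("ascii")  # what the implicit conversion amounts to
    failed = False
except UnicodeDecodeError:
    # The accented byte (0xE9) is outside the ASCII range 0..127.
    failed = True

# Decoding with the encoding the bytes were actually written in succeeds.
title = to_text(head)
```

Decoding at the point where the value leaves the catalog, with the encoding it was stored in, means the template only ever sees Unicode text.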
Hey, I've got that bug on my desk right now (along with a number of other Unicode problems in the Zope core). I don't know what to do about it yet (I can't get to it until at least tomorrow), but I have a feeling that my solution is going to be a patch against the core.

--jcc

--
"Code generators follow the 80/20 rule. They solve most of the problems, but not all of the problems. There are always features and edge cases that will need hand-coding. Even if code generation could build 100 percent of the application, there will still be an endless supply of boring meetings about feature design."
(http://www.devx.com/java/editorial/15511)
On 14/11/2003, at 8:27 AM, Jim Kutter wrote:
Doh.
FWIW, I bypassed the problem by removing 'goofy' characters (accented 'e' in this case) from the item/head field, but that doesn't guarantee the problem won't come back in the future.
Is there any way I can enforce a certain charset in my forms, so the non-supported characters don't enter the system in the first place? And is there a way to see what character set a given string is in?

The simple solution is to declare your preferred character set in the Content-Type header, and then get Zope to convert the 8-bit encoded results from your form to Unicode using its casting facility:

    <tal:x condition="python:request.RESPONSE.setHeader('Content-Type', 'text/html; charset=utf-8')" />
    <form action="foo" accept-charset="utf-8">
      <input name="bar:ustring:utf8" value="Zope converts this to a Unicode string" />
      <input type="submit" />
    </form>

Alternatively, if you want to guard against people messing with your forms or submitting data in non-standard ways (typing a query string into a URL, for example), you need to decode the strings in your form handler:

    def handleFoo(self, bar, REQUEST=None):
        ''' docstring '''
        if not isinstance(bar, unicode):
            # The encoding needs to match the charset of your form
            bar = bar.decode('utf8')
        [ ... ]

Some insight into this issue can be found at http://lists.oasis-open.org/archives/wsrp-markup/200208/msg00012.html

I recommend sticking with UTF-8 as your encoding in case somebody enters Japanese into your forms.

--
Stuart Bishop <stuart@stuartbishop.net> ☞ http://www.stuartbishop.net/
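On the second question in the thread (how to tell what character set a given string is in): a byte string carries no record of its encoding, so in general the most one can do is try candidate encodings and see which decode cleanly. The following is a minimal sketch of that idea in Python 3 syntax, not something taken from Zope itself. The function name `decode_form_value` and the candidate list are assumptions made for illustration.

```python
def decode_form_value(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding_used) for a submitted value.

    Hypothetical helper: tries each candidate encoding in order and returns
    the first strict decode that succeeds. Values that are already text are
    returned unchanged, with None as the encoding.
    """
    if isinstance(raw, str):
        return raw, None
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("value does not match any candidate encoding")


# A UTF-8 encoded accented e decodes on the first attempt.
utf8_result = decode_form_value("\u00e9".encode("utf-8"))

# A lone 0xE9 byte is not valid UTF-8, so the latin-1 fallback is used.
latin1_result = decode_form_value(b"\xe9")
```

Note that latin-1 accepts every possible byte sequence, so a successful decode with it proves nothing about the real encoding. Putting UTF-8 first works because arbitrary 8-bit text rarely happens to be valid UTF-8, which is one more reason to standardise on UTF-8 for the forms.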
participants (2)
- Jim Kutter
- Stuart Bishop