How to display unicode character given its code
Hi,
This is a bit OT but this list is the best place I know to ask it :)
I have a form which uses UTF-8, the text entered in the form then has to be converted to iso-2022-jp. If any character is entered that isn't valid for iso-2022-jp I get:
UnicodeError: ISO-2022-JP encoding error: invalid character \u2013
What I need to do is display the offending character in the error message. So the question is, how do I take the '2013' and display the appropriate character?
TIA, Itai
Itai Tavor wrote at 2005-7-25 20:21 +1000:
This is a bit OT but this list is the best place I know to ask it :)
I have a form which uses UTF-8, the text entered in the form then has to be converted to iso-2022-jp. If any character is entered that isn't valid for iso-2022-jp I get:
UnicodeError: ISO-2022-JP encoding error: invalid character \u2013
What I need to do is display the offending character in the error message.
In what encoding?
Obviously, you cannot use "ISO-2022-JP". If you can handle "UTF-8", then use "unicode('\u2013').encode('utf-8')".
You can also use the "xmlcharrefreplace" error parameter for your "encode('iso-2022-jp')". In this case, your character would become an XML character reference. They have the form "&#8211;". Browsers usually are able to display them.
-- Dieter
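The two suggestions above can be sketched roughly as follows. This is a Python 3 sketch (the thread itself uses Python 2), and it assumes the standard library's "iso-2022-jp" codec rejects U+2013 the same way the codec in the thread does. The sample text is made up for illustration.

```python
# Python 3 sketch; sample text is hypothetical.
text = "pages 1 \u2013 9"  # contains U+2013 EN DASH

# Option 1: let the codec substitute an XML character reference
# for anything it cannot encode.
encoded = text.encode("iso-2022-jp", errors="xmlcharrefreplace")
print(encoded)

# Option 2: catch the encoding error and pull the offending
# character out of the exception, then show it as UTF-8 bytes.
try:
    text.encode("iso-2022-jp")
except UnicodeEncodeError as exc:
    offending = exc.object[exc.start:exc.end]
    print("invalid character:", offending.encode("utf-8"))
```

Option 2 avoids parsing the error message at all, since the exception object already carries the input string and the position of the failure.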
On 26/07/2005, at Tue 26/07 4:12AM, Dieter Maurer wrote:
Itai Tavor wrote at 2005-7-25 20:21 +1000:
This is a bit OT but this list is the best place I know to ask it :)
I have a form which uses UTF-8, the text entered in the form then has to be converted to iso-2022-jp. If any character is entered that isn't valid for iso-2022-jp I get:
UnicodeError: ISO-2022-JP encoding error: invalid character \u2013
What I need to do is display the offending character in the error message.
In what encoding?
Obviously, you cannot use "ISO-2022-JP". If you can handle "UTF-8", then use "unicode('\u2013').encode('utf-8')".
You can also use the "xmlcharrefreplace" error parameter for your "encode('iso-2022-jp')". In this case, your character would become an XML character reference. They have the form "&#8211;". Browsers usually are able to display them.
Thanks, Dieter. Couldn't get it to work, though... not sure why: unicode('\u2013') returns u'\\u2013', which is useless.
Just found the unichr function though. unichr(int('2013', 16)) does the job.
Itai
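For readers on Python 3, where unichr was folded into chr, the same idea might look like the sketch below. The helper name offending_char and the regular expression are illustrative assumptions, not part of any library; the message string is the one quoted in the thread.

```python
import re
import unicodedata


def offending_char(message):
    """Return the character named by a backslash-u escape in message, or None.

    Hypothetical helper: it assumes the error text contains a four-digit
    hexadecimal escape such as the one quoted in this thread.
    """
    match = re.search(r"\\u([0-9a-fA-F]{4})", message)
    if match is None:
        return None
    # chr() in Python 3 plays the role of unichr() in Python 2.
    return chr(int(match.group(1), 16))


message = r"ISO-2022-JP encoding error: invalid character \u2013"
char = offending_char(message)
print(repr(char), unicodedata.name(char))
```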
Itai Tavor wrote at 2005-7-26 11:20 +1000:
... Couldn't get it to work, though... not sure why: unicode('\u2013') returns u'\\u2013', which is useless.
I know why (now that you report the problem). '...' introduces a non-Unicode string, in which "\u" is not recognized. "\u" is only recognized in Unicode strings, which are introduced by "u'...'". Thus, instead of "unicode('\u2013').encode('utf-8')", I should have written "u'\u2013'.encode('utf-8')".
-- Dieter
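The difference between an interpreted and an uninterpreted "\u" escape can be illustrated with a small sketch. It is written for Python 3, where every plain string literal is already Unicode, so a raw literal stands in for the Python 2 byte string discussed above. The variable names are illustrative only.

```python
# Raw literal: the backslash is kept, so this is six separate characters.
escaped = r"\u2013"

# Ordinary literal: the escape is interpreted as one character, U+2013.
actual = "\u2013"

print(len(escaped), len(actual))

# Encoding the real character gives its three UTF-8 bytes.
utf8 = actual.encode("utf-8")
print(utf8)

# If only the escaped text is available, the "unicode_escape" codec
# can turn it back into the character it names.
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == actual)
```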