Hi there
In the data that we have to work with, there are names in French, Turkish, German, Greek, etc. A sample string, when printed from Python, is: 'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle' We'd like to store this data in LDAP and in Zope.
Questions:
- How do we find out what the current encoding of the strings is? Guess?
- Say we decide it's Latin-7. How do we convert from the current string to Unicode, taking into account the fact that the source is taken to be Latin-7?
- Do we need to move to Zope 2.6 in order to cope with such strings?
--
Jean Jordaan
Upfront Systems http://www.upfrontsystems.co.za
--On Wednesday, 2 October 2002 10:06 +0200 Jean Jordaan <jean@upfrontsystems.co.za> wrote:
Hi there
In the data that we have to work with, there are names in French, Turkish, German, Greek, etc. A sample string, when printed from Python, is: 'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle' We'd like to store this data in LDAP and in Zope.
Questions:
- How do we find out what the current encoding of the strings is?
Most Western European languages can be encoded using iso-8859-1; Greek is iso-8859-7 and Turkish is iso-8859-9.
- Say we decide it's Latin-7. How do we convert from the current string to Unicode, taking into account the fact that the source is taken to be Latin-7?
Read the Python documentation, e.g. unicode_str = unicode('....', 'iso-8859-7')
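For readers on Python 3, here is a minimal sketch of the same conversion: the Python 2 call unicode(s, encoding) corresponds to bytes.decode(encoding). The choice of iso-8859-9 (Latin-5, Turkish) for the sample string is an assumption made for illustration; the thread does not establish the actual source encoding.

```python
# Sample bytes from the original question.
raw = b'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'

# Assumption: the data is ISO-8859-9 (Latin-5, Turkish).
text = raw.decode('iso-8859-9')

# str(raw, encoding) is an equivalent spelling of the same conversion.
assert text == str(raw, 'iso-8859-9')

# Decoding changes the type (bytes -> str), not the number of characters,
# because ISO-8859-9 is a single-byte encoding.
assert len(text) == len(raw)

print(text)
```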
- Do we need to move to Zope 2.6 in order to cope with such strings?
Maybe... it depends on how you use Unicode inside Zope. -aj
On Wednesday 02 Oct 2002 9:06 am, Jean Jordaan wrote:
Hi there
In the data that we have to work with, there are names in French, Turkish, German, Greek, etc. A sample string, when printed from Python, is: 'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle' We'd like to store this data in LDAP and in Zope.
Questions:
- How do we find out what the current encoding of the strings is? Guess?
Guessing is your only option if you can't ask the person who supplied you with your data.
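One way to make the guessing less blind is to decode the same bytes with a shortlist of candidate encodings and compare the results by eye. The sketch below is written for Python 3; the helper name decode_candidates and the shortlist are hypothetical, not something from this thread.

```python
def decode_candidates(raw, encodings):
    """Decode raw bytes with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


raw = b'Rabia-r\xddza Bi\xe7en \xf6grenci Yurdu.G\xf6r\xfckle'

# Hypothetical shortlist; adjust it to the languages expected in the data.
candidates = ['utf-8', 'iso-8859-1', 'iso-8859-7', 'iso-8859-9']
results = decode_candidates(raw, candidates)

for name in candidates:
    print(name, results[name])
```

A candidate that fails to decode (None) can be ruled out. Single-byte encodings such as the iso-8859 family accept almost any byte sequence, so for those a person who knows the language still has to judge which output looks right.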
- Say we decide it's Latin-7. How do we convert from the current string to Unicode, taking into account the fact that the source is taken to be Latin-7?
unicode_string = unicode(encoded_8bit_string, 'data character encoding')
- Do we need to move to Zope 2.6 in order to cope with such strings?
It depends on what you want to do with them. You need 2.6 if you want to use them in property pages, in DTML, or allow them to be edited in forms. (You could get patches for Zope 2.4 from http://www.zope.org/Members/htrd/wstring. They don't apply cleanly to 2.5, but are known to work after a little manual merging. Overall I think the 2.6 upgrade will be less pain.)
If you want to continue using an unpatched 2.5.x then you will need to manually call the unicode string's encode method every time you use it: unicode_string.encode('page character encoding')
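The manual encode step can be sketched as follows, in Python 3 syntax. The helper encode_for_page is hypothetical, and falling back to numeric character references via errors='xmlcharrefreplace' is an added assumption about what is acceptable in an HTML page; a plain encode() call raises UnicodeEncodeError when the page encoding cannot represent a character.

```python
def encode_for_page(text, page_encoding):
    """Encode text for an HTML page served in page_encoding.

    Hypothetical helper: characters that the page encoding cannot represent
    are emitted as numeric character references instead of raising
    UnicodeEncodeError.
    """
    return text.encode(page_encoding, errors='xmlcharrefreplace')


# Contains U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE.
text = 'Rabia-r\u0130za'

latin5 = encode_for_page(text, 'iso-8859-9')   # representable in Latin-5
latin1 = encode_for_page(text, 'iso-8859-1')   # not representable in Latin-1
utf8 = encode_for_page(text, 'utf-8')          # always representable

print(latin5)
print(latin1)
print(utf8)
```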
Hi Toby
Guessing is your only option if you can't ask the person who supplied you with your data.
Yup, thought as much.
unicode_string = unicode(encoded_8bit_string, 'data character encoding')
Yes. The bit we were missing in our app was this: our LDAP connection didn't want unicode_string (which looks like, say, u'string\xNN'), but it's fine with unicode_string.encode('utf-8') (which looks like, say, 'string\xc3\xNN').
Overall I think the 2.6 upgrade will be less pain
We probably will do that. -- Jean Jordaan Upfront Systems http://www.upfrontsystems.co.za
Yes. The bit we were missing in our app was this: our LDAP connection didn't want unicode_string (which looks like, say, u'string\xNN'), but it's fine with unicode_string.encode('utf-8') (which looks like, say, 'string\xc3\xNN').
At least as far as OpenLDAP is concerned, it expects all text sent to it to be either ASCII or UTF-8-encoded Unicode. Anything else will give you trouble or wrong query results.
jens
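The whole path from the legacy bytes to a value suitable for LDAP can be sketched like this in Python 3. No LDAP client library is called here; to_ldap_value is a hypothetical helper, and iso-8859-9 as the source encoding is an assumption.

```python
def to_ldap_value(text):
    """Hypothetical helper: return the UTF-8 bytes to hand to an LDAP client."""
    return text.encode('utf-8')


raw = b'Yurdu.G\xf6r\xfckle'

# Assumption: the legacy data is ISO-8859-9 (Latin-5, Turkish).
text = raw.decode('iso-8859-9')
value = to_ldap_value(text)

# Each non-ASCII character becomes a two-byte sequence starting with \xc3.
print(value)

# Decoding the UTF-8 bytes gives back the same text.
assert value.decode('utf-8') == text
```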
In article <200210020932.49561.tdickenson@geminidataloggers.com> you write:
(You could get patches for Zope 2.4 from http://www.zope.org/Members/htrd/wstring. They don't apply cleanly to 2.5, but are known to work after a little manual merging. Overall I think the 2.6 upgrade will be less pain)
I have ported your patches so that they apply cleanly to 2.5.1; they're at http://www.zope.org/Members/efge/i18n/Unicode-2.5.1 It could be helpful to link this from your page.
Regards,
Florent
--
Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
Participants (5):
- Andreas Jung
- Florent Guillaume
- Jean Jordaan
- Jens Vagelpohl
- Toby Dickenson