Thursday, April 14, 2005, 12:35:37 PM, Andreas Jung wrote:
--On Thursday, 14 April 2005 12:20 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
I have a Zope 2.7.0 (+Plone) instance that uses UTF-8 encoding everywhere. The problem is that alphabetical sorting (like with DocumentTemplate.sequence.sort(seq, 'locale', ...)) is broken everywhere: accented letters come after all US-ASCII characters. I have locale=hu_HU.UTF-8 in zope.conf, but it still seems that the collation algorithm can't handle UTF-8 encoded strings correctly: every byte of a non-ASCII character in UTF-8 is 0x80 or higher, which is above all US-ASCII codes, so any character outside the US-ASCII range sorts after the US-ASCII ones. Actually Python can't sort UTF-8 with strcoll either (at least I couldn't achieve that), so I guess the root of the problem is there.
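To illustrate the byte-order effect described above, here is a minimal sketch. It is written for a current Python 3 interpreter (not the Python 2 that Zope 2.7 runs on), and the word list is made up for the example. It only shows what a plain byte-wise comparison does to UTF-8 data; it does not reproduce Zope's sorting code.

```python
# Hypothetical sample data: Hungarian words, some starting with accented letters.
words = ["zebra", "álom", "alma", "éj", "eper"]

# Encode to UTF-8 bytes, as the strings would be stored in the instance.
encoded = [w.encode("utf-8") for w in words]

# Sorting bytes compares raw byte values, with no knowledge of the alphabet.
byte_sorted = [b.decode("utf-8") for b in sorted(encoded)]
print(byte_sorted)  # ['alma', 'eper', 'zebra', 'álom', 'éj']

# The lead byte of every accented letter is above the whole ASCII range.
lead_bytes = {w: w.encode("utf-8")[0] for w in words}
print(hex(lead_bytes["álom"]))  # 0xc3
```

Both accented words end up after "zebra", which matches the symptom in the report.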
Right. This is not a Zope problem so better ask the Python world or file a Python bug report.
I see, but then my question is: how do people use Zope for sites where "Unicode" is needed? Do they just not use Zope in such cases? At my new employer there is a fat Plone site that has been running for months with the sorting problems I mentioned. I don't know why my predecessor built it with UTF-8 if that is not supported. And if it really is not supported, then I hope there is some utility with which I can convert the charset of a whole Zope database... is there?
So, what should I do now? UTF-8 charset doesn't work in reality with Zope so I should forget it and switch to ISO-8859-x?
sequence.sort() also accepts custom comparison methods. So you could write your own method *somehow*.
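A custom comparison along those lines might look like the sketch below. It is only an illustration under several assumptions: it targets a current Python 3 interpreter, the names utf8_locale_cmp and sort_utf8 are hypothetical, and it uses the standard sorted() with functools.cmp_to_key rather than sequence.sort(), whose way of registering custom functions is not shown in this thread. The idea is to decode the UTF-8 bytes first, so that locale.strcoll sees characters instead of raw bytes.

```python
import functools
import locale


def utf8_locale_cmp(a, b):
    """Compare two UTF-8 byte strings with the active LC_COLLATE rules.

    Returns a negative number, zero, or a positive number, like strcoll.
    """
    # Decode first so the C library collates characters, not bytes.
    return locale.strcoll(a.decode("utf-8"), b.decode("utf-8"))


def sort_utf8(values):
    """Return a new list of UTF-8 byte strings in locale order."""
    return sorted(values, key=functools.cmp_to_key(utf8_locale_cmp))
```

The result depends on the collation locale that is active in the process. Calling locale.setlocale(locale.LC_COLLATE, "hu_HU.UTF-8") beforehand should give Hungarian ordering, provided that locale is installed on the machine; without it, the default "C" locale is used and accented letters still sort last.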
That would be OK for me if it works. The problem is that sorting mostly happens in 3rd party products, and they will call sequence.sort with 'locale' and 'locale_nocase' and such, not with my custom comparison function. OK, I could then patch Zope's sequence.sort so that it is UTF-8 aware even with 'locale' and 'locale_nocase'. But that is still not good, because there will be places where Python's locale.strcoll is used directly, and worse, maybe both sequence.sort and locale.strcoll are used on the same sequence in different places, and then there will be inconsistencies. So in the end I would have to patch Python, which is really beyond my competence. But I don't know, I'm totally new to Python and Zope (I'm primarily a Java guy)... so am I missing something?
-aj
-- Best regards, Daniel Dekany