Sorting is broken with UTF-8?
I have a Zope 2.7.0(+Plone) instance that uses utf-8 encoding everywhere. The problem is that alphabetical sorting (like with DocumentTemplate.sequence.sort(seq, 'locale', ...)) is broken everywhere: accented letters come after all US-ASCII characters.

I have locale=hu_HU.UTF-8 in zope.conf, still it seems that the collation algorithm can't handle UTF-8 encoded strings correctly, and since 0x80 is higher than the code of the US-ASCII characters, a character that is out of the US-ASCII range will be later than the US-ASCII ones. Actually Python can't sort UTF-8 with strcoll either (at least I couldn't achieve that), I guess the root of the problem is there.

So, what should I do now? UTF-8 charset doesn't work in reality with Zope so I should forget it and switch to ISO-8859-x?

--
Best regards,
Daniel Dekany
--On Thursday, 14 April 2005, 12:20 +0200, Daniel Dekany <ddekany@freemail.hu> wrote:
I have a Zope 2.7.0(+Plone) instance that uses utf-8 encoding everywhere. The problem is that alphabetical sorting (like with DocumentTemplate.sequence.sort(seq, 'locale', ...)) is broken everywhere: accented letters come after all US-ASCII characters. I have locale=hu_HU.UTF-8 in zope.conf, still it seems that the collation algorithm can't handle UTF-8 encoded strings correctly, and since 0x80 is higher than the code of the US-ASCII characters, a character that is out of the US-ASCII range will be later than the US-ASCII ones. Actually Python can't sort UTF-8 with strcoll either (at least I couldn't achieve that), I guess the root of the problem is there.
Right. This is not a Zope problem, so you are better off asking in the Python world or filing a Python bug report.
So, what should I do now? UTF-8 charset doesn't work in reality with Zope so I should forget it and switch to ISO-8859-x?
sequence.sort() also accepts custom comparison functions, so you could write your own comparison function *somehow*.

-aj
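One possible shape for such a custom comparison, as a minimal sketch in present-day Python rather than anything taken from Zope itself: decode the UTF-8 bytes to text first, then compare on a key in which accented letters sit next to their base letters. The names collation_key and compare_utf8 are hypothetical, and the accent-stripping key is only an approximation of Hungarian collation (it ignores digraphs such as "cs" and treats ö/ő like o). Where the hu_HU.UTF-8 locale is installed, locale.strxfrm on the decoded text may be the better key.

```python
import functools
import unicodedata


def collation_key(value):
    """Build a sort key that places accented letters next to their base letters."""
    # Assumption: byte strings hold UTF-8, as in the instance described above.
    text = value.decode("utf-8") if isinstance(value, bytes) else value
    decomposed = unicodedata.normalize("NFD", text)
    # Primary key: the letters with combining accents removed, case-folded.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary key: the decomposed text, used only to break ties.
    return (base.casefold(), decomposed)


def compare_utf8(a, b):
    """cmp-style comparison: returns a negative number, zero, or a positive number."""
    key_a, key_b = collation_key(a), collation_key(b)
    return (key_a > key_b) - (key_a < key_b)


# Sample data (made up for illustration), stored as UTF-8 byte strings.
names = [s.encode("utf-8") for s in ["zebra", "álom", "alma", "eper", "élet"]]

# Plain byte-wise sorting pushes the accented words behind all ASCII words.
by_bytes = sorted(names)

# Sorting with the custom comparison keeps them next to their base letters.
by_key = sorted(names, key=functools.cmp_to_key(compare_utf8))
```

Here by_bytes reproduces the symptom from the original question (the accented words end up last), while by_key interleaves them with the unaccented ones. How a function like compare_utf8 is passed to sequence.sort() depends on the Zope version, so check that against the DocumentTemplate documentation.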
Participants (2): Andreas Jung, Daniel Dekany