Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
--On Sunday, 24 April 2005 16:03 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:
--On Sunday, 24 April 2005 14:18 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
I have a Zope instance that uses UTF-8 for everything. Since Python/Zope/etc. practically don't support UTF-8,
Please explain in which sense Zope would not support utf-8. For your information:
It can't sort strings alphabetically *anywhere* (concretely: the accented letters go to the end of the list; I guess because their UTF-8 bytes are 0x80 or above, numerically greater than the codes of the US-ASCII characters).
This is neither a problem of Zope nor of Python! A Python string has no notion of an encoding. The sort method cannot smell the encoding.
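To make the point concrete, here is a small sketch in modern Python 3 (the thread predates it; Python 2's plain `str` behaved like the `bytes` used here). The sample words are made up. It shows that sorting UTF-8 byte strings compares raw byte values, so a word starting with an accented letter ends up last.

```python
# Hypothetical sample words; "ábécé" starts with a non-US-ASCII letter.
words = ["zebra", "ábécé", "alma"]

# Store them as UTF-8 byte strings, as a site using utf-8 everywhere would.
utf8_words = [w.encode("utf-8") for w in words]

# Sorting bytes compares byte values only. "á" is encoded as 0xC3 0xA1,
# and 0xC3 is greater than every US-ASCII byte (0x00-0x7F), so it sorts last.
byte_sorted = sorted(utf8_words)
print([b.decode("utf-8") for b in byte_sorted])  # ['alma', 'zebra', 'ábécé']
```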
First of all, in this thread I don't care whose mistake it is. My concern is whether I can actually use Zope (in fact, Plone) with UTF-8 or not. Assume that I'm using a few non-US-ASCII characters and sometimes want to show things sorted alphabetically... Then, of course, anything that collates strings for human reading will use locale.strcoll, which does consider the charset and the locale. If locale.strcoll is wrong with UTF-8, that's certainly Python's mistake, right?
Instead use Python unicode strings and depend on the sorting order defined by the Unicode standard.
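Note that decoding alone does not give dictionary order: Python compares unicode strings by code point, and "á" (U+00E1) is still greater than "z" (U+007A). A minimal sketch, assuming a rough accent-folding key is acceptable; the helper names are hypothetical, and this is not full Hungarian collation, which also treats digraphs such as "cs" and "gy" as separate letters.

```python
import unicodedata

def fold_accents(s: str) -> str:
    """Return s with combining marks removed, e.g. "ábécé" -> "abece".

    NFD splits "á" into "a" plus a combining acute accent (category "Mn"),
    which is then dropped.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def collation_key(s: str):
    # Primary: case- and accent-insensitive form; secondary: the original
    # string, so words differing only in accents get a deterministic order.
    return (fold_accents(s).casefold(), s)

words = [b.decode("utf-8") for b in [b"zebra", "ábécé".encode("utf-8"), b"alma"]]
print(sorted(words))                     # ['alma', 'zebra', 'ábécé'] (code points)
print(sorted(words, key=collation_key))  # ['ábécé', 'alma', 'zebra']
```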
I'll take that advice, but unfortunately the problem is not my Python code but other people's.
This is an application-level problem but not a server-side problem.
Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many products rely on it for sorting, and it sorts UTF-8 incorrectly (I guess because locale.strcoll does). ZCatalog also sorts incorrectly (surely for the same reason), and it too is part of the standard Zope distribution.
Plone has UTF8 as default charset.
Believe me, I really hope I'm wrong. So how can I get strings sorted correctly? If it works for someone, how? (I have locale hu_HU.UTF-8 in zope.conf; I have even printed locale.getlocale(locale.LC_COLLATE) from products and external methods, and it was hu_HU.UTF-8. Note that at least at the Python level, sorting with hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)
See above... Also, Python's standard sort() method does not care about your locale (why should it?). Strings are streams of bytes, nothing else...
I know, and I have referred to locale.strcoll, which does care about the encoding and the locale. It seems many products use it (indirectly) when they want to sort something.
sort() accepts a user-defined comparison method to implement user-specific sorting.
Yes, but this doesn't help, unless I write a UTF-8 comparison method, then find all sort() and locale.sort() calls in Zope, Plone, and other products, and patch them all...
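Python 2's `sort()` took such a comparison function directly; in Python 3 the same idea is spelled `list.sort(key=functools.cmp_to_key(cmp))`. The sketch below is a hypothetical `compare_utf8` for code that keeps UTF-8 byte strings: it decodes both sides, compares an accent-folded, case-folded form first, and falls back to code-point order to break ties. Like any such helper, it only affects the call sites that actually pass it in.

```python
import functools
import unicodedata

def _fold(s: str) -> str:
    # Drop combining marks after NFD decomposition, then ignore case.
    nfd = unicodedata.normalize("NFD", s)
    return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()

def compare_utf8(a: bytes, b: bytes) -> int:
    """Hypothetical cmp-style comparison for UTF-8 encoded byte strings.

    Returns a negative number, zero, or a positive number, like Python 2's cmp().
    """
    ua, ub = a.decode("utf-8"), b.decode("utf-8")
    ka, kb = (_fold(ua), ua), (_fold(ub), ub)
    return (ka > kb) - (ka < kb)

titles = ["Zebra".encode("utf-8"), "Ábécé".encode("utf-8"), "alma".encode("utf-8")]
titles.sort(key=functools.cmp_to_key(compare_utf8))
print([t.decode("utf-8") for t in titles])  # ['Ábécé', 'alma', 'Zebra']
```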
And there are also methods in Python "locale" module to perform locale-dependent comparison.
I can't get them working with UTF-8: they put non-US-ASCII letters at the end of the list. Has anybody managed it? How? I'm all ears. I guess the Plone site would then suddenly sort correctly, at least in the places where the author of the Zope product was wise enough not to use a raw sort().
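For the locale route, a minimal Python 3 sketch, assuming the `hu_HU.UTF-8` locale is installed on the host (the helper name `locale_sorted` is hypothetical). In Python 3, `locale.strxfrm` and `locale.strcoll` operate on unicode `str`, so the text is decoded from UTF-8 first rather than passed as raw bytes. The helper falls back to plain code-point order if the locale is missing.

```python
import locale

def locale_sorted(words, loc="hu_HU.UTF-8"):
    """Sort unicode strings by the C library's collation rules for `loc`.

    Hypothetical helper: temporarily switches LC_COLLATE, then restores it.
    Falls back to code-point order if `loc` is not available.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        return sorted(words)
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

raw = ["zebra".encode("utf-8"), "ábécé".encode("utf-8"), "alma".encode("utf-8")]
print(locale_sorted([b.decode("utf-8") for b in raw]))
```

`setlocale()` changes process-wide state and is not thread-safe on most systems, so in a multithreaded server such as Zope it is safer to set the collation locale once at startup than to switch it per call as this sketch does.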
Once again: you must solve your problem on the application layer...
(Anyway, string collation is not an application-level problem in principle. It is the same for a bookstore application and for a first-person shooter; there is nothing application-specific about it. If Python is not mature enough to handle this task, that's a different question.)
Zope does not help you at this point because it can't.
So however I phrase it, the upshot is that you *practically* can't use UTF-8 with Zope unless you are using a language without non-US-ASCII characters, in which case you don't need UTF-8 anyway. Hence I said it is "not supported"... That doesn't mean it is Zope's mistake; it just means that you can't use it. So, back to the topic... Since UTF-8 doesn't seem to work, how could I convert that already populated instance to use ISO-8859-2 instead of UTF-8? Is there some tool that makes this relatively easy?
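Converting a whole ZODB is beyond a short snippet, but the per-string step is just decoding and re-encoding. A sketch assuming the stored values are UTF-8 byte strings (the helper name `utf8_to_latin2` is hypothetical). ISO-8859-2 covers the Hungarian letters, including ő and ű, but characters outside it (for example "€") raise `UnicodeEncodeError` under the default strict error handling.

```python
def utf8_to_latin2(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-2 (Latin-2).

    errors is passed to str.encode(): "strict" raises on characters that
    Latin-2 cannot represent, "replace" substitutes "?" for them.
    """
    return data.decode("utf-8").encode("iso-8859-2", errors)

# Hungarian pangram; every letter in it exists in ISO-8859-2.
sample = "Árvíztűrő tükörfúrógép".encode("utf-8")
print(utf8_to_latin2(sample))
```

Running the strict mode first over all stored strings is one way to find values that cannot be represented in Latin-2 before committing to the conversion.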
-aj
-- Best regards, Daniel Dekany