Re: [Zope] How to convert Zope instance charset?
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
--On Sunday, 24 April 2005 16:03 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:
--On Sunday, 24 April 2005 14:18 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
I have a Zope instance that uses utf-8 for everything. Since Python/Zope/etc practically doesn't support utf-8,
Please explain in which sense Zope would not support utf-8. For your information:
It can't sort strings alphabetically *anywhere* (concretely: the accented letters will go to the end of the list -- I guess because 0x80 is mathematically greater than the code of the US-ASCII characters).
This is neither a problem of Zope nor of Python! A Python string has no notion of an encoding. The sort method can not smell the encoding.
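The effect described above can be reproduced in a few lines. This is a minimal sketch written for Python 3, where `bytes` plays the role of the plain Python 2 string discussed in this thread; the word list is made up for illustration.

```python
# Hypothetical sample words; "árvíz" starts with a non-ASCII letter.
words = ["zebra", "alma", "árvíz", "eper"]

# Encode to UTF-8 byte strings, the way the data is stored in the instance.
encoded = [w.encode("utf-8") for w in words]

# sorted() compares raw byte values and knows nothing about the encoding.
# "á" is encoded as the bytes 0xC3 0xA1, and 0xC3 is greater than any
# ASCII letter, so the accented word ends up last.
byte_sorted = sorted(encoded)
print([b.decode("utf-8") for b in byte_sorted])
# → ['alma', 'eper', 'zebra', 'árvíz']
```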
First of all, in this thread I don't care whose mistake it is. My concern is whether I can really use Zope (in fact, Plone) with UTF-8 or not. Assume that I'm using a few non-US-ASCII characters, and I sometimes want to show things alphabetically sorted... Then, of course, if something wants to collate strings for human reading, it will use locale.strcoll, which does consider charset and locale. That locale.strcoll is wrong with UTF-8 is certainly the mistake of Python, right?
Instead use Python unicode strings and depend on the sorting order defined by the Unicode standard.
I take that advice, but unfortunately it's not about my Python code, but about other people's Python code.
This is an application-level problem, not a server-side problem.
Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many of the products rely on that for sorting. And that sorts UTF-8 incorrectly (I guess because locale.strcoll does it incorrectly). Also, ZCatalog, which is also part of the standard Zope distribution, sorts incorrectly (surely for the same reason).
Plone has UTF8 as default charset.
Believe me, I really hope I'm wrong. So how can I get strings sorted correctly? If it works for someone, how? (I have locale hu_HU.UTF-8 in zope.conf, I have even printed locale.getlocale(locale.LC_COLLATE) from products and external methods, and it was hu_HU.UTF-8. Note that at least at the Python level sorting with hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)
See above. Also, the standard sort() method of Python does not care about your locale (why should it?). Strings are streams of bytes, nothing else...
I know, and I have referred to locale.strcoll, which does care about encoding and locale. It seems many products use it (indirectly) when they want to sort something.
sort() accepts a user-defined comparison method to implement user-specific sorting.
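A rough sketch of such a comparison method, under stated assumptions. In the Python 2 of this thread the function would be passed straight to sort(); the Python 3 version below needs `functools.cmp_to_key`. The names `fold` and `compare_utf8` are hypothetical, and stripping accents is only an approximation, not real Hungarian collation.

```python
import functools
import unicodedata

def fold(text):
    """Strip combining accents so 'á' compares like 'a' (an approximation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def compare_utf8(a, b):
    """Hypothetical comparison function for UTF-8 encoded byte strings."""
    ua, ub = a.decode("utf-8"), b.decode("utf-8")
    # Compare on the accent-free form first, then on the original text
    # so that "alma" and "álma" still get a stable relative order.
    ka, kb = (fold(ua).lower(), ua), (fold(ub).lower(), ub)
    # Classic cmp-style result: negative, zero or positive.
    return (ka > kb) - (ka < kb)

encoded = [w.encode("utf-8") for w in ["zebra", "alma", "árvíz", "eper"]]
encoded.sort(key=functools.cmp_to_key(compare_utf8))
print([b.decode("utf-8") for b in encoded])
# → ['alma', 'árvíz', 'eper', 'zebra']
```

As the reply below points out, this only helps in code that you control; every existing sort() call would still have to be changed to use it.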
Yes, but this doesn't help, unless I write a UTF-8 comparison method, and then find all sort() and locale.sort() calls in Zope, Plone, and in other products, and patch them all...
And there are also methods in Python "locale" module to perform locale-dependent comparison.
Which I can't get working with UTF-8: it puts non-US-ASCII letters at the end of the list. Has anybody managed it? How? I'm all ears. I guess the Plone site would then suddenly sort correctly, at least in the places where the programmer of the Zope product was wise enough not to use raw sort().
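For reference, a minimal sketch of locale-dependent sorting with the `locale` module, written for Python 3, where these functions take decoded text rather than UTF-8 bytes. It does not claim to reproduce or fix the behaviour reported above. The helper name `sort_with_locale` is hypothetical, and whether `hu_HU.UTF-8` is installed on a given system is an assumption, so the sketch falls back to plain ordering when it is not.

```python
import locale

def sort_with_locale(words, locale_name="hu_HU.UTF-8"):
    """Sort text strings using the collation rules of locale_name.

    Falls back to plain code point order when the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words)
    try:
        # strxfrm() turns a string into a key that sorts the way
        # strcoll() would compare the original strings.
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the process-wide setting we changed.
        locale.setlocale(locale.LC_COLLATE, previous)

print(sort_with_locale(["zebra", "alma", "árvíz", "eper"]))
```

Note that setlocale() changes a process-wide setting and is not thread-safe, so switching it per call like this is questionable inside a multi-threaded server.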
Once again: you must solve your problem on the application layer...
(Anyway, string collation is not an application-level problem in principle. It is the same for a book store application and for a first person shooter; there is nothing application-specific in it. If Python is not mature enough to take on this task, that's a different question.)
Zope does not help you at this point because it can't.
So however I formulate it, the end result is that you *practically* can't use UTF-8 with Zope, unless you are using a language that doesn't use non-US-ASCII characters, in which case you don't benefit from UTF-8. Hence, I said it is "not supported"... It doesn't mean that it is the mistake of Zope, it just means that you can't use it. So, back to the topic... Since UTF-8 is not working (it seems), how could I convert that already filled instance to use ISO-8859-2 instead of UTF-8? Is there a tool that would make this relatively easy?
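The per-string step of such a conversion can be sketched as follows. This is only an illustration in Python 3 (`bytes` standing in for the Python 2 strings of the thread); the helper name `transcode` and the sample title are hypothetical, and how to visit every stored value in an instance is not covered here.

```python
def transcode(value, source="utf-8", target="iso-8859-2"):
    """Re-encode one byte string from source to target.

    Raises UnicodeDecodeError if value is not valid in source, and
    UnicodeEncodeError if a character does not exist in target.
    """
    return value.decode(source).encode(target)

utf8_title = "árvíztűrő tükörfúrógép".encode("utf-8")
latin2_title = transcode(utf8_title)

# Accented Hungarian letters take two bytes in UTF-8 but one in
# ISO-8859-2, so the converted value is shorter.
print(len(utf8_title), len(latin2_title))
```

Characters that ISO-8859-2 cannot represent (the euro sign, for example) make the call raise UnicodeEncodeError, so a real conversion would need a policy for those values.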
-aj
-- Best regards, Daniel Dekany
Daniel Dekany wrote:
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many of the products rely on that for sorting. And that sorts UTF-8 incorrectly
Then it will probably be easiest to just patch it up to sort correctly. Or file a bug in the collector. -- hilsen/regards Max M, Denmark http://www.mxm.dk/ IT's Mad Science
--On Sunday, 24 April 2005 18:01 +0200 Max M <maxm@mxm.dk> wrote:
Daniel Dekany wrote:
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many of the products rely on that for sorting. And that sorts UTF-8 incorrectly
Then it will probably be easiest to just patch it up to sort correctly. Or file a bug in the collector.
A candidate for a reject. I pointed out that the sort method can not smell your encoding. If you deal with encodings, deal with them in the right way, but don't expect the underlying framework to smell or guess what kind of encoding your application uses. Otherwise: use Python unicode strings *only* and *everywhere*. -aj
Andreas Jung wrote:
Daniel Dekany wrote:
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:
Zope itself provides a method for sorting strings: DocumentTemplate.sequence.sort. Many of the products rely on that for sorting. And that sorts UTF-8 incorrectly
Then it will probably be easiest to just patch it up to sort correctly. Or file a bug in the collector.
A candidate for a reject. I pointed out that the sort method can not smell your encoding. If you deal with encodings, deal with them in the right way, but don't expect the underlying framework to smell or guess what kind of encoding your application uses. Otherwise: use Python unicode strings *only* and *everywhere*.
Yes. I was thinking along the lines of a monkeypatch of the sort method that could be used in e.g. Plone. It should be entirely possible for it to look up the charset under properties, and decode strings from that before sorting. -- hilsen/regards Max M, Denmark http://www.mxm.dk/ IT's Mad Science
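A decode-before-sorting helper of that kind might look roughly like this in Python 3. The name `sort_decoded` and its parameters are hypothetical; `charset` stands in for the site setting, which the caller is assumed to have looked up already. Since for UTF-8 the byte order and the code point order coincide, decoding alone changes nothing, so the sketch also applies a key function to the decoded text.

```python
def sort_decoded(values, charset="utf-8", text_key=str.casefold):
    """Hypothetical helper: sort byte strings by a key of their decoded text.

    text_key receives the decoded string; a collation-aware function
    such as locale.strxfrm could be passed instead of the default.
    """
    return sorted(values, key=lambda raw: text_key(raw.decode(charset)))

# Sample titles stored in the site's charset (assumed ISO-8859-2 here).
titles = [t.encode("iso-8859-2") for t in ["Zope", "alma", "Zöld", "Plone", "zebra"]]

result = sort_decoded(titles, charset="iso-8859-2")
print([raw.decode("iso-8859-2") for raw in result])
# → ['alma', 'Plone', 'zebra', 'Zope', 'Zöld']
```

A plain byte sort of the same list would put every capitalised title before "alma".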
--On Sunday, 24 April 2005 18:26 +0200 Max M <maxm@mxm.dk> wrote:
Yes. I was thinking along the lines of a monkeypatch of the sort method that could be used in e.g. Plone.
It should be entirely possible for it to look up the charset under properties, and decode strings from that before sorting.
That's implicit ugly magic. If you have to deal with unicode, then do it in the proper way: use unicode strings and don't fiddle around with UTF-8 encoded strings everywhere where you could and should be using unicode strings. Believe me, it will improve your application and your life :-) -aj
--On Sunday, 24 April 2005 17:45 +0200 Daniel Dekany <ddekany@freemail.hu> wrote:
Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote: First of all, in this thread I don't care whose mistake it is. My concern is whether I can really use Zope (in fact, Plone) with UTF-8 or not. Assume that I'm using a few non-US-ASCII characters, and I sometimes want to show things alphabetically sorted...
You're not getting the point. As long as you work with Python strings and not with unicode strings, there is no way for Zope to deal correctly with different kinds of encodings... As I said, it is an application-side problem. Zope and Python provide you with the tools to deal with UTF-8, but you need to solve such problems in your application. That's my last comment on this issue :-) -aj
participants (3)
- Andreas Jung
- Daniel Dekany
- Max M