On Wednesday 10 September 2003 15:46, Chris Withers wrote:
Trying again to bring it on list ;-)
Chris Withers wrote:
(bringing on-list in case others are interested)
Toby Dickenson wrote:
I've got some stuff that's in strings, so I guess not unicode, but which is UTF-8 encoded, and I'm wondering how I make sure Zope does "the right thing" here. Are there any docs about this?
(and just to be clear, I'm using Zope 2.6.1 with ZODB 3.1, what differences will that make?)
I've submitted a chapter to one of the books that Chris M maintains... last I looked it still wasn't merged :( There is some info at http://zope.org/Members/htrd/howto/unicode and http://zope.org/Members/htrd/howto/unicode-zdg-changes
Just had a read of these, very interesting...
1. convert your strings to either unicode objects or latin-1, so that dtml or zpt can do the right thing when combining them. (I've *still* not used zpt for this, but I assume it works).
I will be using ZPT for this, what changes did you make so that ZPTs return unicode strings?
I didn't, but I believe someone was reproducing my dtml semantics in ZPT. I forget who was working on this......
I recommend converting all language strings to unicode at the earliest opportunity as a general principle.
Hmmm, that's interesting. I'd been planning on keeping everything as UTF-8 encoded strings rather than actual unicode. What leads you to suggest storing everything as unicode?
It's a question of choosing the right data type to represent your data. Doesn't it make sense for string methods, character indexing, etc, to work on your data as a sequence of unicode characters? You wouldn't consider using an 8-bit string to store something that is logically an integer, simply because you originally read it from a file or socket in 8-bit string form. Why do the same to a unicode string? (Perl programmers need not reply ;-)
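To make the data-type point concrete, here is a minimal sketch. It is written in Python 3 syntax, where `bytes` plays the role of the 8-bit string and `str` the role of the `unicode` type in the Python 2 that Zope 2.6 runs on; the sample word is made up for illustration.

```python
# Assumption: Python 3, where bytes ~ the old 8-bit str and str ~ the old unicode type.
raw = "caf\u00e9".encode("utf-8")   # what you might read from a file or socket
text = raw.decode("utf-8")          # converted at the earliest opportunity

# Length and indexing on the encoded form count bytes, not characters.
byte_len = len(raw)     # the accented letter takes two bytes in UTF-8
char_len = len(text)    # one entry per character

# Slicing the encoded form can cut a character in half.
broken = raw[:4]
try:
    broken.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# String methods only understand non-ASCII characters on the text type.
upper_text = text.upper()   # every letter is upper-cased
upper_raw = raw.upper()     # only the ASCII letters change
```

The encoded form reports 5 items and the decoded form 4, and only the decoded form upper-cases the accented letter.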
2. set a 'Content-Type' header with the value 'text/html; charset=UTF-8' (or whatever you prefer, but anything other than utf8 has other complications) so that ZPublisher knows how to transmit the unicode response over http.
What are these complications? (luckily I'm going to be using UTF-8 ;-)
The rules for working out what encoding a browser will use when submitting a form are complicated, and depend on the encoding of the page that contained the form, POST/GET, and browser version. If your pages use UTF-8 then *all* form submissions come back in UTF-8. IMO it's a no-brainer choice if you have forms (or might ever add one).
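As a rough illustration of step 2, the sketch below shows the general idea of using the charset named in a Content-Type value to turn a unicode response into bytes for the wire. This is not ZPublisher's code: `encode_body` and its latin-1 fallback are hypothetical, and the example is in Python 3 syntax. In Zope itself the header would typically be set with `RESPONSE.setHeader('Content-Type', 'text/html; charset=UTF-8')`.

```python
from email.message import Message


def encode_body(body, content_type):
    """Hypothetical helper: encode text using the charset in a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()  # lower-cased name, or None if absent
    if charset is None:
        # Assumption for this sketch only: fall back to latin-1.
        charset = "latin-1"
    return body.encode(charset)


page = "<p>\u20ac 10</p>"   # contains a euro sign
wire = encode_body(page, "text/html; charset=UTF-8")
```

With the UTF-8 header the euro sign goes out as the three bytes `\xe2\x82\xac`.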
3. If there are http forms on those pages, you need to add extra marshalling tags so that ZPublisher knows what encoding your browser used when submitting the form.
If I do, do I then end up with unicode or strings encoded with the character set I specify?
You get to choose the right data type.....
If you want to receive a unicode string from a form that will be submitted by the browser in utf8, then use <input name="description:utf8:ustring".....
If you want to receive a plain string containing latin-1 characters from a form that will be submitted by the browser in utf8, then use <input name="postcode:utf8:string".....
If you want to receive the bytes as the browser sent them over the wire: <input name="idontknowwhatthiswouldbefor:string".....
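The three cases above can be sketched as a small stand-alone function. This only mimics the behaviour described in this message; `convert_field` is a hypothetical name, not ZPublisher's actual converter code, and the example uses Python 3 syntax (`str` for unicode, `bytes` for plain 8-bit strings).

```python
def convert_field(name, wire_bytes):
    """Hypothetical helper: apply the marshalling suffixes in a form field name."""
    field, *suffixes = name.split(":")
    if "utf8" in suffixes:
        # The browser submitted UTF-8, so decode it first.
        text = wire_bytes.decode("utf-8")
        if "ustring" in suffixes:
            return field, text                  # unicode string
        return field, text.encode("latin-1")    # plain string, latin-1 characters
    return field, wire_bytes                    # bytes exactly as sent


submitted = "Caf\u00e9".encode("utf-8")   # made-up form value

as_unicode = convert_field("description:utf8:ustring", submitted)
as_latin1 = convert_field("postcode:utf8:string", submitted)
as_raw = convert_field("idontknowwhatthiswouldbefor:string", submitted)
```

Note that the latin-1 case fails with `UnicodeEncodeError` in this sketch if the submitted text contains characters outside latin-1, which is one more reason to prefer the unicode form.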
Finally, is ZCTextIndex compatible with either unicode or strings that contain UTF-8 encoding?
No idea.
--
Toby Dickenson