Re: [Zope] Zope 2.6.1 and UTF-8

10 Sep 2003

      On Wednesday 10 September 2003 15:46, Chris Withers wrote:
...
Trying again to bring it on list ;-)
Chris Withers wrote:
...
(bringing on-list in case others are interested)
Toby Dickenson wrote:
...
...
I've got some stuff that's in strings, so I guess not unicode, but
which is UTF-8 encoded, and I'm wondering how I make sure Zope does
"the right thing" here. Are there any docs about this?
(and just to be clear, I'm using Zope 2.6.1 with ZODB 3.1, what
differences will that make?)
...
I've submitted a chapter to one of the books that Chris M maintains...
last I looked it still wasn't merged :(
There is some info at
http://zope.org/Members/htrd/howto/unicode
http://zope.org/Members/htrd/howto/unicode-zdg-changes
Just had a read of these, very interesting...
...
1. convert your strings to either unicode objects or latin-1, so that
dtml or zpt can do the right thing when combining them. (I've *still*
not used zpt for this, but I assume it works).
I will be using ZPT for this, what changes did you make so that ZPTs
return unicode strings?
I didn't, but I believe someone was reproducing my dtml semantics in ZPT. I
forget who was working on this...
...
...
...
I recommend converting all language strings to unicode at the earliest
opportunity as a general principle.
Hmmm, that's interesting. I'd been planning on keeping everything as
UTF-8 encoded strings rather than actual unicode. What leads you to
suggest storing everything as unicode?
It's a question of choosing the right data type to represent your data. Doesn't
it make sense for string methods, character indexing, etc, to work on your
data as a sequence of unicode characters?

You wouldn't consider using an 8-bit string to store something that is
logically an integer, simply because you originally read it from a file or
socket in 8-bit string form. Why do the same to a unicode string?    (Perl
programmers need not reply ;-)
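
To make the data-type argument concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter rather than the Python that ships with Zope 2.6.1, so `bytes` stands in for the 8-bit string and `str` for the unicode object; the sample word is made up for illustration.

```python
# Assumption: Python 3, where bytes plays the role of the 8-bit string
# and str plays the role of the unicode object discussed above.
raw = "caf\u00e9".encode("utf-8")   # what you would read from a file or socket
text = raw.decode("utf-8")          # converted to unicode at the earliest opportunity

# Length and indexing on the encoded form count bytes, not characters.
print(len(raw))      # 5, because the accented e takes two bytes in UTF-8
print(len(text))     # 4
print(raw[3:4])      # b'\xc3', only half of the last character
print(text[3])       # the whole accented character

# String methods only understand the data once it is decoded.
print(text.upper() == "CAF\u00c9")  # True
```

Slicing or measuring the encoded form can cut a character in half, which is the practical cost of keeping UTF-8 in plain strings.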
...
...
...
2. set a 'Content-Type' header with the value 'text/html;
charset=UTF-8' (or whatever you prefer, but anything other than utf8
has other complications) so that ZPublisher knows how to transmit the
unicode response over http.
What are these complications?
(luckily I'm going to be using UTF-8 ;-)
The rules for working out what encoding a browser will use when submitting a
form are complicated, and depend on the encoding of the page that contained
the form, POST/GET, and browser version. If your pages use UTF-8 then *all*
form submissions come back in UTF-8. IMO it's a no-brainer choice if you have
forms (or might ever add one).
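
As a rough illustration of point 2, the sketch below shows how a charset named in a Content-Type header can decide the bytes sent over http. It is not ZPublisher's code: `encode_response` is a hypothetical helper, the latin-1 fallback is an assumption, and it targets a modern Python 3 interpreter. In Zope itself the header would typically be set with something like `REQUEST.RESPONSE.setHeader('Content-Type', 'text/html; charset=UTF-8')`.

```python
from email.message import Message


def encode_response(body, content_type):
    """Hypothetical helper: encode a unicode body using the header's charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    # Falling back to latin-1 is an assumption made for this sketch.
    charset = msg.get_content_charset() or "latin-1"
    return body.encode(charset)


page = "caf\u00e9"
print(encode_response(page, "text/html; charset=UTF-8"))  # b'caf\xc3\xa9'
print(encode_response(page, "text/html"))                 # b'caf\xe9'
```

The same unicode page yields different bytes depending on the header, so the header and the actual encoding have to agree.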
...
...
...
3. If there are http forms on those pages, you need to add extra
marshalling tags so that ZPublisher knows what encoding your browser
used when submitting the form.
If I do, do I then end up with unicode or strings encoded with the
character set I specify?
You get to choose the right data type.....

If you want to receive a unicode string from a form that will be submitted by 
the browser in utf8, then use
<input name="description:utf8:ustring".....

If you want to receive a plain string containing latin-1 characters from a 
form that will be submitted by the browser in utf8, then use
<input name="postcode:utf8:string".....

If you want to receive the bytes as the browser sent them over the wire:
<input name="idontknowwhatthiswouldbefor:string".....
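
The three cases above can be sketched as follows. This only mimics the behaviour described in this message; it is not ZPublisher's implementation. `marshal` is a hypothetical function, the field values are made up, and the code assumes Python 3, where `bytes` stands in for the plain string and `str` for the unicode string.

```python
def marshal(field_name, raw):
    """Hypothetical sketch of the marshalling tags; raw is the bytes off the wire."""
    parts = field_name.split(":")
    name, tags = parts[0], parts[1:]
    if tags == ["utf8", "ustring"]:
        # Browser sent UTF-8, caller wants a unicode string.
        return name, raw.decode("utf-8")
    if tags == ["utf8", "string"]:
        # Browser sent UTF-8, caller wants a plain latin-1 string.
        # Characters outside latin-1 would raise UnicodeEncodeError here.
        return name, raw.decode("utf-8").encode("latin-1")
    if tags == ["string"]:
        # No encoding tag: hand back the bytes exactly as received.
        return name, raw
    raise ValueError("tags not covered by this sketch: %r" % (tags,))


wire = "caf\u00e9".encode("utf-8")   # what a UTF-8 page's form would submit
print(marshal("description:utf8:ustring", wire))
print(marshal("postcode:utf8:string", wire))
print(marshal("idontknowwhatthiswouldbefor:string", wire))
```

The tag is what tells the receiving side how to interpret the submitted bytes; without it, only the raw form is available.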
...
...
Finally, is ZCTextIndex compatible with either unicode or strings that
contain UTF-8 encoding?
No idea.

-- 
Toby Dickenson