[Zope-dev] [ZOPE 2.6 B1] Unicode/locale problems with OFS/dtml/properties.dtml

Toby Dickenson tdickenson@geminidataloggers.com
Fri, 27 Sep 2002 07:35:46 +0100


> [2] This line change from 'iso-8859-1' to 'utf-8'
> lib/python/App/dtml/manage_page_header.dtml
> <dtml-call "REQUEST.set('management_page_charset','iso-8859-1')">

Bad news. That will cause all management form submissions to encode strings in
utf8. 99% of the methods to which the strings are being submitted will not be
expecting this, and will corrupt characters whose unicode code point is >127.
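
A minimal sketch of that corruption, written in present-day Python 3 rather than the Python 2 that Zope 2.6 runs on. The browser encodes the submitted text as utf8, and a method that assumes latin-1 decodes the same bytes. The sample string is made up for illustration.

```python
# Text typed into a form field; the last character has code point 233 (> 127).
submitted = "caf\u00e9"

# With management_page_charset set to utf-8 the browser sends utf8 bytes.
wire_bytes = submitted.encode("utf-8")

# A method that still assumes latin-1 treats each byte as one character.
seen_by_method = wire_bytes.decode("iso-8859-1")

# The one accented character has turned into two unrelated characters.
print(repr(submitted), "->", repr(seen_by_method))
```

Characters with code points up to 127 survive unchanged, which is why the problem tends to go unnoticed in ASCII-only testing.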

If you have a ZMI form that *is* expecting this then you need to make some
other changes to avoid breakage. Essentially just adding :utf8: marshalling
tags, possibly some :strings: and :ustring: too.
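
To illustrate what such tags buy you, here is a simplified Python 3 sketch of decoding a field according to tags in its name. The function unmarshal is hypothetical and is not ZPublisher's code; the field name 'title:utf8:ustring' and the latin-1 fallback for an untagged encoding are assumptions made for the example.

```python
def unmarshal(field_name, raw_value):
    """Decode one submitted form field according to tags in its name.

    Hypothetical, simplified stand-in for ZPublisher's marshalling; only
    the 'utf8' and 'ustring' tags are recognised here.
    """
    name, *tags = field_name.split(":")
    if "ustring" in tags:
        # Assumption: without an explicit encoding tag, fall back to latin-1.
        encoding = "utf-8" if "utf8" in tags else "iso-8859-1"
        return name, raw_value.decode(encoding)
    # Untagged fields are passed through as the raw 8-bit string.
    return name, raw_value


print(unmarshal("title:utf8:ustring", "caf\u00e9".encode("utf-8")))
print(unmarshal("title", b"plain"))
```

The point of the sketch is that the form, not the browser, tells the receiving side which encoding the bytes are in.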

(yes, this sucks. The problem is that browsers don't specify the character
encoding used in form submissions. At some point we need to discuss the way
forward on this issue....)


> [1] These 3 filses (total 3 line) change
>    from "encode('latin1')" to "encode('utf-8')"
> $find . -name '*.py' -exec grep -l 'encode.*latin1' {} \;
> ./lib/python/ZPublisher/Converters.py
> ./lib/python/ZPublisher/HTTPRequest.py
> ./lib/python/ZPublisher/HTTPResponse.py

Even more bad news. Suppose a dtml page is not yet prepared to handle unicode
(because it hasn't had the changes described above) but it 'accidentally'
encounters a unicode attribute. This happens often in the ZMI when objects
have a unicode 'title', because many products render the title attribute of
*other* objects. We can't force the response to utf8 because this will cause
the same breakage described above.


For more details see:

http://www.zope.org/Members/htrd/howto/unicode
http://www.zope.org/Members/htrd/howto/unicode-zdg-changes


> But I have not enough test.

obviously. ;-)

> I want mechanism to change encoding dinamically.

The manage_properties page that Arnar Lundesgaard has been working with is a
good example. It switches between latin-1 and utf-8 automatically depending
on whether any unicode properties have been defined (to support *really* old
browsers that don't understand utf8).
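
The selection rule can be sketched roughly as follows, in Python 3. This is not the actual properties.dtml logic; pick_page_charset is a hypothetical helper, with Python 3 str standing in for a Python 2 unicode string and bytes for a plain 8-bit string.

```python
def pick_page_charset(property_values):
    """Return 'utf-8' if any property value is unicode text, else 'iso-8859-1'.

    Hypothetical helper: `str` models a unicode property, `bytes` models a
    plain 8-bit string property, and other types (ints, floats) are ignored.
    """
    for value in property_values:
        if isinstance(value, str):
            return "utf-8"
    return "iso-8859-1"


# No unicode properties: stay on latin-1 so very old browsers keep working.
print(pick_page_charset([b"plain title", 42]))

# One unicode property is enough to switch the whole page to utf-8.
print(pick_page_charset([b"plain title", "caf\u00e9"]))
```

Deciding per page keeps existing latin-1 forms untouched and only opts in to utf-8 where unicode data is actually present.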