[ZPT] Unicode and 8-bit string migration fix

Toby Dickenson tdickenson at geminidataloggers.com
Fri Oct 17 05:19:42 EDT 2003


On Friday 17 October 2003 08:53, Tino Wildenhain wrote:
> Hi Toby,
>
> Toby Dickenson wrote:
> > On Friday 17 October 2003 06:29, Stuart Bishop wrote:
>
> ...
>
> > The rule used in dtml for this type of template (that is, ones that mix
> > 'string' and u'string' ) is to convert the plain strings into unicode
> > assuming latin-1, then for the whole template to return a unicode string.
>
> ...
>
> What if I type utf-8 into the template? This is very common since

It's common, but only because we haven't had better solutions in the past.

> lets me input for different languages
> without switching encoding.

In my opinion this is a matter of choosing the right data type for the task.

If you want to store different languages in a variable in a Python program 
(in this case, as the text of a template), then that variable's type should 
be unicode.

Storing utf-8 in an 8-bit string means that anyone using that string has to 
guess the encoding. This makes it unnecessarily hard for anyone supporting 
Zope components, because different components might use different encodings 
in 8-bit strings. This problem goes away if you use unicode objects.
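
To make the guessing problem concrete, here is a minimal sketch. It is 
written in Python 3 syntax so that it runs today, which means bytes stands 
in for the 8-bit string and str stands in for the unicode object; the 
variable names are made up for illustration.

```python
# Illustration only: bytes plays the role of the 8-bit string,
# str plays the role of the unicode object.
text = "caf\u00e9"            # the intended text ("cafe" with e-acute)
raw = text.encode("utf-8")    # the same text stored as an 8-bit string

# A component that guesses latin-1 gets wrong characters, not an error,
# so the mistake goes unnoticed until someone reads the output.
guessed_latin1 = raw.decode("latin-1")

# A component that guesses utf-8 recovers the original text.
guessed_utf8 = raw.decode("utf-8")

print(repr(guessed_latin1))
print(repr(guessed_utf8))
```

Nothing in raw itself says which guess is right; a unicode object carries no 
such ambiguity.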

> it is supported by browsers and 

Sure, that fits into my scheme too. I added extra marshalling tags to 
ZPublisher to support it. We use utf-8 [1] over the wire, and ZPublisher 
converts this to/from unicode objects at Zope's outer interface. The 
intention is that everything inside Zope uses unicode objects. (Apologies if 
this sounds like I am repeating myself.)
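
The shape of that scheme, as a rough sketch: decode once on the way in, 
encode once on the way out, and use only unicode in between. This is not 
ZPublisher's actual code or API. The function names and the charset 
parameter are hypothetical, and it is written in Python 3 syntax (bytes for 
the wire data, str for unicode objects).

```python
def decode_request_value(raw, charset="utf-8"):
    """Hypothetical inbound step: wire bytes -> unicode object."""
    return raw.decode(charset)


def encode_response_body(text, charset="utf-8"):
    """Hypothetical outbound step: unicode object -> wire bytes."""
    return text.encode(charset)


def application(name):
    # Everything inside the boundary sees only unicode objects,
    # so no component ever has to guess an encoding.
    return "Hello, " + name


wire_in = "Gr\u00fc\u00dfe".encode("utf-8")   # bytes as sent by a browser
inside = decode_request_value(wire_in)         # unicode from here on
wire_out = encode_response_body(application(inside))
```

The charset is chosen only at the two boundary functions, which is what 
allows a different encoding on a page-by-page basis without touching the 
application code.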

> The assumption of latin-1 is very dangerous and brings us in a position
> similar to the anybody-uses-ascii assumption as it was before.

We had to pick one, and stick to it consistently in ZPublisher and dtml. 
Latin-1 was the choice, and you can assume that reliably today. Note that 
anyone who uses unicode for all their human language strings (as I advocate) 
will be unaffected by this choice.

> Could be content-type with the extension charset used for display
> and evaluation of the encoding?

On Friday 17 October 2003 09:07, Stuart Bishop wrote:

>  It also ties TALInterpreter more closely to
> ZPublisher, which may not be wise (?).

I think that's a fatal flaw. It assumes that templates are never used for 
anything other than computing a whole ZPublisher response.

Suppose a Page Template is used to compute a page fragment that is combined 
with others using dtml. This will only do the right thing if one of the 
following holds:
1. the template returns a unicode object.
2. the template returns a latin-1 plain string. (reasonable only for trivial 
ones)
3. the dtml doesn't happen to encounter any other unicode object. (which is 
certain to be true on a site developed before Zope 2.6, but will become 
increasingly less common in future.)
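
A sketch of why those cases matter, using the latin-1 rule quoted earlier 
in this thread. This is not the real dtml implementation: join_fragments is 
a hypothetical helper, the all-plain-strings branch is my assumption about 
what case 3 amounts to, and the code uses Python 3 syntax (bytes for plain 
strings, str for unicode objects).

```python
def join_fragments(fragments):
    """Hypothetical stand-in for dtml combining page fragments."""
    if not any(isinstance(f, str) for f in fragments):
        # Case 3: no unicode object anywhere, so the plain strings are
        # concatenated untouched, whatever encoding they happen to use.
        return b"".join(fragments)
    # Otherwise plain strings are converted assuming latin-1 and the
    # whole result is unicode.
    return "".join(
        f if isinstance(f, str) else f.decode("latin-1") for f in fragments
    )


price = "\u20ac 10"                            # unicode fragment, euro sign
latin1_fragment = "caf\u00e9".encode("latin-1")
utf8_fragment = "caf\u00e9".encode("utf-8")

ok = join_fragments([latin1_fragment, " ", price])     # case 2
broken = join_fragments([utf8_fragment, " ", price])   # utf-8 bytes + unicode
untouched = join_fragments([utf8_fragment, b" ", utf8_fragment])  # case 3
```

Here ok contains the intended text, while broken silently contains the 
utf-8 bytes misread as latin-1. The untouched result is still valid utf-8, 
but only because no unicode fragment was in the mix.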


[1] or some other encoding on a page-by-page basis.

-- 
Toby Dickenson



