[ZPT] Unicode and 8-bit string migration fix
Toby Dickenson
tdickenson at geminidataloggers.com
Fri Oct 17 05:19:42 EDT 2003
On Friday 17 October 2003 08:53, Tino Wildenhain wrote:
> Hi Toby,
>
> Toby Dickenson wrote:
> > On Friday 17 October 2003 06:29, Stuart Bishop wrote:
>
> ...
>
> > The rule used in dtml for this type of template (that is, ones that mix
> > 'string' and u'string' ) is to convert the plain strings into unicode
> > assuming latin-1, then for the whole template to return a unicode string.
>
> ...
>
> What if I type utf-8 into the template? This is very common since
It's common, but only because we haven't had better solutions in the past.
> lets me input for different languages
> without switching encoding.
In my opinion this is a matter of choosing the right data type for the task.
If you want to store text in different languages in a variable in a Python
program (in this case, as the text of a template), then that variable's type
should be unicode.
Storing utf-8 in an 8-bit string means that anyone using that string has to
guess its encoding. This makes life unnecessarily hard for anyone supporting
Zope components, because different components might use different encodings
in their 8-bit strings. This problem goes away when unicode objects are used.
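To make the guessing problem concrete, here is a small sketch. Zope of this era ran on Python 2, where the types were `str` and `unicode`; the sketch uses Python 3, where `bytes` plays the role of the 8-bit string and `str` plays the role of unicode. The word "café" and the two encodings are arbitrary choices for illustration: the same bytes decode to different text depending on which encoding a component assumes.

```python
# Python 3 sketch: `bytes` stands in for a Python 2 8-bit string,
# `str` stands in for a Python 2 unicode object.
text = "café"                       # human-language text held as unicode
utf8_bytes = text.encode("utf-8")   # the same text stored as an 8-bit utf-8 string

# Two components that make different guesses about the same 8-bit string.
as_utf8 = utf8_bytes.decode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©  (the guess was wrong, so the text is mangled)
```

A component holding `text` itself never has to make either guess.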
> it is supported by browsers and
Sure, that fits into my scheme too. I added extra marshalling tags to
ZPublisher to support it. We use utf-8 [1] over the wire, and ZPublisher
converts this to/from unicode objects at Zope's outer interface. The intention
is that everything inside Zope uses unicode objects. (Apologies if this
sounds like I am repeating myself.)
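The "unicode inside, bytes at the edge" arrangement can be sketched as below. This is a hypothetical illustration in Python 3 (`bytes` for the old 8-bit string, `str` for unicode); `decode_request`, `render_greeting` and `encode_response` are invented names, not ZPublisher APIs. The `encoding` parameter stands in for the page-by-page choice mentioned in the footnote.

```python
# Hypothetical sketch of decoding at the outer interface, working in unicode
# inside, and encoding again on the way out. None of these are ZPublisher APIs.

def decode_request(raw: bytes, encoding: str = "utf-8") -> str:
    """Outer interface: turn wire bytes into a unicode object."""
    return raw.decode(encoding)

def render_greeting(name: str) -> str:
    """Application code: everything is unicode, no encoding decisions here."""
    return "Hello, " + name + "!"

def encode_response(body: str, encoding: str = "utf-8") -> bytes:
    """Outer interface: turn the unicode result back into wire bytes."""
    return body.encode(encoding)

if __name__ == "__main__":
    wire_in = "José".encode("utf-8")   # bytes as a browser might send them
    name = decode_request(wire_in)     # unicode from here on
    wire_out = encode_response(render_greeting(name))
    print(wire_out)                    # b'Hello, Jos\xc3\xa9!'
```

Only the two edge functions know about encodings; the code in between only ever sees unicode.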
> The assumption of latin-1 is very dangerous and brings us in a position
> similar to the anybody-uses-ascii assumption as it was before.
We had to pick one and stick to it consistently in ZPublisher and dtml.
Latin-1 was the choice, and you can rely on that assumption today. Note that
anyone who uses unicode for all their human-language strings (as I advocate)
will be unaffected by this choice.
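Part of what makes latin-1 workable as a fixed choice is that it maps each of the 256 byte values to exactly one code point, so decoding a plain string as latin-1 never fails and round-trips losslessly (it can still produce the wrong characters if the bytes were really in another encoding). A minimal sketch in Python 3 (`bytes` for the old 8-bit string, `str` for unicode) of a coercion helper in that spirit; `coerce_to_unicode` is a hypothetical name, not the dtml implementation. It also shows that unicode input passes through untouched.

```python
from typing import Union

def coerce_to_unicode(value: Union[str, bytes]) -> str:
    """Hypothetical helper: unicode passes through, 8-bit strings are read as latin-1."""
    if isinstance(value, str):
        return value                # already unicode: the latin-1 choice never applies
    return value.decode("latin-1")  # cannot fail: every byte has a latin-1 code point

# Every possible byte value decodes, and encoding back yields the same bytes.
all_bytes = bytes(range(256))
assert coerce_to_unicode(all_bytes).encode("latin-1") == all_bytes

# A unicode object is returned unchanged.
assert coerce_to_unicode("naïve") == "naïve"
```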
> Could be content-type with the extension charset used for display
> and evaluation of the encoding?
On Friday 17 October 2003 09:07, Stuart Bishop wrote:
> It also ties TALInterpreter more closely to
> ZPublisher, which may not be wise (?).
I think that's a fatal flaw. It assumes that templates are never used for
anything other than computing a whole ZPublisher response.
Suppose a Page Template is used to compute a page fragment that is combined
with others using dtml. This will only do the right thing if one of these
holds:
1. The template returns a unicode object.
2. The template returns a latin-1 plain string (reasonable only for trivial
templates).
3. The dtml doesn't happen to encounter any other unicode object (which is
certain to be true on a site developed before Zope 2.6, but will become
increasingly rare in future).
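The three cases above can be checked against a toy model of the dtml mixing rule quoted earlier in this thread (if any part is unicode, plain strings are decoded as latin-1 and the result is unicode). This is a hypothetical sketch in Python 3 (`bytes` for the old 8-bit string, `str` for unicode); `dtml_style_join` is an invented name, not actual dtml code. It shows a utf-8 fragment surviving only while no other unicode object is involved.

```python
from typing import List, Union

Part = Union[str, bytes]

def dtml_style_join(parts: List[Part]) -> Part:
    """Hypothetical model of the dtml mixing rule described in this thread.

    If any part is unicode, 8-bit parts are decoded as latin-1 and the result
    is unicode. Otherwise the 8-bit parts are joined untouched.
    """
    if any(isinstance(p, str) for p in parts):
        return "".join(p if isinstance(p, str) else p.decode("latin-1") for p in parts)
    return b"".join(parts)

fragment_unicode = "café"                    # case 1: template returns unicode
fragment_latin1 = "café".encode("latin-1")   # case 2: latin-1 plain string
fragment_utf8 = "café".encode("utf-8")       # a utf-8 plain string

header_bytes = b"<p>"      # old-style 8-bit dtml text
header_unicode = "<p>"     # some other part of the page is already unicode

print(dtml_style_join([header_unicode, fragment_unicode]))  # <p>café
print(dtml_style_join([header_unicode, fragment_latin1]))   # <p>café
print(dtml_style_join([header_bytes, fragment_utf8]))       # b'<p>caf\xc3\xa9' (case 3)
print(dtml_style_join([header_unicode, fragment_utf8]))     # <p>cafÃ© (mangled)
```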
[1] or some other encoding on a page-by-page basis.
--
Toby Dickenson