[ZPT] Unicode and 8-bit string migration fix

Fri Oct 17 03:26:49 EDT 2003

On Friday 17 October 2003 06:29, Stuart Bishop wrote:
> Starting with Zope 2.6, Zope became capable of publishing Unicode.

I was responsible for 90% of the unicode work in 2.6....

> However,
> Page Templates which mixed Unicode and 8-bit encoded strings would raise
> a Unicode exception:

At the time I wasn't using Page Templates, and I'm still not using them 
extensively enough to be sure what is best :-(

> However, because we are publishing HTML
> or XML documents, we can work around this problem by adding the
> following method to TALInterpreter.FasterStringIO:
>
>      def getvalue(self):
>          try:
>              return StringIO.getvalue(self)
>          except UnicodeDecodeError:
>              # Encode each Unicode chunk to ASCII, replacing non-ASCII
>              # characters with XML character references, then retry.
>              buf = []
>              for b in self.buflist:
>                  if isinstance(b, unicode):
>                      b = b.encode('ascii', 'xmlcharrefreplace')
>                  buf.append(b)
>              self.buflist = buf
>              return StringIO.getvalue(self)
>
> This should have no effect on pages that currently render correctly,
> but it allows a way for Zope 2.6+ Unicode aware Products, Zope 2.5
> Unicode

The rule used in dtml for this type of template (that is, one that mixes 
'string' and u'string') is to convert the plain strings to Unicode, 
assuming latin-1, and then to return the whole template as a Unicode string.

(and there is a C-optimised function in the dtml library to do that)
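A minimal sketch of that dtml rule, in Python 3 terms where 8-bit strings are `bytes` and Unicode strings are `str`. The function name `join_mixed_latin1` is hypothetical; this is not the C-optimised dtml function, just an illustration of the policy it implements.

```python
def join_mixed_latin1(chunks):
    """Join a mix of 8-bit (bytes) and Unicode (str) chunks, dtml-style.

    8-bit chunks are decoded assuming latin-1, so the result is always a
    Unicode string. latin-1 maps every byte 0-255 to a code point, so the
    decode step can never fail.
    """
    parts = []
    for chunk in chunks:
        if isinstance(chunk, bytes):
            chunk = chunk.decode('latin-1')
        parts.append(chunk)
    return ''.join(parts)

print(join_mixed_latin1([b'caf\xe9 ', 'na\u00efve']))  # café naïve
```

Because latin-1 decoding cannot raise, mixing never fails; the cost is that 8-bit text in any other encoding is silently misread.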

The policy you suggest above (using XML character references such as 
&#233;) isn't one I had considered before. It would not have worked for 
dtml, because dtml is used to generate strings for contexts that don't 
understand XML character references, such as JavaScript. Are Page Templates 
*always* used in contexts where they are understood?
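To illustrate the concern, here is a rough Python 3 sketch of the character-reference policy (`bytes` standing in for 8-bit strings, `str` for Unicode). The name `join_mixed_charref` and the sample JavaScript fragment are hypothetical, and the sketch applies the conversion unconditionally, as the proposed `getvalue()` fallback would once the buffer also holds non-ASCII 8-bit text.

```python
def join_mixed_charref(chunks):
    """Join chunks using the character-reference policy: Unicode chunks
    are encoded to ASCII with non-ASCII characters replaced by XML
    character references; 8-bit chunks pass through unchanged."""
    parts = []
    for chunk in chunks:
        if isinstance(chunk, str):
            chunk = chunk.encode('ascii', 'xmlcharrefreplace')
        parts.append(chunk)
    return b''.join(parts)

# A hypothetical JavaScript fragment produced by a template.
chunks = [b'var name = "', 'Andr\u00e9', b'";']
print(join_mixed_charref(chunks))  # b'var name = "Andr&#233;";'
```

An HTML or XML parser would turn `&#233;` back into "é", but a JavaScript interpreter sees the literal eight characters `&#233;`, so the string value is wrong.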

Is there a disadvantage to Page Templates mixing strings the same way as 
dtml? Doing so would have the advantage of being simpler to explain to 
anyone using both templating tools.

-- 
Toby Dickenson