[ZPT] Unicode and 8-bit string migration fix
Toby Dickenson
tdickenson at geminidataloggers.com
Fri Oct 17 03:26:49 EDT 2003
On Friday 17 October 2003 06:29, Stuart Bishop wrote:
> Starting with Zope 2.6, Zope became capable of publishing Unicode.
I was responsible for 90% of the unicode work in 2.6....
> However,
> Page Templates which mixed Unicode and 8-bit encoded strings would raise
> a Unicode exception:
At the time I wasn't using Page Templates, and I'm still not using them
extensively enough to be sure what is best :-(
> However, because we are publishing HTML
> or XML documents, we can work around this problem by adding the
> following method to TALInterpreter.FasterStringIO:
>
> def getvalue(self):
>     try:
>         return StringIO.getvalue(self)
>     except UnicodeDecodeError:
>         utype = type(u'')
>         self.buflist = [
>             (type(b) is utype and
>                 b.encode('ascii','xmlcharrefreplace'))
>             or b
>             for b in self.buflist
>             ]
>         return StringIO.getvalue(self)
>
> This should have no effect on pages that currently render correctly,
> but it allows a way for Zope 2.6+ Unicode aware Products, Zope 2.5
> Unicode
The rule used in dtml for this type of template (that is, one that mixes
'string' and u'string') is to convert the plain strings into unicode
assuming latin-1, and then for the whole template to return a unicode
string. (There is a C-optimised function in the dtml library to do that.)
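As a rough illustration of that rule (not the actual dtml function, which
is written in C), here is a minimal sketch in modern Python 3 terms, where
bytes stands in for the old 8-bit 'string' and str for u'string'. The name
join_mixed and its encoding parameter are made up for this example.

```python
def join_mixed(parts, encoding="latin-1"):
    """Join a mix of bytes and str pieces into a single str.

    Hypothetical helper: byte pieces are decoded with the assumed
    encoding (latin-1 by default), text pieces are left untouched.
    """
    decoded = [
        part.decode(encoding) if isinstance(part, bytes) else part
        for part in parts
    ]
    # The result is always text, never bytes.
    return "".join(decoded)


# One 8-bit piece (latin-1 e-acute) and one text piece (euro sign).
rendered = join_mixed([b"caf\xe9 ", "\u20ac5"])
```

Because every latin-1 byte maps to exactly one code point, the decode step
cannot fail; it can only be wrong if the 8-bit data was really in some
other encoding.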
The policy you suggest above (using character entity references) isn't one I
had considered before. It would not have worked for dtml, because dtml is
used to generate strings for contexts that don't understand xml character
entity references, such as javascript. Are Page Templates *always* used in
contexts where character entity references are understood?
Is there a disadvantage to Page Templates mixing strings the same way as
dtml? That would have the advantage of being simpler to explain to anyone
using both templating tools.
--
Toby Dickenson