[ZPT] Unicode and 8-bit string migration fix

Sun Oct 19 02:40:06 EDT 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday, October 17, 2003, at 11:35  PM, Barry Warsaw wrote:

> On Fri, 2003-10-17 at 01:29, Stuart Bishop wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi.
>>
>> Starting with Zope 2.6, Zope became capable of publishing Unicode.
>> However, Page Templates which mixed Unicode and 8-bit encoded strings
>> would raise a Unicode exception:
>>
>>      <p tal:content="python:u'My 2\N{CENT SIGN}'" />
>>      <p tal:content="python:u'My 2\N{CENT SIGN}'.encode('latin1')" />
>
> I think this is probably the basic problem.  Why are you encoding this
> unicode string here in your zpt?  I'm pretty sure everything coming out
> of tal should just be unicode strings, and components such as the
> publisher should encode for the output stream as appropriate or needed.
> That retains the separation of concerns, and helps to maintain
> developer sanity.

This is just a minimal demonstration of the problem, which occurs in
real life when you mix modern Zope Products that use Unicode strings
with Zope code that works under Zope 2.5.
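
To show the failure outside ZPT, here is a rough sketch. It is written
for a current Python, where str is Unicode text and bytes plays the role
of the 8-bit string, so it only emulates the implicit ASCII coercion that
happens when the two kinds of string meet. join_like_python2 is a
hypothetical helper made up for this illustration, not anything in Zope.

```python
# Emulates the implicit 8-bit -> Unicode coercion of the Zope 2.6 era.
# join_like_python2 is a hypothetical helper, not part of Zope or ZPT.

def join_like_python2(parts):
    """Join Unicode and 8-bit strings, decoding the 8-bit ones as ASCII."""
    out = []
    for part in parts:
        if isinstance(part, bytes):
            # An 8-bit string meeting a Unicode string is decoded as ASCII;
            # any high-bit byte raises UnicodeDecodeError.
            part = part.decode("ascii")
        out.append(part)
    return "".join(out)


unicode_fragment = "My 2\N{CENT SIGN}"
encoded_fragment = unicode_fragment.encode("latin1")

# Pure-ASCII 8-bit data mixes without complaint, which is why the
# problem tends to stay hidden until high-bit characters turn up.
print(join_like_python2(["<p>", b"plain ascii", "</p>"]))

try:
    join_like_python2([unicode_fragment, encoded_fragment])
except UnicodeDecodeError as exc:
    print("mixing failed:", exc.reason)
```

The second call corresponds to the two <p> elements in the template
above: each is fine alone, and the error only appears once their output
is joined.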

The product I'm dealing with that is doing this is Formulator. The code
I originally proposed provides a way of making things work that
currently requires people to sprinkle page templates with
python:foo.decode('utf8') or similar. It will also mean that my
documents will not need to be rewritten when Formulator starts spitting
out Unicode.

Fixing Formulator for everyone is nontrivial, as it is an integral part
of other products (e.g. Silva, and I've seen other large commercial
systems in production that have it tightly coupled).

Alternatives for me are to fix Formulator output for just my project
(fairly simple), or to work with Infrae to fix Formulator properly (more
complex, and will break legacy code).

> I think the basic philosophy ought to be: convert to and from unicode
> at the farthest boundaries you can get away with, treat everything
> internally as unicode always, and never mix unicode and 8-bit strings.

Yup. My originally proposed patch is more a migration tool for people who
don't have the luxury of reworking their entire Zope 2.5 codebase to
Unicode, or who need to support both pre- and post-Unicode Zopes.
Downsides I can see:
	- Like most migration tools, it might encourage bad behavior as
	  things just work until people do something really stoopid like
	  mixing encodings.
	- Might hide some errors where someone forgot to marshal their
	  form submission into Unicode (although these are already hidden
	  unless the developer remembered to test with some high-bit
	  characters in the input).
	- We might want this behavior to go away if Python gets a fully
	  Unicode aware cStringIO and it is faster than Zope's
	  FasterStringIO.
	- Doesn't help people using ZPT to generate non HTML/XML fragments,
	  such as CSS style sheets. Although they are no worse off, as they
	  can't mix encoded/Unicode strings today anyway.
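
The patch itself is not reproduced here, but the general idea behind
this kind of migration tool can be sketched roughly as below. The sketch
assumes a current Python (str is Unicode, bytes is the 8-bit string) and
a single site-wide legacy encoding. TolerantBuffer and legacy_encoding
are hypothetical names for illustration only; they are not Zope's
FasterStringIO API.

```python
# Hypothetical sketch: TolerantBuffer and legacy_encoding are illustrative
# names, not the proposed patch and not part of Zope.

class TolerantBuffer:
    """Collect template output, decoding 8-bit strings as they arrive."""

    def __init__(self, legacy_encoding="latin1"):
        # Assumption: all legacy 8-bit strings share this one encoding.
        self.legacy_encoding = legacy_encoding
        self._parts = []

    def write(self, data):
        if isinstance(data, bytes):
            # Legacy code handed us an encoded string; decode it here so
            # templates do not need explicit .decode() calls.
            # Note: latin1 accepts any byte, so a wrong guess produces
            # garbage rather than an error. utf8 would fail loudly.
            data = data.decode(self.legacy_encoding)
        self._parts.append(data)

    def getvalue(self):
        return "".join(self._parts)


buf = TolerantBuffer(legacy_encoding="latin1")
buf.write("My 2\N{CENT SIGN}")                   # Unicode from a modern product
buf.write("My 2\N{CENT SIGN}".encode("latin1"))  # 8-bit string from legacy code
print(buf.getvalue() == "My 2\N{CENT SIGN}" * 2)
```

The first downside above shows up directly in this sketch: if two
products emit 8-bit strings in different encodings, one fixed
legacy_encoding cannot be right for both.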

Would a better approach be to make the use of encoded strings in
tal:replace or tal:content attributes explicit, such as:
	<b tal:content="here/whatever" tal:encoding="latin1">Foo</b>
For my situation, I prefer my original suggestion, as none of my
templates will need to be changed if Formulator starts spitting out
Unicode.

(Hmmm..... must check out to see if Zope3 marshals form submissions into
Unicode by default... I think Unicode and encodings will always be a
problem with Zope2...)

- -- 
Stuart Bishop <stuart at stuartbishop.net> ☞ http://www.stuartbishop.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (Darwin)

iD8DBQE/kjHLAfqZj7rGN0oRAgHcAJ9cdYsYyRC5dsRjWT/xjH7kueDp3QCgjLrn
kleABMEEtnzPPStIJhqoicU=
=Xf4p
-----END PGP SIGNATURE-----