[Zope-dev] RFV: Unicode in Zope 2

Tue Dec 13 09:46:17 EST 2005

Martijn Faassen wrote:
> Jim Fulton wrote:
> 
>> I forgot a very important need:
>>
>> - Common approach to Unicode
>>
>> In particular, in Zope 3, all text is stored and managed as Unicode.
>> The publisher decodes request data and encodes response data.  The vast
>> majority of application and library code can ignore encoding issues.
>> (The exceptions are applications and frameworks that need to exchange
>> text with non-Unicode-aware external systems.)  This has provided
>> great simplifications and allowed us to avoid common pitfalls from
>> mixing Unicode and encoded text.
>>
>> We need to migrate Zope 2 to use a similar strategy.  We need volunteers
>> to brainstorm how this can be done and make one or more proposals.
>> This is likely a prerequisite for finishing the publisher and ZPT
>> work.
> 
> 
> This is definitely a scary topic, and I speak from years of experience 
> with Zope 2 unicode here. This sounds like a very hard transition that 
> would touch *a lot* of code outside the Zope 2 core. How do you envision all 
> the form inputs suddenly producing unicode strings, for instance?
> 
> We've struggled hard with Formulator to make it work with unicode for 
> instance (and still it's buggy, as I wanted to support the non-unicode 
> scenarios too). I can imagine any system in Zope that uses forms at all 
> would need to be touched.
> 
> I'll volunteer to help brainstorm on this, but right now my brainstorm 
> is only very dark and full of lightning.

You and I brainstormed this a few months ago.  I think this was on the
list.  I think that, for starters, we would arrange that all Zope 3
views used in Zope 2 would get unicode input.  If you like, I can try
to find this discussion. :)
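
For reference, the Zope 3 strategy quoted above (decode request data on
the way in, encode response data on the way out) can be sketched roughly
as below.  This is only an illustration in modern Python 3, not the Zope
publisher API; decode_form, render_view, publish and the charset
parameter are all hypothetical names.

```python
def decode_form(raw_form, charset="utf-8"):
    """Decode byte-string form keys and values to text at the input boundary."""
    return {
        key.decode(charset): value.decode(charset)
        for key, value in raw_form.items()
    }


def render_view(form):
    """Stand-in for application code: it only ever sees and returns text."""
    return "Hello, " + form["name"] + "!"


def publish(raw_form, charset="utf-8"):
    """Decode once on input, run the view on text, encode once on output."""
    form = decode_form(raw_form, charset)
    body = render_view(form)
    return body.encode(charset)
```

The point of the sketch is that only publish knows about the charset;
the view in the middle never touches encoded text.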

> Anyway, in some basics, Zope 2 does have an approach to unicode for 
> *output* that's fairly similar to Zope 3's: if you use unicode strings 
> your entire output (including page templates) will be unicode (if you 
> don't mix with non-unicode non-ascii strings..). Then the response 
> encoding setting is read and everything is transformed once to encoded 
> text. Silva uses this. It also struggles to make sure all its input is 
> transformed to unicode (among other ways using Formulator).
> 
> In Plone, the situation is quite different -- its 
> PlacelessTranslationService monkeypatches the page template engine 
> so that you can mix UTF-8 and unicode strings together. 
> This then goes on to break assumptions of code that uses the page 
> template engine in a unicode-pure environment (which is what happened to 
> Silva).

Ick.
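
The pitfall behind this, mixing unicode with encoded non-ASCII strings,
can be sketched as follows.  This is an assumption-laden illustration in
Python 3: concat_like_python2 is a hypothetical helper that imitates the
implicit ASCII decode Python 2 performs when a unicode string meets a
byte string, and concat_explicit shows the alternative of decoding with
a known charset first.  Neither is code from Zope or Plone.

```python
def concat_like_python2(text, encoded):
    """Mimic Python 2's u'...' + '...': bytes are implicitly decoded as ASCII."""
    return text + encoded.decode("ascii")


def concat_explicit(text, encoded, charset="utf-8"):
    """Decode with a known charset first, then combine text with text."""
    return text + encoded.decode(charset)
```

With pure-ASCII bytes the first helper works, which is why such bugs
hide until real data shows up; with UTF-8 encoded non-ASCII bytes it
raises UnicodeDecodeError.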

I'm not suggesting this is easy.  We may have some messy deprecation
and backward compatibility code.  But we *do* need to solve this problem
eventually, and the solution doesn't get any closer without taking steps.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org