[Zope-dev] Re: [Archetypes-devel] Unicode in Zope 2 (ZMI, Archetypes, Plone, Formulator)

Martijn Faassen faassen at infrae.com
Mon Apr 26 18:24:58 EDT 2004


David Convent wrote:
> Hi Bjorn,
> 
> I always believed that unicode and utf-8 were the same encoding, but 
> reading your message makes me think I was wrong.
> Can you tell me what the difference is between unicode and utf-8?

Unicode should not be seen as an encoding as such. Python does use an 
encoding internally for unicode strings (the strings whose repr Python 
prefixes with a 'u'), but you shouldn't care what that encoding is, and 
Python can in fact be recompiled to use a different one.
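
A minimal sketch of that distinction, written in Python 3 syntax so it 
runs on a current interpreter (in the Python 2 discussed here the text 
literal would be written u"..."). The sample word and the variable names 
are only illustrative: one piece of unicode text is serialized with three 
different encodings and gives three different byte strings.

```python
# One piece of abstract text: four code points (c, a, f, e-acute).
text = "caf\u00e9"

# The same text serialized under three different encodings.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
as_utf16 = text.encode("utf-16-le")

print(len(text))       # number of code points: 4
print(len(as_utf8))    # 5 bytes (the e-acute takes two bytes)
print(len(as_latin1))  # 4 bytes
print(len(as_utf16))   # 8 bytes

# Decoding with the matching codec recovers the identical text each time.
assert as_utf8.decode("utf-8") == text
assert as_latin1.decode("latin-1") == text
assert as_utf16.decode("utf-16-le") == text
```

The text itself has no byte representation you need to know about; bytes 
only appear once you pick an encoding.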

UTF-8 is one particular way to represent unicode data, in this case as 
8-bit strings. UTF-8 happens to be popular for two (related) reasons:

   * UTF-8 is a superset of ASCII, so ASCII text is automatically valid 
UTF-8, and UTF-8 text without many special characters looks like ASCII.

   * Software that can deal with 8-bit strings can usually deal with UTF-8.
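
Both points can be checked with a short sketch (Python 3 syntax again; 
the sample strings are made up for illustration). ASCII bytes decode the 
same way under both codecs, and every byte that UTF-8 uses for a 
non-ASCII character has its high bit set, so byte-oriented code that 
splits on an ASCII separator never cuts a character in half.

```python
# Pure ASCII bytes decode identically as ASCII and as UTF-8.
ascii_bytes = b"Hello, Zope"
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Text with two non-ASCII characters (i-diaeresis and e-acute) as UTF-8.
encoded = "na\u00efve caf\u00e9".encode("utf-8")
print(encoded)  # b'na\xc3\xafve caf\xc3\xa9'

# Bytes belonging to non-ASCII characters are all >= 0x80, so they can
# never be mistaken for an ASCII byte such as the space separator.
non_ascii_bytes = [b for b in encoded if b >= 0x80]
assert non_ascii_bytes == [0xC3, 0xAF, 0xC3, 0xA9]

# An 8-bit-only operation (splitting on an ASCII space) leaves each
# piece as valid UTF-8.
parts = encoded.split(b" ")
words = [part.decode("utf-8") for part in parts]
assert words == ["na\u00efve", "caf\u00e9"]
```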

Anyway, in my experience most programmers have only a vague grasp of 
encoding issues. The basics in Python are not that hard to understand, but:

   * Python is not very educational if you do it wrong; you basically 
get weird errors.

   * You frequently get those errors in a different place in the code 
than where you made the mistake, when some code tries to combine unicode 
strings with classic (8-bit) strings.

   * You can 'hack' your way around it and survive for a long time. You 
don't notice the problem because it works with the test text, which 
happens to be ASCII.
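
A rough sketch of how this plays out. Python 2 decodes a classic string 
as ASCII implicitly when it is combined with a unicode string; the code 
below makes that decode explicit so the same behaviour can be reproduced 
on Python 3. The function names combine and render_title are 
hypothetical, not taken from Zope or Archetypes.

```python
def combine(text, raw):
    """Join text with a byte string by decoding the bytes as ASCII first.

    This spells out what Python 2 does implicitly when a unicode string
    and a classic string are concatenated.
    """
    return text + raw.decode("ascii")


def render_title(title_bytes):
    # Runs far away from the code that produced title_bytes, which is
    # why the eventual error shows up in an unexpected place.
    return combine("Title: ", title_bytes)


test_title = "Hello".encode("utf-8")        # ASCII-only test data
real_title = "H\u00e9llo".encode("utf-8")   # real data with an e-acute

print(render_title(test_title))  # Title: Hello

try:
    render_title(real_title)
except UnicodeDecodeError as exc:
    print("fails only with non-ASCII data:", exc)
```

The usual way out is to decode incoming bytes with their real encoding 
once, at the boundary, and to work with unicode strings everywhere else.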

Regards,

Martijn


