[Zope-dev] Re: [Archetypes-devel] Unicode in Zope 2 (ZMI, Archetypes, Plone, Formulator)
Martijn Faassen
faassen at infrae.com
Mon Apr 26 18:24:58 EDT 2004
David Convent wrote:
> Hi Bjorn,
>
> I always believed that unicode and utf-8 were the same encoding, but reading
> your message makes me think I was wrong.
> Can you tell me what the difference is between unicode and utf-8?
Unicode should not be seen as an encoding as such. While Python
internally uses an encoding for unicode strings (the strings whose
representation Python prefixes with a 'u'), you shouldn't care what
that encoding is, and Python can in fact be recompiled to use a
different one.
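A minimal sketch of this distinction in modern Python 3, where `str` plays the role of the Python 2 `unicode` type discussed here and `bytes` plays the role of the classic 8-bit string. A string is a sequence of abstract code points; a concrete byte representation only appears once you pick an encoding. (Since Python 3.3, PEP 393 lets the interpreter choose a compact internal representation per string, exactly the kind of detail you should not need to care about.)

```python
# Python 3 sketch: a str is a sequence of abstract code points;
# bytes only appear once you choose an encoding.
text = "caf\u00e9"  # 'café'

# The code points are the same no matter how Python stores them internally.
code_points = [ord(ch) for ch in text]

# Different encodings turn the same code points into different byte sequences.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")
as_latin1 = text.encode("latin-1")

print(code_points)  # [99, 97, 102, 233]
print(as_utf8)      # b'caf\xc3\xa9'
print(as_utf16)     # b'c\x00a\x00f\x00\xe9\x00'
print(as_latin1)    # b'caf\xe9'

# Decoding with the matching encoding recovers the original text.
assert as_utf8.decode("utf-8") == text
assert as_utf16.decode("utf-16-le") == text
assert as_latin1.decode("latin-1") == text
```

UTF-8, UTF-16 and Latin-1 are just three of many possible byte representations of the same unicode text.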
UTF-8 is one particular way to represent unicode data, in this case as
8-bit strings. UTF-8 happens to be popular for two (related) reasons:
* Since UTF-8 is a superset of ASCII, ASCII text is automatically valid
UTF-8, and UTF-8 text without many special characters looks like ASCII.
* Software that can deal with 8-bit strings can usually deal with UTF-8.
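Both points can be illustrated with a short Python 3 sketch. ASCII text encodes to identical bytes under ASCII and UTF-8, and every byte of a multi-byte UTF-8 sequence has its high bit set, so byte-oriented code that only looks for ASCII delimiters (such as `/` or a newline) never splits a character in half. The sample path is made up for illustration.

```python
# Sketch: why UTF-8 plays well with ASCII and 8-bit software.
ascii_text = "Hello, Zope"

# 1. ASCII text encodes to exactly the same bytes under ASCII and UTF-8.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# 2. Every byte of a multi-byte UTF-8 sequence is >= 0x80, so it can never
#    be mistaken for an ASCII character such as '/' or a newline.
for ch in "\u00e9\u20ac\U0001F600":  # é, €, and an emoji (2, 3 and 4 bytes)
    assert all(b >= 0x80 for b in ch.encode("utf-8"))

# 3. Byte-oriented code that only looks for ASCII delimiters still works:
#    splitting the UTF-8 bytes on b"/" gives the same pieces as splitting the text.
path = "docs/r\u00e9sum\u00e9/index.html"  # 'docs/résumé/index.html'
raw = path.encode("utf-8")
parts = [p.decode("utf-8") for p in raw.split(b"/")]
print(parts)  # ['docs', 'résumé', 'index.html']
assert parts == path.split("/")
```

This is why a lot of older 8-bit software handles UTF-8 correctly without knowing anything about unicode, as long as it only inspects ASCII bytes.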
Anyway, in my experience most programmers have only a vague grasp of
encoding issues. In Python the basics are not that hard to understand, but:
* Python is not very educational if you do it wrong; you basically
just get weird errors.
* You frequently get those weird errors in a different place in the code
than where you made the mistake, when some code tries to combine unicode
strings with classic (8-bit) strings.
* You can 'hack' your way around it and survive for a long time. You
don't notice the problem because everything works with your test text,
which happens to be ASCII. Etc.
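A sketch of both traps, written in Python 3. In Python 2, `u'...' + '...'` implicitly decoded the byte string with the ASCII codec; Python 3 no longer does this (mixing `str` and `bytes` raises `TypeError` every time), so the hypothetical helper `py2_style_concat` below makes that implicit step explicit. The other function names are also hypothetical. The mistake happens in `load_title`, but the error only surfaces later in `render_heading`, and only for non-ASCII input.

```python
# Sketch: emulating the Python 2 pitfall in Python 3.

def py2_style_concat(text: str, data: bytes) -> str:
    """Hypothetical stand-in for Python 2's implicit str/unicode coercion."""
    return text + data.decode("ascii")  # fails only for non-ASCII bytes


def load_title(raw: bytes) -> bytes:
    # The real mistake happens here: the bytes are never decoded
    # (they should be decoded as UTF-8 at this input boundary).
    return raw


def render_heading(title: bytes) -> str:
    # ...but the error surfaces here, far away from load_title().
    return py2_style_concat("Title: ", title)


# With ASCII test data everything seems fine:
print(render_heading(load_title("Plone".encode("utf-8"))))  # Title: Plone

# With real-world data the hidden bug appears:
try:
    render_heading(load_title("Caf\u00e9".encode("utf-8")))
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)  # ordinal not in range(128)


# A common fix: decode bytes to text once, at the input boundary,
# and work only with text (str) inside the program.
def load_title_fixed(raw: bytes) -> str:
    return raw.decode("utf-8")


print("Title: " + load_title_fixed("Caf\u00e9".encode("utf-8")))  # Title: Café
```

Decoding at the boundary means that any encoding error is raised where the bad data enters, rather than wherever it happens to be combined with other strings.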
Regards,
Martijn