[Grok-dev] Re: Understanding unicode
Philipp von Weitershausen
philipp at weitershausen.de
Sun Sep 23 10:45:13 EDT 2007
On 22 Sep 2007, at 19:48 , Jan Ulrich Hasecke wrote:
> We looked into all places and finally found the place where the
> values in my choice list are encoded to ASCII.
>
> In zope.schema's vocabulary.py
>
> there is:
>
> def __init__(self, value, token=None, title=None):
>     """Create a term for value and token. If token is omitted,
>     str(value) is used for the token. If title is provided,
>     term implements ITitledTokenizedTerm.
>     """
>     self.value = value
>     if token is None:
>         token = value
>     self.token = str(token)
>     self.title = title
>     if title is not None:
>         directlyProvides(self, ITitledTokenizedTerm)
>
> self.token = str(token) converts the values from my Choice list to
> ASCII, so there is an error when the values contain non-ASCII
> characters, like u'Paviankäfig'.
>
> if you change the line to
>
> self.token = unicode(token)
>
> it works.
>
> Please have a look at this solution, maybe there are side effects.
> But I hope that this is a good solution.
No, it's not. Sorry.
The str(token) is there for a reason. The vocabulary spec says that
tokens should be *ASCII*. Not unicode. Not 8-bit strings. Just ASCII.
So ideally, str(token) should always work. The problem is that one
line above that it says "token = value", which defeats the whole
point of the str(token) line: any value, including a unicode string
with non-ASCII characters, ends up being passed to str().
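To see why `token = value` defeats `str(token)`, here is a minimal Python 3 sketch that makes Python 2's implicit ASCII encoding explicit. `py2_str` and `term_token` are hypothetical helpers that only mimic the token handling of the quoted `__init__`; they are not part of zope.schema.

```python
def py2_str(token):
    """Approximate Python 2's str() for the cases relevant here.

    In Python 2, str(u"...") encodes implicitly with the default ASCII
    codec; this helper makes that explicit and returns bytes.
    """
    if isinstance(token, str):
        # Raises UnicodeEncodeError for any non-ASCII character.
        return token.encode("ascii")
    # Numbers and other objects: take their text form, then ASCII-encode it.
    return str(token).encode("ascii")


def term_token(value, token=None):
    """Mirror the token handling of the quoted __init__ (hypothetical helper)."""
    if token is None:
        token = value  # the line that lets arbitrary values reach str()
    return py2_str(token)


print(term_token(42))                              # b'42'
print(term_token("Paviankäfig", "paviankaefig"))   # b'paviankaefig'
try:
    term_token("Paviankäfig")
except UnicodeEncodeError as exc:
    print("no ASCII token:", exc.reason)
```

Passing an explicit ASCII token (here the made-up `"paviankaefig"`) sidesteps the error, because the token then stays within what the spec allows.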
What this code should really do is:
- check whether a one-to-one mapping between the values' types and
  ASCII can be arranged. It can for all integers, floats, and
  pure-ASCII strings. To support the whole Unicode range, UTF-7 would
  have to be used.
- if there are objects that can't be mapped to an ASCII
  representation (e.g. arbitrary objects), show a *useful* error
  message indicating that a vocabulary/source should be used instead.
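The two steps above can be sketched roughly as follows, in Python 3 rather than the Python 2 of the quoted code. `to_token`, `tokens_for` and `from_string_token` are hypothetical names, not zope.schema API. The sketch assumes a vocabulary's values are ints, floats or strings, encodes every string with the stdlib `utf-7` codec (so pure-ASCII strings mostly pass through unchanged), and treats any token collision (e.g. `1` and `"1"`) as a failure to arrange a one-to-one mapping.

```python
def to_token(value):
    """Derive an ASCII token from a vocabulary value (sketch of the proposal)."""
    if isinstance(value, (int, float)):
        # str() of a number is already pure ASCII.
        return str(value)
    if isinstance(value, str):
        # UTF-7 leaves most ASCII text unchanged and encodes everything else
        # (including a literal "+") into ASCII; it is decodable, so distinct
        # strings get distinct tokens.
        return value.encode("utf-7").decode("ascii")
    raise TypeError(
        f"cannot derive an ASCII token from {value!r} "
        f"(type {type(value).__name__}); use a vocabulary/source "
        "with explicit ASCII tokens instead"
    )


def tokens_for(values):
    """Return (value, token) pairs, refusing mappings that are not one-to-one."""
    seen = {}
    pairs = []
    for value in values:
        token = to_token(value)
        if token in seen:
            # E.g. 1 and "1" both map to "1": the mapping is not one-to-one.
            raise ValueError(
                f"values {seen[token]!r} and {value!r} both map to token {token!r}"
            )
        seen[token] = value
        pairs.append((value, token))
    return pairs


def from_string_token(token):
    """Invert to_token() for string values."""
    return token.encode("ascii").decode("utf-7")


for value, token in tokens_for([42, 3.5, "Pavian", "Paviankäfig"]):
    print(repr(value), "->", token)
```

Raising in `to_token` rather than falling back to `repr()` is a deliberate choice in this sketch: an arbitrary object has no stable ASCII form, so the error points the developer at an explicit vocabulary instead of silently producing tokens.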