[Zope3-dev] Re: unicode problems !?
Bjorn Tillenius
bjoti777 at student.liu.se
Tue Oct 12 14:38:57 EDT 2004
On Tue, Oct 12, 2004 at 03:27:24PM +0200, Martijn Faassen wrote:
> Hey,
>
> I'm not sure I understood the entire debate, but I'll summarize what I
> think should be happening:
>
> * if a user edits a textarea, then assume the encoding of the form submission is
> that of the presented form, or alternatively generate some explicit
> encoding setting in the form, as we previously discussed on this list.
> The default for this encoding in Zope should be UTF-8. Content that is
> saved is decoded from UTF-8 and stored as unicode. In my experience
> browsers, including IE, do submit form data in the same encoding as the
> way the form was presented; we rely on this heavily in Silva, for
> instance. Silva uses unicode internally throughout.
When the form value is read from the request, a unicode string is
returned. That part works today, assuming that all browsers do
"the right thing". The question is how the value should be stored.
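As a rough illustration of the assumption above (the browser submits in the same encoding the form was served in, UTF-8 by default), the decoding step amounts to the following. This is a minimal sketch in modern Python, not Zope's request code; `decode_form_value` and `DEFAULT_FORM_ENCODING` are hypothetical names.

```python
# Hypothetical constant: the encoding the form was served in.
DEFAULT_FORM_ENCODING = "utf-8"


def decode_form_value(raw: bytes, encoding: str = DEFAULT_FORM_ENCODING) -> str:
    """Decode a submitted form value into unicode (str), assuming the
    browser used the same encoding the form was presented in."""
    return raw.decode(encoding)


# A form served as UTF-8 and submitted back with the text "Björn".
submitted = "Björn".encode("utf-8")
stored = decode_form_value(submitted)
print(type(stored).__name__, stored)  # str Björn
```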
> * if a user uploads a file in some way, and the file is intended to be
> textual data, then the encoding of this file is assumed to be UTF-8 by
> default. However the user can specify an encoding to override this. The
> textual data is decoded using this encoding, and stored as unicode. If
> the decoding fails, then the user needs to be presented with an error.
> We have some experience implementing something like this in Silva, where
> we provide a Comma Separated Value object (in the SilvaExternalSources
> extension). Users explicitly specify the encoding of the uploaded CSV
> data here, and data is stored as unicode.
Always storing the value as unicode is quite a good option. Storing it
as UTF-8 is another, which would let us use the same Byte field no matter
what the content type is. Either way it's probably the easiest solution,
since we only have to care about the encoding when a user uploads a file,
and we don't have to add another attribute to the File class. One
disadvantage is that if I upload a file using a specific encoding, I might
want that encoding to be used when the file is downloaded. Of course, I
could provide a special view for that. I still like this option better
than adding an extra attribute.
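The upload rule described above (assume UTF-8, let the user override the encoding, report an error if decoding fails, then store the result) could look roughly like this. It is a sketch under stated assumptions: `UploadDecodingError`, `decode_upload` and `to_storage` are hypothetical, and the `as_utf8` flag only models the "store as unicode or as UTF-8" choice discussed here.

```python
from typing import Optional, Union


class UploadDecodingError(ValueError):
    """Hypothetical error, raised so the UI can show the user a message."""


def decode_upload(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode uploaded text, defaulting to UTF-8 unless the user overrides it."""
    encoding = encoding or "utf-8"
    try:
        return data.decode(encoding)
    except LookupError as exc:  # the user named an encoding Python doesn't know
        raise UploadDecodingError(f"Unknown encoding: {encoding!r}") from exc
    except UnicodeDecodeError as exc:  # the bytes are not valid in that encoding
        raise UploadDecodingError(
            f"File is not valid {encoding}: {exc.reason} at byte {exc.start}"
        ) from exc


def to_storage(text: str, as_utf8: bool = False) -> Union[str, bytes]:
    """Store either unicode or UTF-8 bytes (the two options in the thread)."""
    return text.encode("utf-8") if as_utf8 else text


# A Latin-1 file uploaded without an override fails; with one it decodes.
latin1_data = "naïve café".encode("latin-1")
print(decode_upload(latin1_data, "latin-1"))  # naïve café
```

Normalizing at upload time is what keeps the stored value independent of the original file encoding, which is also why the original encoding is lost unless something else records it.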
One caveat, though: if we choose to store the text data as UTF-8, we
should either set the encoding of the response or decode the data to
unicode before it is passed to the response. I don't think we do
that today; does anyone know?
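Whichever of the two is chosen, the point is that the bytes sent to the browser must match the charset the response declares. A minimal sketch, assuming a plain dict of headers rather than Zope's actual response object; `prepare_text_response` is a hypothetical name.

```python
from typing import Dict, Tuple, Union


def prepare_text_response(
    stored: Union[str, bytes], content_type: str = "text/plain"
) -> Tuple[bytes, Dict[str, str]]:
    """Return (body, headers) with the body encoded as UTF-8 and the
    charset declared, whether the data was stored as str or UTF-8 bytes."""
    if isinstance(stored, bytes):
        # Assumed to be UTF-8 already: pass it through unchanged.
        body = stored
    else:
        # Stored as unicode: encode it for the wire.
        body = stored.encode("utf-8")
    headers = {"Content-Type": f"{content_type}; charset=utf-8"}
    return body, headers


body, headers = prepare_text_response("Björn")
print(headers["Content-Type"])  # text/plain; charset=utf-8
```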
> * if a user uploads a file and this file is *not* intended to be textual
> data but binary data, then Zope doesn't do a thing, and just stores the
> bytes. If the developer still uses this data as text at any stage, they
> should be aware of encoding issues and decode in whatever encoding they
> see fit. Of course the developer is better off using a stored text file
> in that case, where unicode is already guaranteed.
Agreed. And if he chooses to change the content type to text/*, the same
thing should happen as when he uploads a text file.
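The text/binary split and the "change to text/*" rule can be sketched as a dispatch on the content type. `store_upload` and `change_content_type` are hypothetical helpers; switching from text back to a binary type isn't covered in the thread, so this sketch simply leaves already-decoded text alone.

```python
from typing import Optional, Union


def store_upload(
    data: bytes, content_type: str, encoding: Optional[str] = None
) -> Union[str, bytes]:
    """Decode text/* uploads to unicode (UTF-8 unless overridden);
    store everything else as raw bytes, untouched."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("text/"):
        return data.decode(encoding or "utf-8")
    return data


def change_content_type(
    stored: Union[str, bytes], new_type: str, encoding: Optional[str] = None
) -> Union[str, bytes]:
    """Treat a switch to text/* like a fresh text upload of the same bytes."""
    if isinstance(stored, bytes):
        return store_upload(stored, new_type, encoding)
    return stored  # already text; other transitions are not specified here


print(store_upload(b"\x89PNG", "image/png"))     # b'\x89PNG'
print(change_content_type(b"abc", "text/plain"))  # abc
```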
Regards,
Bjorn