[Zope-dev] ANN: Proposal ContentNegotiation

Toby Dickenson tdickenson@geminidataloggers.com
Fri, 4 May 2001 09:39:59 +0100


> -----Original Message-----
> From: Andreas Jung [mailto:andreas@andreas-jung.com]
> Sent: 03 May 2001 20:23
> To: Toby Dickenson
> Cc: zope-dev
> Subject: Re: [Zope-dev] ANN: Proposal ContentNegotiation
> 
> 
> 
> From: "Toby Dickenson" <tdickenson@geminidataloggers.com>
> To: "'Andreas Jung'" <andreas@andreas-jung.com>
> Sent: Thursday, May 03, 2001 1:04 PM
> Subject: RE: [Zope-dev] ANN: Proposal ContentNegotiation
> 
> 
> > Your proposal suggests that published objects should generally return
> > pre-encoded objects in 8-bit strings. This is the one detail which in my
> > experience looks *wrong* and is very much dependant on my proposal.
 
> Not really - I only say that the internal character set is currently
> ascii/iso-8859-1.
> I did not mention that this will be the default character set in the future.

It's not a question of the 'default' encoding if (as in your proposal) the
published object explicitly specifies the character encoding of a pre-encoded
object.

I'm referring to the section where you say:

: If the object creates a RESPONSE in another character set
: than the internal character set it must indicate this by
: setting an attribute of the RESPONSE object to the used
: character set e.g.:

:       RESPONSE.charset = 'utf-8'    (either as attribute or by a method call)

I think we are mixing up 'character set' and 'character encoding' here.
The character set of html (and xml) is the Unicode set.
utf-8 is a character encoding, not a character set.
Given the semantics of this proposed attribute, the name 'character_encoding'
would be more appropriate.
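
To illustrate the distinction, here is a minimal sketch. It assumes
present-day Python 3 rather than the Python of this thread: `str` holds
Unicode characters and `bytes` holds one particular encoding of them. The
variable names are illustrative only.

```python
# One string of Unicode characters (the character set is Unicode).
text = "caf\u00e9"

# Two different character encodings of the same characters.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# The byte sequences differ, but both decode back to the same characters.
assert as_utf8 != as_latin1
assert as_utf8.decode("utf-8") == as_latin1.decode("latin-1") == text
```

The value a published object would put in the proposed attribute names the
second step (the encoding), not the set of characters being encoded.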

In my experience, forcing published methods to deal with character encoding
is a bad idea (I know because I tried that in my first iteration). Your
proposal suggests that methods return 8-bit strings, specifying the
character encoding of that string in the RESPONSE. This makes it hard to
combine methods from two different components that happen to use different
character encodings.
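
The problem can be sketched as follows. This is a hedged illustration in
present-day Python 3, not code from either proposal: `component_a`,
`component_b` and `publish` are hypothetical names, and `bytes` stands in
for the 8-bit strings discussed above.

```python
def component_a():
    # Hypothetical component that returns pre-encoded latin-1 bytes.
    return "na\u00efve ".encode("latin-1")

def component_b():
    # Hypothetical component that returns pre-encoded utf-8 bytes.
    return "caf\u00e9".encode("utf-8")

# Concatenating pre-encoded output: no single encoding describes the result.
mixed = component_a() + component_b()
try:
    mixed.decode("utf-8")
    mixed_is_valid_utf8 = True
except UnicodeDecodeError:
    mixed_is_valid_utf8 = False

# Alternative: components return Unicode text, and the text is encoded
# once, at the publishing boundary (hypothetical helper).
def publish(parts, character_encoding="utf-8"):
    return "".join(parts).encode(character_encoding)

body = publish(["na\u00efve ", "caf\u00e9"])
```

In the first half, a single encoding declared on the RESPONSE cannot be
right for both components' output. In the second half the encoding is
chosen in exactly one place.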

Note that since many components already use latin-1, the usefulness of this
aspect of the proposal is low.

If they need a wider character set then using Unicode is the only practical
approach.
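
As a small illustration of where latin-1 runs out (again a sketch in
present-day Python 3; `fits_latin1` is a hypothetical helper, not part of
any proposal):

```python
def fits_latin1(text):
    # True if every character in text has a latin-1 encoding.
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False
```

Text such as "caf\u00e9" fits, while the euro sign "\u20ac" or any CJK
character does not, so a component that needs those has to work with
Unicode internally.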

I'll write up a full proposal for how this would fit together on the wiki.

(On a different subject, I think the default encoding will always have to
stay as latin-1 in order to support pre-existing object databases.)