Re: [Zope-dev] ZPublisher: using zope.formlib and z3c.form in Zope 2

4 Mar 2011

Hi!

Laurence Rowe wrote:
...
On 2 March 2011 11:29, yuppie<y.2011@wcm-solutions.de>  wrote:
...
Laurence Rowe wrote:
...
On 2 March 2011 10:00, yuppie<y.2011@wcm-solutions.de>    wrote:
...
Martin Aspeli wrote:
...
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset
processInputs tries for decoding.
Is the charset of the request necessarily the right choice for the
response? In Plone we always serve UTF-8 encoded.
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted.
If 'utf-8' is not in getPreferredCharsets(), it is not very likely that
the browser speaks UTF-8 and processInputs will not even try to decode
with UTF-8. In that case it might be better to respond with an accepted
encoding.
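To illustrate the ordering described above, here is a rough sketch. It assumes a hypothetical helper, `preferred_charsets`, that parses an Accept-Charset header by q-value and moves 'utf-8' to the front whenever the client accepts it. It is not the zope.publisher implementation, and it ignores the `*` wildcard.

```python
def preferred_charsets(accept_charset: str) -> list[str]:
    """Return accepted charsets, best first (hypothetical helper).

    'utf-8' is promoted to the front if the client lists it with q > 0.
    The '*' wildcard is not handled in this sketch.
    """
    ranked = []
    for index, item in enumerate(accept_charset.split(",")):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        quality = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, index, name))
    ranked.sort()
    charsets = [name for _, _, name in ranked]
    if "utf-8" in charsets:
        charsets.remove("utf-8")
        charsets.insert(0, "utf-8")
    return charsets
```

With "iso-8859-1, utf-8;q=0.7" this yields utf-8 first. Without utf-8 in the header it yields only the other charsets, which is the case where responding in an accepted encoding might make sense.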
If you serve differently encoded pages then you should set Vary:
Accept-Charset.
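A minimal sketch of what "set Vary when negotiating" means for the response headers. The `content_headers` function and its `negotiated` flag are hypothetical. The point is only that Accept-Charset gets added to Vary whenever the charset was chosen from the request, and that any existing Vary values are kept.

```python
def content_headers(charset: str, negotiated: bool, vary: str = "") -> dict[str, str]:
    """Build response headers for a page served in `charset` (hypothetical).

    If the charset was picked from the request's Accept-Charset header,
    caches must be told so via Vary; existing Vary values are preserved.
    """
    headers = {"Content-Type": f"text/html; charset={charset}"}
    if negotiated:
        values = [v.strip() for v in vary.split(",") if v.strip()]
        if not any(v.lower() == "accept-charset" for v in values):
            values.append("Accept-Charset")
        headers["Vary"] = ", ".join(values)
    return headers
```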
That seems to be correct. So you found a bug in zope.publisher and 
five.formlib. If they do charset negotiation, they have to set Vary.
...
But then, without normalizing the Accept-Charset values, you'd get an explosion
of different cached page variations.
AFAICS that normalization can't be done by the server and we can't 
prevent ineffective caching.
...
Without the Vary, it means that a visitor can poison the cache by
supplying (only) a weird charset in Accept-Charset. The page would
then be served in this encoding, cached downstream, and if other
visitors' browsers don't support that charset then they have a
problem.
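Both the poisoning problem and the "explosion of variations" problem can be seen in how a shared cache builds its key. The sketch below uses a hypothetical `cache_key` function and assumes lower-cased request header names. Without Vary, two requests with different Accept-Charset values share one entry, so whoever fills it first decides the encoding for everyone. With Vary, every distinct raw header value gets its own entry.

```python
def cache_key(url: str, request_headers: dict[str, str], vary: str = "") -> str:
    """Build a shared-cache key that honours Vary (hypothetical sketch).

    `request_headers` is assumed to use lower-case header names.
    """
    parts = [url]
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    for name in names:
        # The raw value is used as-is: no normalization, hence many variants.
        parts.append(f"{name}={request_headers.get(name, '')}")
    return "|".join(parts)
```

Note that "utf-8, iso-8859-1" and "iso-8859-1, utf-8" produce different keys here even if the server would send the same page for both.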
That sounds like charset negotiation isn't a good idea and neither 
zope.publisher nor five.formlib should do it.

If we don't negotiate the charset, we should still have a 
setPageEncoding method that overrides the ZPublisher default_encoding 
with UTF-8.
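A rough sketch of such a non-negotiating override, assuming a minimal stand-in response object with `setHeader`/`getHeader`. `StubResponse` and `set_page_encoding` are hypothetical names, not the existing five.formlib `setPageEncoding`. The charset is fixed to UTF-8 regardless of the request, so no Vary is needed. Only the media type is kept, and any other Content-Type parameters are dropped.

```python
class StubResponse:
    """Minimal stand-in for a ZPublisher-style response (hypothetical)."""

    def __init__(self) -> None:
        self.headers: dict[str, str] = {}

    def setHeader(self, name: str, value: str) -> None:
        self.headers[name.lower()] = value

    def getHeader(self, name: str):
        return self.headers.get(name.lower())


def set_page_encoding(response, charset: str = "utf-8") -> str:
    """Force the page charset instead of negotiating it (hypothetical).

    Keeps the existing media type, replaces the Content-Type parameters
    with a single charset parameter, and returns the charset used.
    """
    current = response.getHeader("Content-Type") or "text/html"
    media_type = current.split(";", 1)[0].strip()
    response.setHeader("Content-Type", f"{media_type}; charset={charset}")
    return charset
```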

But what does all that mean for the processInputs methods in Five (used 
by five.formlib) and in plone.z3cform?

If we always send UTF-8, their current implementation doesn't make much
sense to me. I don't know if we really should fall back to all the
charsets mentioned in Accept-Charset, but at least we should *always*
try UTF-8 decoding first.
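A sketch of "always try UTF-8 first". It uses a hypothetical `decode_form_value` function and is not the actual processInputs code in Five or plone.z3cform. The order matters: a single-byte charset such as iso-8859-1 decodes any byte string without error, so trying it before UTF-8 would silently mangle UTF-8 input. UTF-8 rejects most non-UTF-8 byte sequences, which makes it the safer first guess.

```python
def decode_form_value(raw: bytes, accepted_charsets: list[str]) -> str:
    """Decode a submitted value, trying UTF-8 before other charsets.

    Hypothetical helper: falls back to the client's accepted charsets,
    skipping duplicates and unknown names, then to lossy UTF-8 decoding.
    """
    tried = set()
    for charset in ["utf-8", *accepted_charsets]:
        key = charset.lower()
        if key in tried:
            continue
        tried.add(key)
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    return raw.decode("utf-8", errors="replace")
```

Whether the fallback loop over Accept-Charset is worth keeping at all is exactly the open question above. Dropping it would leave just the UTF-8 attempt.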

Cheers,

	Yuppie