ZPublisher: using zope.formlib and z3c.form in Zope 2
Hi!
ZPublisher.Publish and zope.publisher.publish process form inputs differently. Zope 2 returns encoded strings unchanged if no converters are specified. zope.publisher converts encoded strings to unicode.
One major reason why zope.formlib and z3c.form can't be used directly in Zope 2 is the fact they expect decoded form values. five.formlib uses special base classes and plone.z3cform monkey patches the base classes in z3c.form.
Proposal:
- Products.Five.browser.decode should be moved to ZPublisher. processInputs and setPageEncoding are publisher related code.
- After traversal and before calling the object ZPublisher.Publish.publish should figure out if the object expects zope.publisher behavior. Either we use a new interface for this or we use zope.publisher.interfaces.browser.IBrowserPage: AFAICS in Zope 2 land only zope.formlib and z3c.form based views implement IBrowserPage.
- If the object implements that interface, the request is post processed using processInputs and setPageEncoding.
- plone.z3cform uses a modified version of processInputs and doesn't use setPageEncoding. Can anybody explain why? I guess there are no z3c.form specific reasons. Maybe the changes can be merged back to Zope?
Does that make sense? I guess that kind of change should only be made on the trunk.
Cheers, Yuppie
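To illustrate why form libraries that expect decoded values misbehave when handed encoded strings, here is a minimal sketch. It is written for Python 3, where bytes and str stand in for the Python 2 str and unicode types discussed in this thread. The fits() helper is hypothetical and not part of zope.formlib or z3c.form.

```python
# Hypothetical illustration: the same form value as encoded bytes
# (what Zope 2 passes on when no converter is specified) and as
# decoded text (what zope.publisher-style code expects).
raw = 'Düsseldorf'.encode('utf-8')
text = raw.decode('utf-8')


def fits(value, max_length):
    """Hypothetical max-length check, as a form field might perform it."""
    return len(value) <= max_length


# The decoded value has 10 characters, but the encoded value has 11
# bytes because 'ü' takes two bytes in UTF-8.
print(fits(text, 10), fits(raw, 10))
```

A length constraint is only one example; comparisons and widget rendering run into the same mismatch when a view receives encoded data.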
Hi, On 2 March 2011 08:43, yuppie <y.2011@wcm-solutions.de> wrote:
Hi!
ZPublisher.Publish and zope.publisher.publish process form inputs differently. Zope 2 returns encoded strings unchanged if no converters are specified. zope.publisher converts encoded strings to unicode.
One major reason why zope.formlib and z3c.form can't be used directly in Zope 2 is the fact they expect decoded form values. five.formlib uses special base classes and plone.z3cform monkey patches the base classes in z3c.form.
Proposal:
- Products.Five.browser.decode should be moved to ZPublisher. processInputs and setPageEncoding are publisher related code.
+1
- After traversal and before calling the object ZPublisher.Publish.publish should figure out if the object expects zope.publisher behavior. Either we use a new interface for this or we use zope.publisher.interfaces.browser.IBrowserPage: AFAICS in Zope 2 land only zope.formlib and z3c.form based views implement IBrowserPage.
Isn't this in zope.browserpage now?
- If the object implements that interface, the request is post processed using processInputs and setPageEncoding.
+1
- plone.z3cform uses a modified version of processInputs and doesn't use setPageEncoding. Can anybody explain why? I guess there are no z3c.form specific reasons. Maybe the changes can be merged back to Zope?
processInputs() in Five was very buggy. I thought I'd merged those changes into Zope 2? I don't know what setPageEncoding() does, though. Martin
Hi Martin! Martin Aspeli wrote:
- After traversal and before calling the object ZPublisher.Publish.publish should figure out if the object expects zope.publisher behavior. Either we use a new interface for this or we use zope.publisher.interfaces.browser.IBrowserPage: AFAICS in Zope 2 land only zope.formlib and z3c.form based views implement IBrowserPage.
Isn't this in zope.browserpage now?
No.
- plone.z3cform uses a modified version of processInputs and doesn't use setPageEncoding. Can anybody explain why? I guess there are no z3c.form specific reasons. Maybe the changes can be merged back to Zope?
processInputs() in Five was very buggy. I thought I'd merged those changes into Zope 2?
Well. You were the last person who touched both. But the changes are quite different: http://svn.zope.org/Zope/trunk/src/Products/Five/browser/decode.py?rev=11528... http://svn.zope.org/plone.z3cform/trunk/plone/z3cform/z2.py?rev=109071&view=... Is there still anything in plone.z3cform that should be merged into Zope 2?
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset processInputs tries for decoding. Cheers, Yuppie
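The behaviour described here can be sketched roughly as follows. This is not the Five implementation: set_page_encoding() is a hypothetical stand-in that works on a plain dict of response headers, and the text/html media type is an assumption made for the example.

```python
def set_page_encoding(headers, preferred_charsets):
    """Hypothetical sketch: advertise the first preferred charset.

    `headers` is a plain dict standing in for the response headers;
    `preferred_charsets` is the list processInputs would try in order.
    """
    # Fall back to UTF-8 when no charset preference is known.
    charsets = preferred_charsets or ['utf-8']
    # Assumption: the page is HTML.
    headers['Content-Type'] = 'text/html; charset=%s' % charsets[0]
    return headers


print(set_page_encoding({}, ['iso-8859-1', 'utf-8']))
print(set_page_encoding({}, []))
```

The point is that the response charset is tied to the first charset tried for decoding the request, which is what the follow-up questions in this thread are about.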
On 2 March 2011 10:00, yuppie <y.2011@wcm-solutions.de> wrote:
Hi Martin!
Martin Aspeli wrote:
- After traversal and before calling the object ZPublisher.Publish.publish should figure out if the object expects zope.publisher behavior. Either we use a new interface for this or we use zope.publisher.interfaces.browser.IBrowserPage: AFAICS in Zope 2 land only zope.formlib and z3c.form based views implement IBrowserPage.
Isn't this in zope.browserpage now?
No.
- plone.z3cform uses a modified version of processInputs and doesn't use setPageEncoding. Can anybody explain why? I guess there are no z3c.form specific reasons. Maybe the changes can be merged back to Zope?
processInputs() in Five was very buggy. I thought I'd merged those changes into Zope 2?
Well. You were the last person who touched both. But the changes are quite different:
http://svn.zope.org/Zope/trunk/src/Products/Five/browser/decode.py?rev=11528... http://svn.zope.org/plone.z3cform/trunk/plone/z3cform/z2.py?rev=109071&view=...
Is there still anything in plone.z3cform that should be merged into Zope 2?
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset processInputs tries for decoding.
Is the charset of the request necessarily the right choice for the response? In Plone we always serve UTF-8 encoded. Laurence
Laurence Rowe wrote:
On 2 March 2011 10:00, yuppie<y.2011@wcm-solutions.de> wrote:
Martin Aspeli wrote:
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset processInputs tries for decoding.
Is the charset of the request necessarily the right choice for the response? In Plone we always serve UTF-8 encoded.
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted. If 'utf-8' is not in getPreferredCharsets(), it is not very likely that the browser speaks UTF-8 and processInputs will not even try to decode with UTF-8. In that case it might be better to respond with an accepted encoding. Cheers, Yuppie
Am 02.03.2011, 12:29 Uhr, schrieb yuppie <y.2011@wcm-solutions.de>:
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted.
If 'utf-8' is not in getPreferredCharsets(), it is not very likely that the browser speaks UTF-8 and processInputs will not even try to decode with UTF-8. In that case it might be better to respond with an accepted encoding.
I think you are drawing the wrong conclusion - some browsers (Internet Explorer and Safari spring to mind but this will also be the behaviour in Opera from 11.10) simply don't have an accept-charset header and the W3C says this means you can throw anything at them, in which case UTF-8 is a good choice. Charlie -- Charlie Clark Managing Director Clark Consulting & Research German Office Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-600-3657 Mobile: +49-178-782-6226
Hi Charlie! Charlie Clark wrote:
Am 02.03.2011, 12:29 Uhr, schrieb yuppie<y.2011@wcm-solutions.de>:
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted.
If 'utf-8' is not in getPreferredCharsets(), it is not very likely that the browser speaks UTF-8 and processInputs will not even try to decode with UTF-8. In that case it might be better to respond with an accepted encoding.
I think you are drawing the wrong conclusion
I was talking about getPreferredCharsets(), not about the Accept-Charset header.
some browsers (Internet Explorer and Safari spring to mind but this will also be the behaviour in Opera from 11.10) simply don't have an accept-charset header and the W3C says this means you can throw anything at them, in which case UTF-8 is a good choice.
You implemented this in getPreferredCharsets(), so that method says UTF-8 is accepted if no Accept-Charset header is set. And Five always used this line which has the same effect::

    charsets = envadapter.getPreferredCharsets() or ['utf-8']

Cheers, Yuppie
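As a rough model of the behaviour discussed in this exchange (UTF-8 first whenever it is accepted, and UTF-8 assumed when no Accept-Charset header is sent), consider the sketch below. It is a deliberately simplified, hypothetical preferred_charsets() function, not the zope.publisher implementation: it ignores quality ordering and only drops entries with q=0.

```python
def preferred_charsets(accept_charset):
    """Hypothetical, simplified model of charset preference.

    `accept_charset` is the raw Accept-Charset header value, or None
    when the browser did not send the header.
    """
    if not accept_charset:
        # No header: assume the client copes with UTF-8.
        return ['utf-8']
    accepted = []
    for item in accept_charset.split(','):
        name, _, params = item.strip().partition(';')
        name = name.strip().lower()
        quality = 1.0
        params = params.strip()
        if params.startswith('q='):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if name and quality > 0:
            accepted.append(name)
    # Put UTF-8 first whenever it is accepted, explicitly or via '*'.
    if 'utf-8' in accepted or '*' in accepted:
        others = [c for c in accepted if c not in ('utf-8', '*')]
        accepted = ['utf-8'] + others
    return accepted


print(preferred_charsets(None))
print(preferred_charsets('iso-8859-1, utf-8;q=0.7'))
print(preferred_charsets('iso-8859-1'))
```

Only in the last case, where the client explicitly lists charsets and UTF-8 is not among them, does the first entry differ from 'utf-8'.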
On 2 March 2011 11:29, yuppie <y.2011@wcm-solutions.de> wrote:
Laurence Rowe wrote:
On 2 March 2011 10:00, yuppie<y.2011@wcm-solutions.de> wrote:
Martin Aspeli wrote:
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset processInputs tries for decoding.
Is the charset of the request necessarily the right choice for the response? In Plone we always serve UTF-8 encoded.
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted.
If 'utf-8' is not in getPreferredCharsets(), it is not very likely that the browser speaks UTF-8 and processInputs will not even try to decode with UTF-8. In that case it might be better to respond with an accepted encoding.
If you serve differently encoded pages then you should set Vary: Accept-Charset. But then without normalization you'd get an explosion of different page variations. Without the Vary, it means that a visitor can poison the cache by supplying (only) a weird charset in Accept-Charset. The page would then be served in this encoding, cached downstream, and if other visitors' browsers don't support that charset then they have a problem. Laurence
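The caching concern can be made concrete with a toy model of how a downstream cache builds its lookup key. The cache_key() function is hypothetical and much simpler than a real HTTP cache; it only shows the effect of listing a request header in Vary.

```python
def cache_key(url, request_headers, vary):
    """Hypothetical cache key: the URL plus the request headers named in Vary."""
    varied = tuple(
        (name.lower(), request_headers.get(name, ''))
        for name in vary
    )
    return (url, varied)


odd_client = {'Accept-Charset': 'koi8-r'}
utf8_client = {'Accept-Charset': 'utf-8'}

# Without Vary both clients share one cache entry, so a page encoded
# for the first client would also be served to the second.
print(cache_key('/page', odd_client, []) == cache_key('/page', utf8_client, []))

# With Vary: Accept-Charset each distinct header value gets its own
# entry, which avoids the poisoning but multiplies the cached variants.
vary = ['Accept-Charset']
print(cache_key('/page', odd_client, vary) == cache_key('/page', utf8_client, vary))
```

Both outcomes are undesirable in their own way, which is the trade-off raised here: shared entries risk poisoning, separate entries make caching ineffective.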
Hi! Laurence Rowe wrote:
On 2 March 2011 11:29, yuppie<y.2011@wcm-solutions.de> wrote:
Laurence Rowe wrote:
On 2 March 2011 10:00, yuppie<y.2011@wcm-solutions.de> wrote:
Martin Aspeli wrote:
I don't know what setPageEncoding() does, though.
It sets a response Content-Type header with the first charset processInputs tries for decoding.
Is the charset of the request necessarily the right choice for the response? In Plone we always serve UTF-8 encoded.
getPreferredCharsets()[0] always returns 'utf-8' **if** UTF-8 is accepted.
If 'utf-8' is not in getPreferredCharsets(), it is not very likely that the browser speaks UTF-8 and processInputs will not even try to decode with UTF-8. In that case it might be better to respond with an accepted encoding.
If you serve differently encoded pages then you should set Vary: Accept-Charset.
That seems to be correct. So you found a bug in zope.publisher and five.formlib. If they do charset negotiation, they have to set Vary.
But then without normalization you'd get an explosion of different page variations.
AFAICS that normalization can't be done by the server and we can't prevent ineffective caching.
Without the Vary, it means that a visitor can poison the cache by supplying (only) a weird charset in Accept-Charset. The page would then be served in this encoding, cached downstream, and if other visitors' browsers don't support that charset then they have a problem.
That sounds like charset negotiation isn't a good idea and neither zope.publisher nor five.formlib should do it. If we don't negotiate the charset, we should still have a setPageEncoding method that overrides the ZPublisher default_encoding with UTF-8. But what does all that mean for the processInputs methods in Five (used by five.formlib) and in plone.z3cform? If we always send UTF-8, their current implementation doesn't make much sense to me. Don't know if we really should try to fall back to all the charsets mentioned in Accept-Charset. But at least we should *always* try UTF-8 decoding first. Cheers, Yuppie
Am 04.03.2011, 08:58 Uhr, schrieb yuppie <y.2011@wcm-solutions.de>:
If we always send UTF-8, their current implementation doesn't make much sense to me. Don't know if we really should try to fall back to all the charsets mentioned in Accept-Charset. But at least we should *always* try UTF-8 decoding first.
Hiya, I'm not sure if this is directly related but I remember Withers having a discussion (alright, shouting match) with Andreas about cycling through all kinds of encoding possibilities on the resolver. I can't find the thread at the moment but I think it related to the way templates could be edited TTW or how to handle situations of mixed encoding. Charlie -- Charlie Clark Managing Director Clark Consulting & Research German Office Helmholtzstr. 20 Düsseldorf D- 40215 Tel: +49-211-600-3657 Mobile: +49-178-782-6226
Charlie Clark wrote:
Am 04.03.2011, 08:58 Uhr, schrieb yuppie<y.2011@wcm-solutions.de>:
If we always send UTF-8, their current implementation doesn't make much sense to me. Don't know if we really should try to fall back to all the charsets mentioned in Accept-Charset. But at least we should *always* try UTF-8 decoding first.
I'm not sure if this is directly related but I remember Withers having a discussion (alright, shouting match) with Andreas about cycling through all kinds of encoding possibilities on the resolver. I can't find the thread at the moment but I think it related to the way templates could be edited TTW or how to handle situations of mixed encoding.
I considered proposing that we don't use the IUserPreferredCharsets adapter at all in Zope 2 and remove its registration in ZCML. But then I noticed the code Andreas wrote in Products.PageTemplates.unicodeconflictresolver.PreferredCharsetResolver. I'm not going to start that discussion again. Cheers, Yuppie
On Wed, Mar 2, 2011 at 9:43 AM, yuppie <y.2011@wcm-solutions.de> wrote:
Does that make sense? I guess that kind of change should only be made on the trunk.
Sounds all good to me, and yes this should be Zope trunk only. Hanno
Hi again!
Based on the discussion I modified my proposal:
- Products.Five.browser.decode should be marked as deprecated. It implements charset negotiation without making sure the 'Vary' header is set correctly. Fixing this will cause other caching issues.
- The setPageEncoding() function will not be replaced at all. We just rely on HTTPResponse.setBody() if the 'Content-Type' header is not set explicitly by the view. It is recommended to set default-zpublisher-encoding to utf-8. This is how plone.z3cform currently handles this.
- The processInputs() function is replaced by an HTTPRequest method called postProcessInputs(). This method first tries to decode with HTTPRequest.default_encoding. If that causes failures, it falls back to the encodings returned by getPreferredCharsets().
- Directly after traversal ZPublisher.Publish.publish() calls request.postProcessInputs() if the object implements zope.publisher.interfaces.browser.IBrowserPage.
If there are no objections I'll implement it that way on Zope trunk.
Cheers, Yuppie
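The decoding order proposed for postProcessInputs() could look roughly like the sketch below. It is an assumption about how such a method might behave, not the code that landed on Zope trunk: decode_value() is a hypothetical standalone function, and the real method would operate on the request's form data rather than on a single value.

```python
def decode_value(raw, default_encoding='utf-8', fallback_charsets=()):
    """Hypothetical sketch: try the default encoding, then the fallbacks.

    Returns decoded text, or the unchanged bytes if nothing works.
    """
    tried = []
    for charset in [default_encoding, *fallback_charsets]:
        if charset in tried:
            continue  # do not try the same charset twice
        tried.append(charset)
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong charset or unknown charset name: try the next one.
            continue
    return raw


# Valid UTF-8 is decoded on the first attempt.
print(decode_value('é'.encode('utf-8')))
# Latin-1 input fails as UTF-8 and is decoded by the fallback charset.
print(decode_value(b'\xe9', fallback_charsets=['iso-8859-1']))
# With no usable fallback the value is returned unchanged.
print(decode_value(b'\xe9'))
```

Trying the default encoding first matches the point made earlier in the thread that UTF-8 decoding should always be attempted before any charset taken from the request.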
participants (5): Charlie Clark, Hanno Schlichting, Laurence Rowe, Martin Aspeli, yuppie