Andreas Jung wrote:
Hi,
I've created a branch
svn+ssh://svn.zope.org/repos/main/Zope/branches/ajung-zpt-encoding-fixes
that should fix several encoding and WebDAV issues that unfortunately were not discovered before the final 2.10 release.
What has changed?
Until Zope 2.10 the ZopePageTemplate class had no notion of an encoding. You could upload ASCII, ISO-8859-15, UTF-8, or whatever; everything was stored as a Python byte string. For backward compatibility we did not convert the data into Python unicode strings with the introduction of the Zope 3 ZPT implementation. Although the ZopePageTemplate implementation already had a 'strict' flag (to enforce the conversion to Python unicode), the Zope 2 implementation did not enforce unicode as storage for a ZPT. This caused several encoding problems when editing a ZPT through the ZMI, and in addition the WebDAV support was broken in Zope 2.10.0 and Zope 2.10.1.
A ZPT now has something like an output_encoding. When you create a ZPT through the ZMI you'll be asked for the encoding (UTF-8 by default). The pt_render() method now converts the internal unicode representation back to the output encoding. This is basically the behavior of the old ZPT implementation. In addition, the __call__() method sets the 'charset' property of the content-type header according to the configured output encoding.
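As a rough illustration of the render step described above (not the actual Zope code): the sketch below encodes an internal unicode string to a configured output encoding and builds a matching content-type value. The function name `render_output` and its parameters are hypothetical, and it is written for modern Python 3, where `str` is the unicode type and `bytes` the byte string.

```python
def render_output(unicode_text, output_encoding="utf-8", content_type="text/html"):
    """Encode unicode text for output and build a matching Content-Type value.

    Hypothetical stand-in for what pt_render()/__call__() are described as
    doing; it does not touch any Zope API.
    """
    body = unicode_text.encode(output_encoding)
    header = "%s; charset=%s" % (content_type, output_encoding)
    return body, header


body, header = render_output("caf\u00e9", output_encoding="iso-8859-15")
```

The point of returning both values together is that the bytes and the advertised charset always come from the same setting, so they cannot drift apart.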
That is dangerous, because a page template may be called without being the "main" driver for a request; the response encoding should be used, if already set, rather than the value set on the template.
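One way to picture the precedence suggested here is sketched below, assuming the response's content-type is available as a plain string. The helper `effective_encoding` is hypothetical and not part of Zope: it returns the charset already present on the response, and falls back to the template's value only when none has been set.

```python
import re

# Matches the charset parameter of a Content-Type value, quoted or not.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def effective_encoding(response_content_type, template_encoding):
    """Prefer a charset already set on the response over the template's own."""
    if response_content_type:
        match = _CHARSET_RE.search(response_content_type)
        if match:
            return match.group(1)
    return template_encoding
```

With this ordering, a template called as a fragment inside a larger request leaves the driver's charset alone.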
WebDAV: the PUT factory was using the write() method to store uploaded content. This method wasn't aware of the output encoding. PUT() now uses pt_edit(). This implies that the uploaded content must have the same encoding as the output encoding. This means: when you create a ZPT with encoding UTF-8 you can't upload new content with a different encoding. This is slightly different behavior from older Zope versions and might break backward compatibility. Does anyone have such a use case? One might check in addition for the 'encoding' attribute inside the XML preamble or for the 'charset' property inside a <meta http-equiv="content-type" ..> tag for HTML documents.
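The additional check mentioned at the end of that paragraph could look roughly like the following. This is a minimal sketch using regular expressions on the raw bytes, not a real XML or HTML parser, and the name `sniff_encoding` is hypothetical. It looks first for an `encoding` attribute in an XML preamble, then for a `charset` in a `<meta http-equiv="content-type">` tag.

```python
import re

# encoding="..." inside a leading <?xml ... ?> declaration
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
# charset=... inside a <meta http-equiv="content-type" ...> tag
_META = re.compile(
    rb'<meta[^>]+http-equiv\s*=\s*["\']?content-type["\']?'
    rb'[^>]*charset\s*=\s*["\']?([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)


def sniff_encoding(raw, default=None):
    """Return the encoding declared inside the uploaded bytes, or `default`."""
    for pattern in (_XML_DECL, _META):
        match = pattern.search(raw)
        if match:
            return match.group(1).decode("ascii").lower()
    return default
```

A declaration found this way is only a hint supplied by the document itself, so a caller would still need to handle a decode failure.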
PUT should always extract the encoding from the upload request, and use it to decode the template to unicode for storage. While saving that encoding as the "output encoding" for a newly-created template is reasonable, modifying the "output encoding" for an existing template is riskier. I'm not even sure there is a use case for a per-template output encoding -- it seems more like a site-wide policy to me (perhaps configured as a view?)
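For illustration only, the decode-on-upload idea could be sketched like this, assuming the request's Content-Type header is at hand as a string. The helper `decode_upload` and the `fallback` parameter are hypothetical; the header is parsed with the standard library's `email.message.Message` rather than any Zope request object.

```python
from email.message import Message


def decode_upload(raw_body, content_type_header, fallback="utf-8"):
    """Decode an uploaded body to unicode using the request's charset.

    Returns the decoded text and the charset that was used, so the caller
    can decide whether to record it (e.g. only for a newly created template).
    """
    msg = Message()
    msg["Content-Type"] = content_type_header or "application/octet-stream"
    charset = msg.get_content_charset() or fallback
    return raw_body.decode(charset), charset
```

Returning the charset separately keeps the storage decision (always unicode) apart from the policy question of whether an existing template's output encoding should change.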
Any objections to merging the changes into the head after further polishing and writing some more tests?
Tres.
-- 
===================================================================
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com