UnicodeDecodeError in Zope 2.10.4 (upgrade from 2.8.4)
I've upgraded my installation from Zope 2.8.4 to Zope 2.10.4 (by copying data.fs, Products/ etc.). I have the publisher encoding and management_page_charset set to utf-8. The system default encoding is also utf-8.
Zope 2.10 is said to have better Unicode support thanks to UnicodeEncodeConflictResolver. It does, but unfortunately in some cases this is not enough. It seems that the code 'protected' by the resolver is not the only code that can be affected by non-unicode strings. A simple example is the 'structure' keyword.
This works (the resolver resolves the conflict): <em tal:content="python: 'żółć'">template id</em>
This doesn't work: <em tal:content="structure python: 'żółć'">template id</em>
Also, if you have a Folder instance and set its Title to a string that contains some i18n characters, you're not even able to add a page template inside it.
The error traceback is the same in both cases:
Error Type: UnicodeDecodeError
Error Value: 'ascii' codec can't decode byte 0xc5 in position 200: ordinal not in range(128)
Traceback (innermost last):
  Module ZPublisher.Publish, line 119, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 42, in call_object
  Module Shared.DC.Scripts.Bindings, line 313, in __call__
  Module Shared.DC.Scripts.Bindings, line 350, in _bindAndExec
  Module Products.PageTemplates.PageTemplateFile, line 129, in _exec
  Module Products.PageTemplates.PageTemplate, line 89, in pt_render
  Module zope.pagetemplate.pagetemplate, line 117, in pt_render
  Module zope.tal.talinterpreter, line 271, in __call__
  Module zope.tal.talinterpreter, line 346, in interpret
  Module zope.tal.talinterpreter, line 534, in do_optTag_tal
  Module zope.tal.talinterpreter, line 516, in no_tag
  Module zope.tal.talinterpreter, line 346, in interpret
  Module zope.tal.talinterpreter, line 754, in do_insertStructure_tal
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 200: ordinal not in range(128)
The problem is with: text = unicode(structure) at Module zope.tal.talinterpreter, line 754, in do_insertStructure_tal
A sitecustomize.py with setdefaultencoding('utf-8') solves this, but it is not a nice solution. I wonder whether this should be submitted as a bug, or whether there is a different solution that I've missed?
For the record: one more thing that went wrong for me during migration was the 'expand' attribute of ZPT, which for a few ZPT objects was set to true in 2.10.4 while originally, in 2.8.4, it was false. As a result, TALES expressions disappeared from those ZPTs under 2.10.4. I've written a script that explicitly sets expand=0 on all ZPT instances in Zope 2.8.4, and after migrating again everything is OK.
-- Maciej Wisniowski
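The failure at text = unicode(structure) can probably be reproduced outside Zope. The following is only a minimal sketch, written for Python 3 (an assumption; the thread itself concerns Python 2), where an explicit bytes.decode("ascii") stands in for the implicit decoding that Python 2's unicode(x) performs when the default encoding is ASCII. The variable names are illustrative and not taken from Zope.

```python
# Python 3 sketch (assumption): raw.decode("ascii") plays the role of the
# implicit decoding done by Python 2's unicode(x) with an ASCII default.
raw = "żółć".encode("utf-8")  # the bytes a UTF-8 encoded template would hold

try:
    raw.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    # Record the offset and the value of the first undecodable byte.
    failed_at = (exc.start, hex(raw[exc.start]))

# Naming the real encoding explicitly decodes the same bytes without error.
text = raw.decode("utf-8")
```

Here the first offending byte is 0xc5, the same byte reported in the traceback, because 'ż' is encoded in UTF-8 as the two bytes 0xc5 0xbc.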
--On 17 July 2007 10:33:39 +0200 Maciej Wisniowski <maciej.wisniowski@coig.katowice.pl> wrote:
I've upgraded my installation from Zope2.8.4 to Zope 2.10.4 (by copying data.fs, Products/ etc.). I have publisher encoding and management_page_charset set to utf-8. Also system default encoding is utf-8
Did you read <http://www.zope.org/Products/Zope/2.10.4/Zope-2.10.4-released> ? -aj
--On 17 July 2007 10:52:05 +0200 Andreas Jung <lists@zopyx.com> wrote:
If the documented hints don't work, please submit a bug report with a reproducible test case in the form of a unit test. -aj
If the documented hints don't work, please submit a bug report with a reproducible test case in the form of a unit test.
Bug submitted with tests attached at: http://www.zope.org/Collectors/Zope/2339
-- Maciej Wisniowski
Did you read
<http://www.zope.org/Products/Zope/2.10.4/Zope-2.10.4-released>
Yes.
The migration code should auto-detect ISO-8859-15 and UTF-8 encoded page templates. For other encodings you must set the environment variable ZPT_PREFERRED_ENCODING. The migration code applies to ZopePageTemplate instances only.
I have utf-8 encoded ZPTs, so I didn't use ZPT_PREFERRED_ENCODING.
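The kind of auto-detection mentioned in the release notes could look roughly like the sketch below. This is not Zope's migration code; the function name detect_and_decode and its candidates parameter are hypothetical, and the sketch is written for Python 3. It tries UTF-8 first because ISO-8859-15 accepts any byte sequence and would otherwise always "match".

```python
def detect_and_decode(raw, candidates=("utf-8", "iso-8859-15")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper, not part of Zope. Order matters: strict UTF-8
    rejects most non-UTF-8 input, while ISO-8859-15 never fails, so the
    permissive encoding has to come last.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


utf8_result = detect_and_decode("żółć".encode("utf-8"))
latin9_result = detect_and_decode("café €".encode("iso-8859-15"))
```

A template stored in some other encoding would still decode "successfully" as ISO-8859-15 and come out garbled, which is presumably why the release notes ask for ZPT_PREFERRED_ENCODING in that case.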
When you download ZPT content through FTP or WebDAV the content is converted using the output_encoding property of the corresponding ZopePageTemplate instance.
So far I haven't even tried FTP or WebDAV with 2.10.4. The problems are with TTW.
The encoding of a rendered ZPT (through HTTP) is determined through charset=XXXX within the content-type HTTP response header
etc/zope.conf: default-zpublisher-encoding
default: iso-8859-15
default-zpublisher-encoding is set to utf-8.
In order to deal with UnicodeDecodeErrors in a reasonable way, we added a configurable Unicode conflict resolver.
As I said before, the resolver is not even called when the 'structure' keyword is used.
-- Maciej Wisniowski
Both <em tal:content="python: 'żółć'">template id</em> and <em tal:content="structure python: 'żółć'">template id</em> are like playing with fire. Don't do it.
What you've got there are unicode characters written down without any encoding information. It will work if you set the internal ZPT encoding to match the encoding in which the text was entered into the template. I can see that encoding is not ASCII, which is what your system defaults to.
This is what happens internally:
>>> x="\xc5"
>>> unicode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> import sys; sys.getdefaultencoding()
'ascii'
-- Peter Bengtsson, work www.fry-it.com home www.peterbe.com hobby www.issuetrackerproduct.com
Both <em tal:content="python: 'żółć'">template id</em> and <em tal:content="structure python: 'żółć'">template id</em> are like playing with fire. Don't do it.
What you've got there are unicode characters written down without any encoding information.
This is legacy code from Zope 2.8.4 that I have to deal with during migration. All new code is supposed to use u'' strings. In fact, the above is only an example; in the real case we have properties set on folders, e.g. 'title', that contain national characters.
You can easily check it yourself if you want. Just create a Folder and set its 'title' property to a value that contains some non-ASCII characters. I think you won't even be able to add a ZPT object to that folder (at least it is not possible for me).
It will work if you set the internal ZPT encoding to be the same as it was entered into the template which I can see is not ASCII which is what your system defaults to.
What do you mean by internal ZPT encoding? The docs say: "Starting Zope 2.10.2 the ZPT implementation uses unicode as internal representation", so there should be no encoding at this level, I think.
This is internally what happens:
>>> x="\xc5"
>>> unicode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)
>>> import sys; sys.getdefaultencoding()
'ascii'
I already realized the same thing and used sitecustomize.py with sys.setdefaultencoding('utf-8') as a quick fix. This works, but it is not a nice solution. I rather expected the Zope 2.10 resolver to deal with this.
-- Maciej Wisniowski
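One alternative to changing the interpreter-wide default encoding might be to decode legacy byte strings explicitly before they reach the template. The following is only a sketch under stated assumptions: it is written for Python 3, ensure_text is a hypothetical helper rather than a Zope API, and it assumes the stored values (for example folder titles) are UTF-8 encoded.

```python
def ensure_text(value, encoding="utf-8"):
    """Decode byte strings with an explicit encoding; pass text through.

    Hypothetical helper, not a Zope API. The utf-8 default is an
    assumption about how the legacy values were stored.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


legacy_title = "Zażółć gęślą jaźń".encode("utf-8")  # stands in for a stored byte-string title
title = ensure_text(legacy_title)                   # decoded once, at the boundary
already_text = ensure_text("plain text")            # text values are returned unchanged
```

The design choice here is to name the encoding at the point where the bytes enter the system, so that nothing downstream depends on the process-wide default.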
participants (3): Andreas Jung, Maciej Wisniowski, Peter Bengtsson