RE: [Zope-dev] [ZOPE 2.6 B1] Unicode/locale problems with OFS/dtml/properties.dtml
what do you mean by "inserted into the form"?
Put there by the dtml-method in a dtml-var statement.
That the response contains a single byte where your properties contain a character whose Unicode value is greater than 127?
Yes
how have you checked this? if so, that's a bug.
In a Hex editor... The character 'æ' for instance is inserted as '0xE6' in the returned HTML.
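The byte values involved can be checked with a minimal Python 3 sketch (the thread itself concerns Python 2; `str.encode` here only stands in for the encoding step). 'æ' is the single byte 0xE6 in Latin-1 but the two bytes 0xC3 0xA6 in UTF-8, so a lone 0xE6 in a page declared as UTF-8 means the text was never encoded as UTF-8.

```python
# 'æ' is U+00E6. Its encoded form depends on the charset.
ch = "\u00e6"

latin1_bytes = ch.encode("latin-1")
utf8_bytes = ch.encode("utf-8")

print(latin1_bytes.hex())  # e6
print(utf8_bytes.hex())    # c3a6
```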
Secondly, the type attribute of all input fields contains an extra ':utf8:' that we assumed is a server directive to interpret the contents as UTF-8. This is apparently what crashes when storing the second time.
This is a directive to tell Zope, when you submit the form, that your browser will have encoded the form response using UTF-8. Browsers stupidly don't put this information anywhere more suitable.
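As a rough illustration of what such a directive could mean on the server side, here is a minimal Python 3 sketch. The helper `decode_field`, the field name `title:utf8:string` and the Latin-1 fallback are all assumptions made for the example; this is not Zope's actual form parser.

```python
def decode_field(field_name, raw_value, default_charset="latin-1"):
    """Split 'name:flag:type' and decode the raw bytes accordingly.

    Hypothetical helper for illustration; not Zope's actual parser.
    """
    name, *flags = field_name.split(":")
    # The 'utf8' flag says the browser sent this value encoded as UTF-8.
    charset = "utf-8" if "utf8" in flags else default_charset
    return name, raw_value.decode(charset)


name, value = decode_field("title:utf8:string", b"caf\xc3\xa9")
print(name, value)
```

Without the flag the same helper falls back to the assumed default charset, which is why the flag has to travel with the field name.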
Yeah, we figured this would be similar to ':method'.
What browser are you using? is it correctly using utf8 for this page? (for example, Mozilla has a View/Encoding menu that can override server-supplied encoding information)
We have so far tried Opera 6, IE 6 and Mozilla 1.1.
lib/python/OFS/dtml/properties.dtml contains the following that seems to us to be debug code: <dtml-call "REQUEST.set('management_page_charset','UTF-8')"> <dtml-var "u''"> and several ':utf8:' directives.
When we removed those, it worked fine. Was this dtml-method merged in mistakenly?
Those are supposed to be there. The first inserts the text/html;charset=utf-8 header into the management page. The second ensures that the dtml which computes this page content returns a unicode object.
If the dtml returns a unicode object, then ZPublisher looks at the charset header to determine how to encode it.
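The rule described here can be sketched in Python 3, where `bytes` plays the role of a plain Python 2 string and `str` the role of a unicode object. The function `encode_body` is a hypothetical stand-in, not ZPublisher code, and the Latin-1 fallback follows the default mentioned later in this thread.

```python
import re


def encode_body(body, content_type, default_charset="latin-1"):
    """Return the bytes to send for `body`.

    bytes are passed through untouched; text is encoded with the charset
    named in the Content-Type header, falling back to `default_charset`.
    Hypothetical stand-in for the publisher's behaviour, not Zope code.
    """
    if isinstance(body, bytes):
        return body
    match = re.search(r"charset=([\w-]+)", content_type, re.IGNORECASE)
    charset = match.group(1) if match else default_charset
    return body.encode(charset)


print(encode_body("\u00e6", "text/html; charset=utf-8").hex())   # c3a6
print(encode_body(b"\xe6", "text/html; charset=utf-8").hex())    # e6
```

The second call shows the failure mode reported above: if the page is rendered as a plain string, the header is ignored and the Latin-1 byte goes out unchanged.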
This is DTML/Zope internal, right? I thought the first two letters, which signify endianness ('FFFE' or 'FEFF'), tell you that this is UTF-8. The u'' string is not on the first line.
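For reference, the byte sequences mentioned here are byte-order marks, and they can be printed from Python's standard `codecs` module. FE FF and FF FE mark UTF-16, while the UTF-8 mark is EF BB BF. The `u''` in the DTML is a Python unicode literal and is unrelated to either.

```python
import codecs

print(codecs.BOM_UTF16_BE.hex())  # feff
print(codecs.BOM_UTF16_LE.hex())  # fffe
print(codecs.BOM_UTF8.hex())      # efbbbf
```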
please put lib/python/OFS/dtml/properties.dtml back the way it was originally, then send me
1. which browser you are using
I am using Opera 6.02, but we are seeing it on all browsers.
2. an export file containing one object that demonstrates the problem.
It is on every property page in the whole site, but see the attached .zexp of /standard_template.pt
3. a copy of the page obtained using wget or similar. (please don't use your browser's 'save' feature because that sometimes performs transcoding)
I used wget... -- Arnar Lundesgaard
On Thursday 26 Sep 2002 5:47 pm, Arnar Lundesgaard wrote: Thanks for taking the time to help debug this.
lib/python/OFS/dtml/properties.dtml contains ... <dtml-var "u''">
This ensures that the dtml which computes this page content returns a unicode object.
This line, intended to force the dtml to be rendered as a unicode object, is not doing its job. If the dtml is not a unicode object then ZPublisher's encoding mechanism is not engaged. Has there been some recent dtml optimisation to ignore empty strings perhaps?

This patch below works for me. Please let me know if this works for you and I will apply it to 2.6. (and various sources of documentation that all recommend including this line.)

Index: properties.dtml
===================================================================
RCS file: /cvs-repository/Zope/lib/python/OFS/dtml/properties.dtml,v
retrieving revision 1.8
diff -c -2 -r1.8 properties.dtml
*** properties.dtml 7 May 2002 17:54:56 -0000 1.8
--- properties.dtml 26 Sep 2002 17:54:48 -0000
***************
*** 1,4 ****
  <dtml-call "REQUEST.set('management_page_charset','UTF-8')">
! <dtml-var "u''">
  <dtml-var manage_page_header>
  <dtml-with "_(management_view='Properties')">
--- 1,4 ----
  <dtml-call "REQUEST.set('management_page_charset','UTF-8')">
! <dtml-var "u' '">
  <dtml-var manage_page_header>
  <dtml-with "_(management_view='Properties')">
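One way the suspected optimisation would produce this symptom can be sketched in Python 3, with `bytes` standing in for a plain Python 2 string and `str` for a unicode object. The `drop_empty` step is purely an assumption that models the question asked above; it is not taken from the DTML source.

```python
def render(blocks, drop_empty):
    """Join rendered blocks; `str` plays Python 2's unicode, `bytes` its str.

    `drop_empty` models the *suspected* optimisation; it is an assumption.
    """
    if drop_empty:
        blocks = [b for b in blocks if len(b) > 0]
    if any(isinstance(b, str) for b in blocks):
        # One text block is enough to promote the whole page to text.
        return "".join(
            b if isinstance(b, str) else b.decode("latin-1") for b in blocks
        )
    return b"".join(blocks)


page = [b"<p>", "", b"\xe6</p>"]      # "" stands for <dtml-var "u''">
patched = [b"<p>", " ", b"\xe6</p>"]  # " " stands for <dtml-var "u' '">

print(type(render(page, drop_empty=True)).__name__)     # bytes
print(type(render(patched, drop_empty=True)).__name__)  # str
```

Under that assumption the empty marker disappears before the join, so the page stays a byte string, while the one-space marker survives and forces a text result.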
On Thu, 2002-09-26 at 18:47, Arnar Lundesgaard wrote:
what do you mean by "inserted into the form"?
Put there by the dtml-method in a dtml-var statement.
I can second this. With CVS Zope (I did the last cvs up just now) I'm getting a very curious thing: displaying .../index_html is OK, but return context.index_html(context,request) creates broken characters instead of ISO Latin-1 umlauts. In my case (Konqueror on Linux) it seems that the text/html;charset=UTF-8 breaks the page, because the byte values are correct for the umlauts. This is further confirmed by the fact that forcing Konqueror to display ISO 8859-1 fixes the display.
So how are these Unicode changes supposed to work? Are non-ASCII characters forbidden now? And how do I get UTF-8 text into Zope? While I'm quite sure that this will help Zope in the Asian region, it seems quite inconvenient for the ISO Latin-1 world :(
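The Konqueror observation matches what happens when Latin-1 bytes are read as UTF-8. A small Python 3 sketch, using the made-up sample word "Bücher" as an assumption for the page content:

```python
# Latin-1 bytes for a word with an umlaut (sample text is illustrative).
raw = "B\u00fccher".encode("latin-1")

as_latin1 = raw.decode("latin-1")                # what the author intended
as_utf8 = raw.decode("utf-8", errors="replace")  # what a UTF-8 header implies

print(as_latin1)
print(as_utf8)
```

The bytes are fine for ISO 8859-1, but the 0xFC byte is not valid UTF-8, so a browser that trusts the charset header shows a replacement character instead of the umlaut.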
That the response contains a single byte where your properties contain a character whose Unicode value is greater than 127?
Yes
how have you checked this? if so, that's a bug.
In a Hex editor...
The character 'æ' for instance is inserted as '0xE6' in the returned HTML.
The same here with umlauts, ... I've used wget and less <result to verify that the umlauts display correctly. Andreas -- Andreas Kostyrka <andreas@kostyrka.priv.at>
Andreas Kostyrka <andreas@kostyrka.priv.at> wrote:
So how are these Unicode changes supposed to work? Are non-ascii characters forbidden now? And how do I get UTF-8 text into Zope?
If all your code outputs is plain Python strings, ZPublisher passes them as-is to the client.

If ZPublisher has to output a Unicode string, it has to decide how to translate that into a byte string at the other end. What it does then is encode the Unicode string into the charset defined in any 'Content-Type: text/xxx; charset=thecharset' header you produced using RESPONSE.setHeader (defaulting to latin-1).

But how does ZPublisher get a Unicode string in the first place? Well, it gets it from the rendering of whatever method was called when publishing the object.

For DTML, various blocks are joined together (function render_blocks()), and if one of them happens to be Unicode then the join_unicode method will make it so that all non-Unicode strings are converted into Unicode using unicode(s, 'latin-1'). So this assumes that plain strings are encoded in latin-1. Note, WE MAY WANT TO PARAMETRIZE THIS. Basically there could be an additional attribute to the DTML saying what its native encoding is.

For PageTemplates, the various blocks produced by the template and python are sent to a StringIO-like object, which is responsible for converting them into a coherent thing when its getvalue() method is called. At the moment it doesn't deal very well with mixed Unicode and non-Unicode strings, so the reported failures don't surprise me. WE NEED TO FIX THIS BEFORE THE NEXT BETA, probably also by providing an explicit native encoding. I believe that's what AltPT does. Localizer 0.9, for instance, had the need to patch the StringIO-like object to make it deal with joining non-Unicode and Unicode.

Now that I better understand the problem, I'll help fix this ASAP in core Zope.

Florent
--
Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
For PageTemplates, the various blocks produced by the template and python are sent to a StringIO-like object, which is responsible for converting them into a coherent thing when its getvalue() method is called. At the moment it doesn't deal very well with mixed Unicode and non-Unicode strings, so the reported failures don't surprise me.
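Why a fixed latin-1 assumption matters, and what parametrizing it could look like, can be sketched in Python 3 (`bytes` for plain strings, `str` for Unicode). The function `join_blocks` and its `native_encoding` parameter are hypothetical names for this example, not the signature of Zope's join_unicode.

```python
def join_blocks(blocks, native_encoding="latin-1"):
    """Join rendered blocks, promoting to text if any block is text.

    Hypothetical sketch of the idea described above; `native_encoding`
    is the proposed parameter, not an existing Zope option.
    """
    if not any(isinstance(b, str) for b in blocks):
        return b"".join(blocks)
    return "".join(
        b if isinstance(b, str) else b.decode(native_encoding) for b in blocks
    )


utf8_template_text = b"caf\xc3\xa9"  # template source saved as UTF-8

print(join_blocks(["x", utf8_template_text]))           # mis-decoded as latin-1
print(join_blocks(["x", utf8_template_text], "utf-8"))  # xcafé
```

With the fixed latin-1 assumption, template text that was actually saved as UTF-8 turns into two wrong characters; declaring the native encoding avoids that.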
BTW an example of a failing PageTemplate is:

<html>
<span tal:replace="python:u'hello'" />
café
</html>

Because, deep inside StringIO, it tries to do something like:

''.join([u'hello', ' café'])

--
Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
On Thu, 2002-09-26 at 23:58, Florent Guillaume wrote:
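A buffer that takes an explicit native encoding, as suggested earlier in the thread, might look like the following Python 3 sketch. The class `MixedOutput` is hypothetical and is not the PageTemplates or Localizer implementation. In Python 2 the naive join fails with a UnicodeDecodeError from the implicit ASCII decode; the Python 3 analogue shown here raises TypeError instead.

```python
class MixedOutput:
    """Collects text and byte chunks; hypothetical StringIO-like buffer."""

    def __init__(self, native_encoding):
        self.native_encoding = native_encoding
        self._chunks = []

    def write(self, chunk):
        # Decode byte chunks eagerly so getvalue() only ever joins text.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(self.native_encoding)
        self._chunks.append(chunk)

    def getvalue(self):
        return "".join(self._chunks)


try:
    "".join(["hello", b" caf\xe9"])  # naive join of mixed chunk types
except TypeError as exc:
    print("naive join failed:", exc)

out = MixedOutput(native_encoding="latin-1")
out.write("hello")
out.write(b" caf\xe9")
print(out.getvalue())
```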
Andreas Kostyrka <andreas@kostyrka.priv.at> wrote:
So how are these Unicode changes supposed to work? Are non-ascii characters forbidden now? And how do I get UTF-8 text into Zope?
If all your code outputs is plain python strings, ZPublisher passes them as-is to the client.
Well, my index_html produces a plain string with 8-bit characters. (I've verified this by trying to add u'' to it, and got an exception.)
Now in some cases ZPublisher adds the UTF-8 content-type and in some it doesn't. Calling index_html directly (via the browser) does not mark the content as UTF-8. return context.index_html(context,request) marks the content as UTF-8 although index_html does return a plain old string. And it does nothing to recode the string in UTF-8, ... Andreas -- Andreas Kostyrka <andreas@kostyrka.priv.at>
On Friday 27 Sep 2002 10:16 am, Andreas Kostyrka wrote:
Now in some cases ZPublisher adds the UTF-8 content-type
To a response header? I'm fairly sure ZPublisher never does that. Perhaps you could add some debugging hooks to RESPONSE.setHeader to see who is?
return a plain old string. And it does nothing to recode the string in UTF-8, ...
ZPublisher never recodes strings. If you return a plain string, then exactly those same bytes will go out over the wire. ZPublisher only performs encoding if you return a unicode string.
On Fri, 2002-09-27 at 11:30, Toby Dickenson wrote:
On Friday 27 Sep 2002 10:16 am, Andreas Kostyrka wrote:
Now in some cases ZPublisher adds the UTF-8 content-type
To a response header? I'm fairly sure ZPublisher never does that.
Well, someone does. I do not :)
Perhaps you could add some debugging hooks to RESPONSE.setHeader to see who is?
I'll look into that.
Andreas -- Andreas Kostyrka <andreas@kostyrka.priv.at>
Hi! I've traced back the source of my UTF-8 + plain string problem. My python script calls manage_changeProperties like this:

request = container.REQUEST
RESPONSE = request.RESPONSE
standorte=request['standorte'].replace(', ',',')
context.manage_changeProperties({'standorte':standorte})
return context.index_html(context,request)

Now if I comment out the manage_changeProperties, it works ok, but of course does not change the property.
Andreas
Following are the exact tracebacks I've generated:
On Fri, 2002-09-27 at 12:54, Andreas Kostyrka wrote:
On Fri, 2002-09-27 at 11:30, Toby Dickenson wrote:
On Friday 27 Sep 2002 10:16 am, Andreas Kostyrka wrote:
Now in some cases ZPublisher adds the UTF-8 content-type
To a response header? I'm fairly sure ZPublisher never does that.
Well, someone does. I do not :)
Perhaps you could add some debugging hooks to RESPONSE.setHeader to see who is?
I'll look into that.
Well, I've added a traceback.print_stack like this:

Index: HTTPResponse.py
===================================================================
RCS file: /cvs-repository/Zope/lib/python/ZPublisher/HTTPResponse.py,v
retrieving revision 1.70
diff -u -u -r1.70 HTTPResponse.py
--- HTTPResponse.py 24 Sep 2002 22:13:26 -0000 1.70
+++ HTTPResponse.py 27 Sep 2002 14:37:49 -0000
@@ -17,6 +17,7 @@
 
 import types, os, sys, re
 import zlib, struct
+import traceback
 from string import translate, maketrans
 from types import StringType, InstanceType, LongType, UnicodeType
 from BaseResponse import BaseResponse
@@ -240,6 +241,8 @@
             return
         name = literal and name or key
         self.headers[name] = value
+        if name.upper()=="CONTENT-TYPE" and value.find("UTF")<>-1:
+            traceback.print_stack()
 
     def addHeader(self, name, value):
         '''\

It produced:

  File "/home/andreas/Zope/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
    response=response)
  File "/home/andreas/Zope/lib/python/ZPublisher/Publish.py", line 150, in publish_module
    response = publish(request, module_name, after_list, debug=debug)
  File "/home/andreas/Zope/lib/python/ZPublisher/Publish.py", line 98, in publish
    request, bind=1)
  File "/home/andreas/Zope/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/home/andreas/Zope/lib/python/ZPublisher/Publish.py", line 39, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 252, in __call__
    return self._bindAndExec(args, kw, None)
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 283, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/andreas/Zope/lib/python/Products/PythonScripts/PythonScript.py", line 315, in _exec
    result = apply(f, args, kw)
  File "Script (Python)", line 9, in setStandorte
  File "/home/andreas/Zope/lib/python/OFS/PropertyManager.py", line 289, in manage_changeProperties
    return self.manage_propertiesForm(self,REQUEST,manage_tabs_message=message)
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 252, in __call__
    return self._bindAndExec(args, kw, None)
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 283, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/andreas/Zope/lib/python/App/special_dtml.py", line 174, in _exec
    try: result = render_blocks(self._v_blocks, ns)
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 266, in __render_with_namespace__
    return self._bindAndExec((), namevals, namespace)
  File "/home/andreas/Zope/lib/python/Shared/DC/Scripts/Bindings.py", line 283, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/home/andreas/Zope/lib/python/App/special_dtml.py", line 174, in _exec
    try: result = render_blocks(self._v_blocks, ns)
  File "/home/andreas/Zope/lib/python/DocumentTemplate/DT_Util.py", line 201, in eval
    return eval(code, d)
  File "<string>", line 0, in ?
  File "/home/andreas/Zope/lib/python/ZPublisher/HTTPResponse.py", line 245, in setHeader
    traceback.print_stack()

The related PythonScript does a return context.index_html(context,request) at this place. Further investigation shows that the value returned from this expression is plain text with 8-bit characters in it. In fact, adding u'' to it breaks because of this.

Andreas
--
Andreas Kostyrka <andreas@kostyrka.priv.at>
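The same tracing idea can be tried outside Zope with a small Python 3 sketch. `TracingResponse` and `render_management_page` are hypothetical stand-ins; only `traceback.format_stack` is a real standard-library call.

```python
import traceback


class TracingResponse:
    """Minimal response stand-in that records who sets a UTF charset."""

    def __init__(self):
        self.headers = {}
        self.traces = []

    def setHeader(self, name, value):
        self.headers[name.lower()] = value
        if name.lower() == "content-type" and "utf" in value.lower():
            # Keep the formatted stack instead of printing it, so it can
            # be inspected or logged later.
            self.traces.append("".join(traceback.format_stack()))


def render_management_page(response):
    # Stands in for the management form that sets the charset header.
    response.setHeader("Content-Type", "text/html; charset=UTF-8")


response = TracingResponse()
render_management_page(response)
print(response.traces[0])
```

Collecting the stack as a string rather than printing it keeps the hook usable in code that has no console attached.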
On Fri, 2002-09-27 at 16:42, Andreas Kostyrka wrote:
I've traced back the source of my UTF-8 + plain string problem. My python script calls manage_changeProperties like this: context.manage_changeProperties({'standorte':standorte})
You should use

context.manage_changeProperties(standorte=standorte)

otherwise manage_changeProperties thinks (stupidly) that the mapping that was passed is a REQUEST, and tries to return the management page, whose rendering sets the UTF-8 header. There is a lot of braindeadness like that in this old code (abuse of REQUEST), but we have to deal with it. :-( Florent

  File "Script (Python)", line 9, in setStandorte
  File "/home/andreas/Zope/lib/python/OFS/PropertyManager.py", line 289, in manage_changeProperties
    return self.manage_propertiesForm(self,REQUEST,manage_tabs_message=message)
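The difference between the two call styles can be shown with a simplified Python 3 stand-in. The class `PropertySheet`, the placeholder return value and the sample value "Wien,Graz" are assumptions for illustration; the `REQUEST=None, **kw` shape is inferred from the explanation and traceback in this thread, not copied from OFS.PropertyManager.

```python
class PropertySheet:
    """Simplified stand-in, not the real OFS.PropertyManager."""

    def __init__(self):
        self.props = {}

    def manage_changeProperties(self, REQUEST=None, **kw):
        updates = dict(REQUEST or {})
        updates.update(kw)
        self.props.update(updates)
        if REQUEST:
            # A non-empty first argument is treated as a web request,
            # so the management form is rendered and returned.
            return "<management page>"
        return None


sheet = PropertySheet()
print(sheet.manage_changeProperties({"standorte": "Wien,Graz"}))
print(sheet.manage_changeProperties(standorte="Wien,Graz"))
```

Both calls store the property, but only the positional dict triggers the page rendering, which in real Zope is where the UTF-8 header gets set.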
-- Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
On Fri, 2002-09-27 at 17:17, Florent Guillaume wrote:
On Fri, 2002-09-27 at 16:42, Andreas Kostyrka wrote:
I've traced back the source of my UTF-8 + plain string problem. My python script calls manage_changeProperties like this: context.manage_changeProperties({'standorte':standorte})
You should use
context.manage_changeProperties(standorte=standorte)
otherwise manage_changeProperties thinks (stupidly) that the mapping that was passed is a REQUEST, and tries to return the management page, whose rendering sets the UTF-8 header.
I thought manage_editProperties is here to deal with the ZMI?
There is a lot of braindeadness like that in this old code (abuse of REQUEST), but we have to deal with it. :-(
Well, either one has two APIs (for web and internal use) or we will have to live with things like this. :( [I do not see a way to distinguish between these two, as most internal use is triggered by some web request, ...]
Andreas -- Andreas Kostyrka <andreas@kostyrka.priv.at>
participants (4)
- Andreas Kostyrka
- Arnar Lundesgaard
- Florent Guillaume
- Toby Dickenson