[zope2-tracker] [Bug 530620] [NEW] UnicodeDecodeError when using IE, Safari

Tue Mar 2 06:35:00 EST 2010

Public bug reported:

Using Zope 2.11.5 with default-zpublisher-encoding set to utf-8, rendering
content that contains a UTF-8 encoded string fails in IE and Safari,
because these browsers (at the time of writing) don't send an
Accept-Charset header.

In http.py (zope/publisher/http.py), the
HTTPCharsets.getPreferredCharsets() method then returns an empty list,
which causes a UnicodeDecodeError in Zope when a tal:content expression
contains a UTF-8 encoded string with, e.g., Norwegian characters
(ø is encoded as \xc3\xb8).
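To make the byte-level problem concrete, here is a minimal Python 3 sketch. It assumes the error comes from Python 2's implicit str-to-unicode coercion, which by default uses the ASCII codec; Python 3 never coerces implicitly, so the decode is written out. The helper name `decode_like_py2_default` is hypothetical.

```python
# Python 3 sketch of the failing step. Python 2 silently decoded byte
# strings with the default (ASCII) codec when they were mixed with unicode.
title = "Pølse"

utf8_bytes = title.encode("utf-8")      # b'P\xc3\xb8lse'
latin1_bytes = title.encode("latin-1")  # b'P\xf8lse'


def decode_like_py2_default(data: bytes) -> str:
    """Hypothetical helper: mimic Python 2's implicit ASCII coercion."""
    return data.decode("ascii")


try:
    decode_like_py2_default(utf8_bytes)
except UnicodeDecodeError as exc:
    # 0xc3 is outside the 7-bit ASCII range, so decoding fails.
    print("implicit ASCII decode failed:", exc.reason)  # prints "ordinal not in range(128)"
```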

I made a simple test: a default page template whose title contains such a character (e.g. Pølse):
<html>
  <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
  </head>
  <body>
    <tal:block content="python:repr(template.title)" /><br />
    <tal:block content="python:repr(template.title.encode('latin-1'))" /><br />
    <tal:block content="python:repr(template.title.encode('utf-8'))" /><br />
    <tal:block content="python:title" define="title python:template.title" /><br />
    <tal:block content="python:title" define="title python:template.title.encode('utf-8')" /><br />
  </body>
</html>

In Firefox the output is fine:
u'P\xf8lse'
'P\xf8lse'
'P\xc3\xb8lse'
Pølse
Pølse

In IE and Safari it raises a UnicodeDecodeError.

If HTTPCharsets.getPreferredCharsets() returns ['utf-8'], it works fine
in IE and Safari as well.
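Why an empty list matters can be illustrated with a simplified model of how the template engine might turn byte strings into text: try each preferred charset in order, then fall back to ASCII. This is an assumption about the mechanism, not Zope's actual resolver code, and `resolve_text` is a hypothetical name.

```python
from typing import List


def resolve_text(data: bytes, preferred_charsets: List[str]) -> str:
    """Hypothetical model: decode `data` with the first preferred charset
    that works, falling back to ASCII (Python 2's default encoding)."""
    for charset in preferred_charsets:
        if charset == "*":
            continue  # a wildcard names no concrete codec
        try:
            return data.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown codec; try the next one
    # Last resort mirrors Python 2's implicit ASCII coercion and may raise.
    return data.decode("ascii")


utf8_title = "Pølse".encode("utf-8")

# Browser that sent Accept-Charset: the list contains utf-8.
print(resolve_text(utf8_title, ["utf-8", "iso-8859-1"]))  # prints "Pølse"

# IE/Safari: no Accept-Charset header, so the list is empty.
try:
    resolve_text(utf8_title, [])
except UnicodeDecodeError:
    print("UnicodeDecodeError with an empty charset list")
```

Order matters in this model: ISO-8859-1 accepts any byte sequence, so if it were tried before utf-8 the title would decode without an error but as mojibake (`'PÃ¸lse'`). That fits the "UTF-8 is **always** preferred" rule in getPreferredCharsets().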

My changes to http.py:
from zope.publisher.base import RequestDataGetter
+from ZPublisher import Converters

...

        # Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
        # field, then all character sets not explicitly mentioned get a
        # quality value of 0, except for ISO-8859-1, which gets a quality
        # value of 1 if not explicitly mentioned.
        # And quoting RFC 2616, $14.2: "If no Accept-Charset header is
        # present, the default is that any character set is acceptable."
        if not sawstar and not sawiso88591 and header_present:
-           charsets.append((1.0, 'iso-8859-1'))
+           charsets.append((1.0, Converters.default_encoding))
        # UTF-8 is **always** preferred over anything else.
        # Reason: UTF-8 is not specific and can encode the entire unicode
        # range , unlike many other encodings. Since Zope can easily use very
        # different ranges, like providing a French-Chinese dictionary, it is
        # always good to use UTF-8.
        charsets.sort(sort_charsets)
        charsets = [charset for quality, charset in charsets]
-       if sawstar and 'utf-8' not in charsets:
+       if not sawstar and 'utf-8' not in charsets: # IS THIS BAD, TO FORCE IN UTF-8???
            charsets.insert(0, 'utf-8')
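The patched logic can be modelled outside Zope as a standalone Python 3 function. This is a sketch, not the zope.publisher code: `preferred_charsets` is a hypothetical name, the `accept_charset` argument stands in for the request's HTTP_ACCEPT_CHARSET value, `default_encoding` stands in for `Converters.default_encoding`, and the original `sort_charsets` comparison is approximated (an assumption) by a sort key that puts utf-8 first and then orders by descending quality.

```python
from typing import List, Optional, Tuple


def preferred_charsets(accept_charset: Optional[str],
                       default_encoding: str = "utf-8") -> List[str]:
    """Sketch of getPreferredCharsets() with the patch above applied."""
    header_present = bool(accept_charset)
    charsets: List[Tuple[float, str]] = []
    sawstar = sawiso88591 = False
    for item in (accept_charset or "").split(","):
        item = item.strip().lower()
        if not item:
            continue
        quality = 1.0
        if ";" in item:
            item, param = [part.strip() for part in item.split(";", 1)]
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    continue  # skip entries with a malformed q-value
        if quality == 0.0:
            continue  # explicitly refused charset
        if item == "*":
            sawstar = True
        if item == "iso-8859-1":
            sawiso88591 = True
        charsets.append((quality, item))
    # Patched line: the implicit RFC 2616 ISO-8859-1 entry is replaced
    # by the configured default encoding.
    if not sawstar and not sawiso88591 and header_present:
        charsets.append((1.0, default_encoding))
    # utf-8 first, then descending quality (sort is stable for ties).
    charsets.sort(key=lambda qc: (qc[1] != "utf-8", -qc[0]))
    names = [charset for _, charset in charsets]
    # Patched condition: force utf-8 in front when no "*" was seen.
    if not sawstar and "utf-8" not in names:
        names.insert(0, "utf-8")
    return names


print(preferred_charsets(None))           # prints ['utf-8']
print(preferred_charsets("iso-8859-5,*"))  # prints ['iso-8859-5', '*']
```

With this model, a request without Accept-Charset yields `['utf-8']`, which is the case reported to work in IE and Safari. One side effect of the `+` condition as written: a header containing `*` (e.g. `iso-8859-5,*`) no longer gets utf-8 inserted in front, whereas the original `if sawstar ...` branch did insert it. A narrower change might keep the original branch and only add a fallback when the resulting list is empty.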

The question, then, is whether forcing utf-8 here (or the
default-zpublisher-encoding) is a problem when HTTP_ACCEPT_CHARSET is
missing from the request.

** Affects: zope2
     Importance: Undecided
         Status: New

-- 
UnicodeDecodeError when using IE, Safari
https://bugs.launchpad.net/bugs/530620