[Zope-CMF] Charsets

yuppie y.2009 at wcm-solutions.de
Sun Jan 18 17:00:24 EST 2009


Hi Charlie!


Charlie Clark wrote:
> On 29.12.2008 at 15:01, Charlie Clark wrote:
> 
> CMFDefault.utils
> 
> def getBrowserCharset(request):
>      """ Get charset preferred by the browser.
>      """
>      envadapter = IUserPreferredCharsets(request)
>      charsets = envadapter.getPreferredCharsets() or ['utf-8']
>      return charsets[0]
> 
> This will always be iso-8859-1 for Opera and Firefox because all  
> charsets have the same quality, again even if UTF-8 encoding is  
> specified.

getBrowserCharset does almost the same as
zope.publisher.http.getCharsetUsingRequest, and it is only used for
encoding and decoding 'portal_status_message'. So it is not relevant to
the issue you noticed.
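To make the tie described in the quote concrete, here is a minimal, hypothetical Accept-Charset parser. It is not the zope.publisher code, which may break ties differently, for example by favouring UTF-8. It orders entries by q-value using Python's stable sort, so charsets with equal quality keep the order the browser sent them in, and taking charsets[0] (as the quoted getBrowserCharset does) simply returns whatever the browser listed first. The header value below is an illustrative example, not a captured browser header.

```python
def preferred_charsets(accept_charset):
    """Return charsets from an Accept-Charset value, highest q-value first.

    Hypothetical helper for illustration only. Entries with equal
    q-values keep their original order because sorted() is stable.
    """
    entries = []
    for item in accept_charset.split(","):
        item = item.strip().lower()
        if not item:
            continue
        name, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed quality value
        if quality > 0:
            entries.append((name.strip(), quality))
    # Sort by descending quality only; ties preserve header order.
    ordered = sorted(entries, key=lambda entry: -entry[1])
    return [name for name, _ in ordered] or ["utf-8"]


def get_browser_charset(accept_charset):
    # Mirrors the quoted getBrowserCharset: take the first preference.
    return preferred_charsets(accept_charset)[0]


header = "iso-8859-1, utf-8, utf-16, *;q=0.1"
print(preferred_charsets(header))  # ['iso-8859-1', 'utf-8', 'utf-16', '*']
print(get_browser_charset(header))  # iso-8859-1
```

With a stable sort, the only way UTF-8 wins a tie is if the browser lists it first or the parser special-cases it.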

> I haven't been able to track where the decoding of form  
> data occurs for Zope 2 stuff but I can identify the problem in  
> zpublisher.browser.BrowserRequest

You mean zope.publisher.browser.BrowserRequest. The Zope 2 version is in 
Products.Five.browser.decode.

>      def _decode(self, text):
>          """Try to decode the text using one of the available charsets."""
>          if self.charsets is None:
>              envadapter = IUserPreferredCharsets(self)
>              self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
>          for charset in self.charsets:
>              try:
>                  text = unicode(text, charset)
>                  break
>              except UnicodeError:
>                  pass
>          return text
> 
> Here the naive assumption is that if we can decode from a charset without
> an error, then we have the correct charset. Sometimes this goes unnoticed,
> but with characters like U+2013 and U+2014 (en dash and em dash) it will
> raise errors, as those codepoints are not in the Latin-1 charset, which
> has its own equivalents.
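The weakness of "no error means the right charset" is easy to reproduce. Below is a Python 3 sketch of the same fallback loop, ported from the quoted _decode, with bytes.decode standing in for Python 2's unicode(text, charset). Because ISO-8859-1 maps every byte value to a character, decoding with it never raises, so when it is tried first, UTF-8 form data containing an en dash comes back as three unrelated characters rather than as an error. An exception does appear when U+2013 has to be encoded as ISO-8859-1, since that charset has no code for it.

```python
def decode_with_fallback(data, charsets):
    """Python 3 port of the quoted _decode loop: first decode that succeeds wins."""
    for charset in charsets:
        try:
            return data.decode(charset)
        except UnicodeError:
            pass  # try the next candidate
    return data  # like the original, give up and return the input unchanged


# What a UTF-8 page would submit for "2008<en dash>2009".
submitted = "2008\u20132009".encode("utf-8")  # b'2008\xe2\x80\x932009'

# ISO-8859-1 first: no exception, but the result is mojibake.
wrong = decode_with_fallback(submitted, ["iso-8859-1", "utf-8"])
print(repr(wrong))  # '2008â\x80\x932009'

# UTF-8 first: the en dash survives.
right = decode_with_fallback(submitted, ["utf-8", "iso-8859-1"])
print(right == "2008\u20132009")  # True

# The error only shows up when U+2013 has to be encoded as ISO-8859-1.
try:
    "\u2013".encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+2013 cannot be encoded as ISO-8859-1")
```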

AFAICS the fallback to other charsets is usually not required in Zope 3. 
If the publisher encodes responses using 
zope.publisher.http.getCharsetUsingRequest, the first charset will be 
the right one.
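A minimal sketch of that round trip, assuming (as browsers normally do) that form data is submitted in the charset of the page that contained the form. first_charset is a hypothetical stand-in for the publisher's choice, not the real zope.publisher.http.getCharsetUsingRequest. Because the response and the decoding step consult the same preference list, the first candidate in _decode is already the charset the page was sent in, and the fallback never has to guess.

```python
def first_charset(preferred):
    """Hypothetical stand-in for the publisher's choice: first preference, else UTF-8."""
    return preferred[0] if preferred else "utf-8"


def serve_and_receive(user_input, preferred):
    # 1. Response side: the page with the form is encoded in this charset
    #    (and declared in the Content-Type header).
    response_charset = first_charset(preferred)
    # 2. Assumption: the browser submits form fields in the page's charset.
    submitted = user_input.encode(response_charset)
    # 3. Request side: decoding consults the same list, so its first
    #    candidate matches step 1 and no fallback is needed.
    decode_charset = first_charset(preferred)
    return submitted.decode(decode_charset)


print(serve_and_receive("2008\u20132009", ["utf-8", "iso-8859-1"]) == "2008\u20132009")  # True
print(serve_and_receive("caf\u00e9", ["iso-8859-1", "utf-8"]) == "caf\u00e9")  # True
```

If the chosen charset cannot represent a character (an en dash in ISO-8859-1), this sketch raises UnicodeEncodeError in step 2, which is one argument for serving forms in UTF-8.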

> I would suggest that we work towards enforcing UTF-8 wherever possible,
> but at the very least add the accept-charset attribute to forms and
> use the portal's default_charset for this.
> 
> I'd very much appreciate your comments on this.

I can't see a need to implement this differently from Zope 3, so I
propose fixing the encoding of forms sent to the browser.
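One way to keep both sides consistent, sketched with plain Python and a hypothetical render_form_response helper (not a CMF or Zope API): pick one charset, declare it in the Content-Type header, encode the page body with it and, following Charlie's suggestion, also put it in the form's accept-charset attribute.

```python
from html import escape


def render_form_response(action, field_name, value, charset="utf-8"):
    """Return (headers, body_bytes) for a simple form page.

    Hypothetical helper: the same charset is used for the Content-Type
    header, the encoded body and the form's accept-charset attribute.
    """
    html = (
        "<html><body>\n"
        f'<form action="{escape(action)}" method="post" '
        f'accept-charset="{escape(charset)}">\n'
        f'  <input name="{escape(field_name)}" value="{escape(value)}">\n'
        "</form>\n"
        "</body></html>\n"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Encoding fails loudly if the page contains characters the charset
    # cannot represent, instead of silently sending mismatched bytes.
    return headers, html.encode(charset)


headers, body = render_form_response("/edit", "title", "2008\u20132009")
print(headers["Content-Type"])  # text/html; charset=utf-8
print(b'accept-charset="utf-8"' in body)  # True
```

Defaulting to UTF-8 follows the direction of this thread; a site could instead pass its own configured charset, such as the portal's default_charset mentioned in the quote.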


Cheers,

	Yuppie
