[Zope-dev] zope.publisher and ZPublisher: decoding form input

Mon Mar 7 03:58:46 EST 2011

Hi!

As discussed in a different thread, zope.publisher-compatible decoding 
should be added to ZPublisher.

But does this code from zope.publisher make any sense?

     def _decode(self, text):
         """Try to decode the text using one of the available charsets."""
         if self.charsets is None:
             envadapter = IUserPreferredCharsets(self)
             self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
         for charset in self.charsets:
             try:
                 text = unicode(text, charset)
                 break
             except UnicodeError:
                 pass
         return text

Using getPreferredCharsets()[0] is correct because zope.publisher uses 
the same charset for encoding responses. (For ZPublisher we decided we 
don't want to support charset negotiation.) But what about the other 
charsets?

AFAICS

- There are no tests in zope.publisher for that fallback behavior.

- That fallback behavior doesn't cause trouble because it is very rarely 
or never used.

- The fact that no error is raised by unicode(text, charset) doesn't mean 
we have the right charset. Here is some background information: 
http://chardet.feedparser.org/docs/index.html

- Returning the still-encoded string if all attempts fail might not be 
the best choice.
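
To illustrate the last two points, here is a minimal sketch. It is not the zope.publisher code: it is written for Python 3, where data.decode(charset) plays the role of unicode(text, charset), and decode_with_fallback is a hypothetical stand-in for the _decode loop quoted above.

```python
# Hypothetical stand-in for the _decode loop above (Python 3 spelling).
def decode_with_fallback(data, charsets):
    """Return data decoded with the first charset that does not raise."""
    for charset in charsets:
        try:
            return data.decode(charset)
        except UnicodeError:
            pass
    # All attempts failed: the caller gets the undecoded bytes back.
    return data

# Bytes as a browser would submit them from a UTF-8 page.
raw = "Zürich".encode("utf-8")

# iso-8859-1 accepts every byte sequence, so it never raises,
# even though it is the wrong charset here.
wrong = decode_with_fallback(raw, ["iso-8859-1", "utf-8"])
right = decode_with_fallback(raw, ["utf-8", "iso-8859-1"])

# A byte that is valid in neither charset falls through undecoded.
undecoded = decode_with_fallback(b"\xff", ["ascii", "utf-8"])

print(repr(wrong), repr(right), repr(undecoded))
```

With iso-8859-1 first in the list, the loop stops at the wrong charset without any error, and the last call shows the caller ending up with a mix of decoded and undecoded values.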

Proposal:

Just use unicode(text, charset, 'replace') with the same charset used 
for encoding responses.
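
A rough sketch of what the proposal amounts to, again in Python 3 spelling (data.decode(charset, 'replace') instead of unicode(text, charset, 'replace')). RESPONSE_CHARSET and decode_form_value are made-up names for illustration, and 'utf-8' is only assumed as the response charset:

```python
# Assumption: stands in for whatever charset is used to encode responses.
RESPONSE_CHARSET = "utf-8"

def decode_form_value(data, charset=RESPONSE_CHARSET):
    """Decode with a single charset; invalid bytes become U+FFFD."""
    return data.decode(charset, "replace")

# Well-formed input decodes as usual.
ok = decode_form_value("Zürich".encode("utf-8"))

# Input in some other charset (iso-8859-1 here) never raises and never
# comes back undecoded; the bad byte is replaced instead.
bad = decode_form_value(b"Z\xfcrich")

print(repr(ok), repr(bad))
```

The result is always a unicode string, so code further down never has to deal with a mix of decoded and undecoded values.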

If there are no objections, I'll implement it that way in ZPublisher.

What about zope.publisher? I don't use zope.publisher, but I think it 
should always use 'utf-8' instead of trying to be smart.

Cheers,

	Yuppie