[Grok-dev] z3c.testsetup versus docfilesuite encoding

Wed Oct 21 08:37:14 EDT 2009

Hi there,

Theuni's reply didn't make it to the list, so I am quoting it in
full.

Christian Theune wrote:
> On 10/17/2009 01:03 PM, Uli Fouquet wrote:

> > On Wednesday, 07.10.2009, at 17:40 +0200, Christian Theune wrote:
> > 
> >> I noticed some annoyances with z3c.testsetup WRT doctest files and encoding:
> >>
> >> - the default encoding is utf-8 and cannot be set to Python's system
> >> default of None because of "or"-ing the optional parameter.
> > 
> > You could do::
> > 
> >   import sys
> >   testsuite = z3c.testsetup.register_all_tests(
> >     'mymod', encoding=sys.getdefaultencoding())
> 
> No, what I'm referring to is that doctest's internal default is None,
> which seems to avoid encoding/decoding altogether (or something similar).

I wasn't aware that setting encoding to ``None`` skips decoding
completely. Thanks for the hint.

> sys.getdefaultencoding() would usually deliver 'ascii', not None. And
> even if it did, I could not pass it in. This is simply an issue of the
> testsetup API disabling behaviour of the original library due to
> shadowing issues.
> 
> A simple fix on your side would be to do
> 
> marker = object()  # unique sentinel: "no encoding argument given"
> 
> def register_all_tests(... encoding=marker):
>    if encoding is marker:
>        ...
> 
> That would allow None as a valid argument.

Thanks, this will go into the next bugfix release.
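
For the record, a minimal sketch of the difference between the two
approaches. The function names are made up for illustration and are not
the actual z3c.testsetup code; only the default handling matters here:

```python
# Hypothetical stand-ins, not the real z3c.testsetup implementation.
_marker = object()  # unique sentinel meaning "argument not passed"


def resolve_encoding_with_or(encoding=None):
    # The problematic pattern: None is falsy, so it is silently
    # replaced by 'utf-8' and can never reach doctest.
    return encoding or 'utf-8'


def resolve_encoding_with_marker(encoding=_marker):
    # Fall back to 'utf-8' only when the caller passed nothing at all;
    # an explicit None is handed through unchanged.
    if encoding is _marker:
        return 'utf-8'
    return encoding
```

With the marker variant, callers who pass nothing still get 'utf-8',
while an explicit ``encoding=None`` survives.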

> >> - the encoding is only applied to functional docfile suites, but not the
> >> ones that are unit tests
> > 
> > I think that should be fixed with the latest release.
> 
> Thanks.
> 
> >> - can the default please be the same as Python?
> > 
> > My experience is that most people that worry about encodings use
> > 'utf-8'. And they often expect 'utf-8' to be handled out-of-the-box.
> > Getting back to Python default encoding would most probably break many
> > tests and could confuse beginners (I see that there are still many more
> > encoding-related problems with testrunners and doctests).
> > 
> > As Python in general is moving towards complete 'utf-8-defaultness', I
> > don't see the point here. Maybe you want to explain your use case?
> > 
> >> Why does one care about that encoding anyway?
> > 
> > Uh? For example to handle umlauts. A use case quite common in
> > internationalized apps. With the encoding set to 'utf-8' you can do
> > 
> >   >>> myvar = u'ö'
> >   >>> myvar
> >   u'\xf6'
> > 
> > which is not nice, but gives at least a bit of encoding support (for
> > example ``print myvar`` would not work, as the doctest output parser
> > still seems to expect the Python default encoding). Without setting the
> > encoding this doctest would not be accepted by the testrunner at all
> > (unless you set the Python default encoding to 'utf-8') and would leave
> > beginners with a cryptic error message not really related to their
> > testcase.
> > 
> > As all this is not news to you, I wonder whether I missed your point. Is
> > there a better way to handle encoded strings in doctests?
> 
> My point is: whatever doctest does by default already works. I need to
> revalidate this with the example you gave above, though.

Rechecked this. One difference: when passing, for instance, 'utf-8' as
the encoding, the following doctest passes:

  >>> u'ä'
  u'\xe4'

while with encoding set to ``None`` it gives:

  >>> u'ä'
  u'\xc3\xa4'

(but it works, contrary to my first assumption). I know too little about
encodings to say which is better. My feeling is that switching back to
the doctest module's default (``None``) has some advantages. Maybe
someone with more encoding experience can tell?
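
My guess at where the two results come from (an assumption, I have not
traced it through doctest): the file stores the a-umlaut as the two
UTF-8 bytes C3 A4. With 'utf-8' they are decoded into one character
before the example runs; with ``None`` the raw bytes apparently end up
in the unicode literal one byte per character, as if read as Latin-1.
The sketch below only imitates that with a plain byte string:

```python
# The two bytes a UTF-8 encoded file holds for a-umlaut.
raw = b'\xc3\xa4'

# Decoding as UTF-8 gives one character, U+00E4.
as_utf8 = raw.decode('utf-8')

# Reading each byte as its own character (Latin-1) gives two
# characters, U+00C3 and U+00A4.
as_latin1 = raw.decode('latin-1')
```

Here ``as_utf8`` corresponds to the ``u'\xe4'`` result and ``as_latin1``
to the ``u'\xc3\xa4'`` one.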

Best regards,

-- 
Uli
