[Zope-Coders] Analysis: BTrees and Unicode and Python
Guido van Rossum
guido@python.org
Fri, 19 Oct 2001 12:43:56 -0400
> ----- Original Message -----
> From: "Guido van Rossum" <guido@python.org>
> To: "Andreas Jung" <andreas@zope.com>
> Cc: "Jim Fulton" <Jim@zope.com>; <zope-coders@zope.org>
> Sent: Friday, October 19, 2001 11:52
> Subject: Re: [Zope-Coders] Analysis: BTrees and Unicode and Python
(Can you please edit out these headers from your replies? They are
only confusing, and not needed for the context.)
> > Note that this was a conscious design decision. Not all the world
> > uses Latin-1, and many real-world programs and data use different
> > interpretations of 8-bit characters with the high bit set. Assuming
> > Latin-1 when comparing to Unicode would be wrong.
>
> I assume the exception is raised before calling the PyUnicode_Compare
> function. Otherwise silently ignoring this error condition is also not
> a solution so I agree that Python behaviour is reasonable :)
I'm not sure I understand your question. PyUnicode_Compare() is
called when at least one of the arguments to a 3-way comparison is a
Unicode object. When the other is not, PyUnicode_FromObject() will
attempt to convert it to Unicode, and if it's an 8-bit string
containing non-ASCII characters, that will raise an exception, and
PyUnicode_Compare() will return -1. Then default_3way_compare()
calls PyErr_Occurred(), which will return true; the exception is a
ValueError, so it doesn't match TypeError, and default_3way_compare()
will return -2 to indicate an error. That error will be propagated
all the way up to the caller of PyObject_Compare().
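To make the sequence above concrete, here is a small stand-alone model of it. This is only a sketch and not the interpreter's code: every name in it (model_obj, pending_error, model_unicode_compare, model_default_compare) is made up for illustration, the error state is a plain variable rather than the real exception machinery, and Unicode text is represented as ASCII bytes. The handling of a type error (discard it and fall back to some other ordering) is an assumption of the model.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in Python's C API. */
enum model_kind  { KIND_UNICODE, KIND_STR8, KIND_OTHER };
enum model_error { ERR_NONE, ERR_TYPE, ERR_VALUE };

struct model_obj {
    enum model_kind kind;
    const char *text;   /* Unicode objects hold ASCII text in this model */
};

/* Models the "an exception is set" state that PyErr_Occurred() reports. */
static enum model_error pending_error = ERR_NONE;

/* Returns 1 if the 8-bit string has a byte with the high bit set. */
static int has_high_bit(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 127)
            return 1;
    }
    return 0;
}

/* Models the conversion step: fails with a type error for non-strings
 * and with a value error for non-ASCII 8-bit strings. */
static int model_to_unicode(const struct model_obj *o)
{
    if (o->kind == KIND_OTHER) {
        pending_error = ERR_TYPE;
        return -1;
    }
    if (o->kind == KIND_STR8 && has_high_bit(o->text)) {
        pending_error = ERR_VALUE;
        return -1;
    }
    return 0;
}

/* Models PyUnicode_Compare(): -1, 0 or 1, or -1 with an error pending. */
static int model_unicode_compare(const struct model_obj *v,
                                 const struct model_obj *w)
{
    int c;
    if (model_to_unicode(v) < 0 || model_to_unicode(w) < 0)
        return -1;
    c = strcmp(v->text, w->text);
    return (c < 0) ? -1 : (c > 0) ? 1 : 0;
}

/* Models the caller: a value error becomes -2; a type error is
 * discarded (an assumption) and a placeholder ordering is used. */
static int model_default_compare(const struct model_obj *v,
                                 const struct model_obj *w)
{
    assert(v != NULL && w != NULL);
    if (v->kind == KIND_UNICODE || w->kind == KIND_UNICODE) {
        int c = model_unicode_compare(v, w);
        if (pending_error == ERR_NONE)
            return c;
        if (pending_error != ERR_TYPE)
            return -2;              /* error propagates to the caller */
        pending_error = ERR_NONE;   /* type error swallowed */
    }
    /* Placeholder fallback ordering, not the real one. */
    return (v->kind < w->kind) ? -1 : (v->kind > w->kind) ? 1 : 0;
}
```

In this model, comparing a Unicode object with an 8-bit string containing a byte such as 0xE9 returns -2 and leaves the value error pending, while comparing it with a plain ASCII string returns an ordinary -1, 0 or 1.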
> > I'd like to see what's on the stack when default_3way_compare is
> > called with two Unicode objects.
>
> How can I determine that ?
I propose to change the code in default_3way_compare() as follows:
        if (v->ob_type == w->ob_type) {
                /* When comparing these pointers, they must be cast to
                 * integer types (i.e. Py_uintptr_t, our spelling of C9X's
                 * uintptr_t).  ANSI specifies that pointer compares other
                 * than == and != to non-related structures are undefined.
                 */
                Py_uintptr_t vv = (Py_uintptr_t)v;
                Py_uintptr_t ww = (Py_uintptr_t)w;
----->          if (PyUnicode_Check(v))
----->                  abort();
                return (vv < ww) ? -1 : (vv > ww) ? 1 : 0;
        }
and then inspecting the stack trace with gdb.
If this abort() never happens, you need to look for a new theory. :-)
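As an aside on the comment in that snippet: ordering comparisons between pointers to unrelated objects are undefined in C, which is why the code casts to an integer type before using < and >. A minimal stand-alone sketch of the same idiom, using C99's uintptr_t directly; compare_by_address is a hypothetical helper name, not something in the Python sources, and the resulting order is whatever the platform's pointer-to-integer conversion gives.

```c
#include <stdint.h>

/* Hypothetical helper: 3-way comparison of two objects by address.
 * The pointers are converted to uintptr_t first, so the < and >
 * operators are applied to integers rather than to unrelated pointers. */
static int compare_by_address(const void *a, const void *b)
{
    uintptr_t aa = (uintptr_t)a;
    uintptr_t bb = (uintptr_t)b;
    return (aa < bb) ? -1 : (aa > bb) ? 1 : 0;
}
```

The result is stable only as long as the objects stay at the same address, so it gives a consistent but otherwise arbitrary order.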
--Guido van Rossum (home page: http://www.python.org/~guido/)