[ZODB-Dev] [BTrees] Inconsistent equality checks
Tim Peters
tim at zope.com
Thu Nov 6 16:38:11 EST 2003
[Dieter Maurer]
> I learned yesterday that "BTrees" and friends ("Buckets", "Sets",
> "TreeSets") do not define the standard equality check for
> mappings/sets (but compare by identity).
At the time they were written, "rich comparisons" didn't exist in Python,
meaning that it was impossible to define the result of any comparison
operator without defining __cmp__, which in turn implies needing (for
sanity) to define a consistent total ordering. Comparison by identity is a
total ordering, and Jim stopped with that.
As I mentioned before, Python 2.3's Set type defines a partial ordering with
useful semantics for ==, !=, <, <=, >, and >=, but refuses to allow Set
comparison via cmp() (it raises an exception if you try; I gave an example
before). This was *possible* because it exploits the newer ability to
define each of the six infix comparison operators differently (that's what I
mean by "rich comparisons" above).
None of that was possible when Jim created the BTree-based types.
> Looking for an alternative, I met the following inconsitency:
>
> a= OOSet((1,2,3))
> b= OOSet((1,2,3))
> a.keys() == b.keys()
> --> 1
>
> a= OOTreeSet((1,2,3))
> b= OOTreeSet((1,2,3))
> a.keys() == b.keys()
> --> 0
>
> We have the same inconsistency for "OOBucket", "OOBTree" and
> "items".
The result type of keys() has to implement IReadSequence, but the
IReadSequence interface doesn't define anything about comparability. What
you've determined here is that a Python list (returned by Bucket and Set
keys/items/values) implements a different notion of comparison than does a
BTreeItems object (returned by BTree and TreeSet keys/items/values).
When ZODB 3.3 is released, (all) BTree-based objects will participate in the
newer Python iteration protocol, and then a memory-efficient alternative
becomes possible; e.g., and untested:
def eq(x, y):
gety = iter(y).next
try:
for xe in x:
ye = gety() # may raise StopIteration
if xe != ye:
return False
except StopIteration:
return False # y ended early
# x exhausted, and all equal so far
try:
gety()
except StopIteration:
return True # y also exhausted
else:
return False # x is a prefix of y
That should work with any two iterable objects; e.g., compare an OITreeSet
with the keys in an OOBucket, or with the elements of a Python list, or even
the lines in a file), and consumes constant memory regardless of the types
of x and y.
It's possible that someone will decide to implement rich comparisons for
BTree-based data structures too, although it's not on my todo list and is
unlikely to show up there. Anybody who wants to should be aware that it's a
change in semantics, so will face opposition for that reason. Comparison by
identity isn't useful often, so maybe nobody will object.
More information about the ZODB-Dev
mailing list