[ZODB-Dev] Updated BTree docs

Fri May 2 11:54:39 EDT 2003

Casey Duncan]
> I think I can elaborate on the following passage in the docs:
>
>    "... and byValue(), which should probably be ignored (it's hard to
>     explain exactly what it does, and as a result it's almost never
>     used - best  to consider it deprecated). "
>
> byValue() returns (value, key) pairs in sorted order by value.

If that were true, I wouldn't have a problem explaining what it does <wink>.
Examples to ponder:

>>> from BTrees.IIBTree import IIBTree
>>> from BTrees.OOBTree import OOBTree
>>> data = [(i, 2*i) for i in range(-5, 5)]
>>> data
[(-5, -10), (-4, -8), (-3, -6), (-2, -4), (-1, -2), (0, 0),
 (1, 2), (2, 4), (3,6), (4, 8)]
>>> t = IIBTree(data)
>>> u = OOBTree(data)  # same data, different key+value types
>>> t.byValue()        # note that byValue() w/o an arg isn't allowed
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes exactly 1 argument (0 given)

>>> t.byValue(-2)  # works close to the way Casey said (with "sorted"
                   # meaning in decreasing order), except that it throws
                   # some (key, value) pairs away
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0), (-2, -1)]
>>> u.byValue(-2)  # ditto
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0), (-2, -1)]

>>> t.byValue(0)   # likewise, but throwing more pairs away
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0)]
>>> u.byValue(0)   # ditto
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0)]

>>> t.byValue(2)   # oops!  it can also change the values
[(4, 4), (3, 3), (2, 2), (1, 1)]
>>> u.byValue(2)   # oops!  or maybe not, depending on type
[(8, 4), (6, 3), (4, 2), (2, 1)]

>>> t.byValue(5)   # another variant of the last example
[(1, 4), (1, 3)]
>>> u.byValue(5)   # ditto
[(8, 4), (6, 3)]
>>>

So this is really some combination of sorting, filtering, and type-dependent
value arithmetic, all rolled into one.  byValue() isn't called in the Zope3
codebase so far, and I see one use in the Zope2 codebase (in Catalog.py).
I've never seen it called with an argument other than 0 in real life (in
which specific case, and if no value is less than 0, it's easy to explain
what it does).

> ZCatalog uses this to sort "scored" results, such as from text indexes,
> which start as a mapping of rid->score.

I'd like a method that did only that much a lot better.  Note that in
ZCTextIndex we didn't sort the whole thing, instead we used an N-best
priority queue to remember just the best N scoring items.  Even running at
Python speed, and using a dirt-dumb list for the queue, this was usually
much faster than sorting the whole result sequence (at C speed) first (&
that's generally true if N is much less than the # of items in the whole
result sequence).

> I have actually been camping on some optimized code for this. The
> current implementation is pretty lame. I came up with a new API,
> keysByValue(), which  returns the keys in order by value, which is
> really all ZCatalog needs. The implementation I have for *IBTree
> variants is 10x faster than the existing byValue implementation.
>
> I should probably get with you and/or Jim and discuss
> generalizing this and integrating it into the BTrees module.
> Basically I need to make it work for *OBTree variants and tangle
> it up with the macros in there.

Yup, more macros is exactly what BTrees need <wink>.  If you lose your one
use for the existing byValue() method then, I'd like to deprecate it for
real, as I don't know of any other uses, it's not even tested, and is hard
to explain.