[ZODB-Dev] Updated BTree docs
Tim Peters
tim at zope.com
Fri May 2 11:54:39 EDT 2003
Casey Duncan]
> I think I can elaborate on the following passage in the docs:
>
> "... and byValue(), which should probably be ignored (it's hard to
> explain exactly what it does, and as a result it's almost never
> used - best to consider it deprecated). "
>
> byValue() returns (value, key) pairs in sorted order by value.
If that were true, I wouldn't have a problem explaining what it does <wink>.
Examples to ponder:
>>> from BTrees.IIBTree import IIBTree
>>> from BTrees.OOBTree import OOBTree
>>> data = [(i, 2*i) for i in range(-5, 5)]
>>> data
[(-5, -10), (-4, -8), (-3, -6), (-2, -4), (-1, -2), (0, 0),
(1, 2), (2, 4), (3,6), (4, 8)]
>>> t = IIBTree(data)
>>> u = OOBTree(data) # same data, different key+value types
>>> t.byValue() # note that byValue() w/o an arg isn't allowed
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: function takes exactly 1 argument (0 given)
>>> t.byValue(-2) # works close to the way Casey said (with "sorted"
# meaning in decreasing order), except that it throws
# some (key, value) pairs away
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0), (-2, -1)]
>>> u.byValue(-2) # ditto
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0), (-2, -1)]
>>> t.byValue(0) # likewise, but throwing more pairs away
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0)]
>>> u.byValue(0) # ditto
[(8, 4), (6, 3), (4, 2), (2, 1), (0, 0)]
>>> t.byValue(2) # oops! it can also change the values
[(4, 4), (3, 3), (2, 2), (1, 1)]
>>> u.byValue(2) # oops! or maybe not, depending on type
[(8, 4), (6, 3), (4, 2), (2, 1)]
>>> t.byValue(5) # another variant of the last example
[(1, 4), (1, 3)]
>>> u.byValue(5) # ditto
[(8, 4), (6, 3)]
>>>
So this is really some combination of sorting, filtering, and type-dependent
value arithmetic, all rolled into one. byValue() isn't called in the Zope3
codebase so far, and I see one use in the Zope2 codebase (in Catalog.py).
I've never seen it called with an argument other than 0 in real life (in
which specific case, and if no value is less than 0, it's easy to explain
what it does).
> ZCatalog uses this to sort "scored" results, such as from text indexes,
> which start as a mapping of rid->score.
I'd like a method that did only that much a lot better. Note that in
ZCTextIndex we didn't sort the whole thing, instead we used an N-best
priority queue to remember just the best N scoring items. Even running at
Python speed, and using a dirt-dumb list for the queue, this was usually
much faster than sorting the whole result sequence (at C speed) first (&
that's generally true if N is much less than the # of items in the whole
result sequence).
> I have actually been camping on some optimized code for this. The
> current implementation is pretty lame. I came up with a new API,
> keysByValue(), which returns the keys in order by value, which is
> really all ZCatalog needs. The implementation I have for *IBTree
> variants is 10x faster than the existing byValue implementation.
>
> I should probably get with you and/or Jim and discuss
> generalizing this and integrating it into the BTrees module.
> Basically I need to make it work for *OBTree variants and tangle
> it up with the macros in there.
Yup, more macros is exactly what BTrees need <wink>. If you lose your one
use for the existing byValue() method then, I'd like to deprecate it for
real, as I don't know of any other uses, it's not even tested, and is hard
to explain.
More information about the ZODB-Dev
mailing list