[ZODB-Dev] Obtaining a subset of BTree items

Tim Peters tim at zope.com
Thu Oct 16 11:12:53 EDT 2003


[Shane Hathaway]
> Here's a trick I used.  For some uses, it's much faster than a list
> comprehension (or any sort of Python iteration.)
>
> >>> from BTrees.IOBTree import IOBTree, IOSet, difference
> >>> set = IOSet([1,2])
> >>> tree = IOBTree({1:'a', 2:'b', 3:'c', 4:'d'})
> >>> complement = difference(tree, set)
> >>> d = difference(tree, complement)
> >>> list(d.items())
> [(1, 'a'), (2, 'b')]
>
> The trick is that difference() returns a bucket rather than a set.
> union() and intersection(), by contrast, return sets.

Eww -- that is wickedly clever!  LOL.

> Although the computation is more complex, it's all done in C.  In my
> tests, this method came out faster for large sets (50,000 items or so)
> than even the most optimized Python iteration.

The problem isn't so much Python as what the Python iteration does at the C
level:  a long (> 50,000, I guess) sequence of key lookups.  Intersection,
difference and union don't do any lookups at all:  they march over the
BTree's leaf buckets only, which are linked together directly.  There's no
way to get at that directly from Python.  So the Python loop pays for lots
of lookups that your clever method avoids entirely.

> I figure this is sufficient until BTrees grow some sort of map
> operation.

There's one other trick that applies only if the tree has integer values (so
doesn't apply to IO or OO flavors).  Then weightedIntersection can be
tricked into doing it in one step.  For example,

"""
from BTrees.IIBTree import IIBTree, IISet, difference
from BTrees.IIBTree import weightedIntersection

set = IISet([1, 2])
tree = IIBTree({1: 9, 2: 10, 3: 11, 4: 12})
complement = difference(tree, set)
d = difference(tree, complement)
print list(d.items())

junk, d = weightedIntersection(tree, set, 1, 0)
print list(d.items())
"""

That prints [(1, 9), (2, 10)] twice.

The biggest problem I have with the weightedXYZ functions is that I always
forget they return a two-tuple, and I've never yet had any use for the first
thing in the tuple.




More information about the ZODB-Dev mailing list