[ZODB-Dev] n-way join algoritms
Phillip J. Eby
pje at telecommunity.com
Sat Jun 21 19:11:24 EDT 2003
At 09:18 PM 6/21/03 +0200, Roché Compaan wrote:
>Does anybody now of an example n-way join algorithm in python or a
>resource that may get me going. I've googled for them but mainly ran
>into very technical papers with lots a relational algebra which I am not
>familiar with.
Did you look at Aaron Watters' paper on Gadfly 2?
>I am building a query service for a Zope app that uses the catalog for
>simple queries on properties and an associations repository for
>associations between objects.
>
>In the absence of a good algorithm I made one that goes like this:
>
>Say I query on metatypes a,b,c and d. The result should be joined with
>the condition: a.x == b.x and b.y == c.y and c.z == d.z. What I do at
>the moment is to process conditions from left to right and then from
>right to left and I seem to be getting the right result:
>
>left to right:
> a.x == b.x: filter a and b
> b.y == c.y: filter b and c
> c.z == d.z: filter c and d
>right to left:
> b.y == c.y: filter b and c
> a.x == b.x: filter a and b
I don't understand what you're saying. But, the basic merge algorithm for
joining is trivial. All of the complexity of joining comes from optimization.
The most trivial algorithm would look like this:
for a in A:
for b in B:
if a.x == b.x:
for c in C:
if b.y==c.y:
for d in D:
if c.z==d.z:
yield a,b,c,d
To speed this up a bit, wherever there is an index available, you can
replace the 'for' loop with something that loops only through the items
with a match from the preceding outer loop.
Further optimization is then much trickier. If you have additional search
arguments ("sargs"), you can try to predict which of A, B, C, and D would
have the fewest matching items, and then you would use that set for the
outermost loop. (Notice that from a correctness POV, it doesn't matter
what order the loops are nested, although the loops don't always look as
simple when you change the order.) Likewise, if some items have indexes
and others don't, you may want to move the indexed loops inward, since they
will go faster than unindexed searches.
Past this point, things then start to get really interesting, and you
should probably just use a database that already has this stuff built in.
More information about the ZODB-Dev
mailing list