[Zope-CVS] CVS: Products/ZCTextIndex - QueryEngine.py:1.1.2.4
Tim Peters
tim.one@comcast.net
Sun, 5 May 2002 22:04:52 -0400
Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv6102
Modified Files:
Tag: TextIndexDS9-branch
QueryEngine.py
Log Message:
When doing multiple ORs, do a balanced merge. For example, if we had
8 result sets with 1 element each, and they were pairwise disjoint, we
were doing:
merge 1 with 1
merge 1 with 2
merge 1 with 3
merge 1 with 4
merge 1 with 5
merge 1 with 6
merge 1 with 7
Now we'll do:
merge 1 with 1 (4 times)
merge 2 with 2 (2 times)
merge 4 with 4 (once)
=== Products/ZCTextIndex/QueryEngine.py 1.1.2.3 => 1.1.2.4 ===
from BTrees.IIBTree import difference, weightedIntersection, weightedUnion
+from Products.ZCTextIndex.NBest import NBest
class QueryError(Exception):
pass
@@ -46,16 +47,25 @@
dummy, notset = weightedUnion(notset, x)
set = difference(set, notset)
return set
+
if kind == "OR":
- L = []
- for subtree in tree.getValue():
- L.append(self.executeQuery(index, subtree))
- L.sort(lambda x, y: cmp(len(x), len(y)))
- set = L[0]
- for x in L[1:]:
- dummy, set = weightedUnion(set, x)
- return set
+ # Balance unions as closely as possible, smallest to largest.
+ allofem = tree.getValue()
+ merge = NBest(len(allofem))
+ for subtree in allofem:
+ result = self.executeQuery(index, subtree)
+ merge.add(result, len(result))
+ while len(merge) > 1:
+ # Merge the two smallest so far, and add back to the queue.
+ x, dummy = merge.pop_smallest()
+ y, dummy = merge.pop_smallest()
+ dummy, z = weightedUnion(x, y)
+ merge.add(z, len(z))
+ result, dummy = merge.pop_smallest()
+ return result
+
if kind == "ATOM":
return index.search(tree.getValue())
+
if kind == "NOT":
raise QueryError, "NOT operator must occur right after AND"