[Zope-CVS] CVS: Products/ZCTextIndex - SetOps.py:1.1
Tim Peters
tim.one@comcast.net
Tue, 14 May 2002 20:05:44 -0400
Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv18209
Added Files:
SetOps.py
Log Message:
Squash bug duplication by moving the clever mass-union and mass-
intersection gimmicks into their own functions with their own test suite.
This turned up two bugs:
1. The mass weighted union gimmick was incorrect when passed a list with
a single mapping. In that case, it neglected to multiply the mapping
by the given weight.
2. The underlying weighted{Intersection, Union} code does something crazy
if you pass it weights less than 0. I had vaguely hoped to be able
to subtract scores by passing 1 and -1 as weights, but this doesn't
work. It's hard to say exactly what it does then. The line
weightedUnion(IIBTree(), mapping, 1, -2)
seems to return a mapping with the same keys, but *all* of whose
values are -2, regardless of the original mapping's values.
=== Added File Products/ZCTextIndex/SetOps.py ===
##############################################################################
#
# Copyright (c) 2001, 2002 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################
"""SetOps -- Weighted intersections and unions applied to many inputs."""
from BTrees.IIBTree import IIBTree, weightedIntersection, weightedUnion
from Products.ZCTextIndex.NBest import NBest
def mass_weightedIntersection(L):
"A list of (mapping, weight) pairs -> their weightedIntersection IIBTree."
if not L:
return IIBTree()
# Intersect with smallest first.
L = L[:] # don't mutate the caller's L
L.sort(lambda x, y: cmp(len(x[0]), len(y[0])))
x, w = L[0]
dummy, result = weightedUnion(IIBTree(), x, 1, w)
for x, w in L[1:]:
dummy, result = weightedIntersection(result, x, 1, w)
return result
def mass_weightedUnion(L):
"A list of (mapping, weight) pairs -> their weightedUnion IIBTree."
if not L:
return IIBTree()
if len(L) == 1:
# Have to do a union in order to get the input's values
# multiplied by the weight.
x, weight = L[0]
dummy, result = weightedUnion(IIBTree(), x, 1, weight)
return result
# Balance unions as closely as possible, smallest to largest.
assert len(L) > 1
merge = NBest(len(L))
for x, weight in L:
merge.add((x, weight), len(x))
while len(merge) > 1:
# Merge the two smallest so far, and add back to the queue.
(x, wx), dummy = merge.pop_smallest()
(y, wy), dummy = merge.pop_smallest()
dummy, z = weightedUnion(x, y, wx, wy)
merge.add((z, 1), len(z))
(result, weight), dummy = merge.pop_smallest()
return result