[Zope-dev] adding uniqueValuesFor to CatalogQuery
Steve Alexander
steve@cat-box.net
Sat, 01 Sep 2001 20:45:17 +0100
This is a multi-part message in MIME format.
--------------000004000806080806040102
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Hi folks,
I've been playing around with adding a uniqueValuesFor capability to
Casey Duncan's CatalogQuery product.
This is more a proof-of-concept, so it isn't straightforward to use just
yet. There are probably bugs that I've overlooked, especially due to
assumptions about the types of things that BTrees and Buckets return.
Patches and explanations gratefully accepted!
Interestingly, I came up with two algorithms for uniqueValuesFor. One of
them uses the fact that the difference method of IMerge returns items,
not just keys.
The other algorithm is less tricky and more laborious.
I decided to test the algorithms using a Catalog I have lying around. It
isn't all that full, maybe a few hundred records, and a few tens of
unique values for the index I'm interested in.
I found that the first algorithm runs about ten times as fast as the second.
I also tried it on a larger Catalog (one with 2000 records), and the
first algorithm ran 30 times as fast as the second.
Attached are the diffs and the test script I used. The test script is
specific to a particular application I have, but it should be easy to
run it against another catalog.
Oh, by the way, Casey's CatalogQuery is a MonkeyPatch (those run-time
patches formerly known as hotfixes) on the ZCatalog product. I did, for
just a second, consider releasing this as a MonkeyPatch product that
patches CatalogQuery. So, it would be a MonkeyPatch of a MonkeyPatch, or
perhaps a MonkeyMonkeyPatch.
--
Steve Alexander
Software Engineer
Cat-Box limited
--------------000004000806080806040102
Content-Type: text/plain;
name="test.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="test.py"
import Zope, ZPublisher
#c=Zope.app().school.Instructors.Catalog
c=Zope.app().school.Pupils.Catalog
from Products.CatalogQuery import CatalogQuery
cq=CatalogQuery(c._catalog, 'current==1')
field="area_upper"
a1=cq(uniqueValuesFor=field)
a2=cq(uv2=field)
a3=cq(uv3=field)
from time import time
m1=time()
a1=cq(uniqueValuesFor=field)
m2=time()
a2=cq(uv2=field)
m3=time()
a3=cq(uv3=field)
m4=time()
print "a1 took", m2-m1
print "a2 took", m3-m2
print "a3 took", m4-m3
# ---- Catalog with a few hundred records
# a1 took 0.00448000431061
# a2 took 0.0652350187302
# a3 took 0.0657860040665
# ---- Catalog with 2000 records
# a1 took 0.0895169973373
# a2 took 2.71825802326
# a3 took 2.75240898132
--------------000004000806080806040102
Content-Type: text/plain;
name="diffs.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="diffs.txt"
*** original CatalogQuery.py
--- new CatalogQuery.py
***************
*** 43,48 ****
--- 43,51 ----
from Acquisition import Implicit
from Products.ZCatalog.Lazy import LazyMap
from BTrees.IIBTree import intersection, weightedIntersection, union, weightedUnion, difference, IISet
+ from BTrees import IOBTree
+ from BTrees.OIBTree import OISet
+ from types import IntType
CQL_re = re.compile(
('\s*([_\-a-zA-Z0-9]+)\s*'
***************
*** 167,173 ****
if hasattr(self, '_v_all_keys'):
del self._v_all_keys
! def __call__(self, md={}, used=None, **kw):
"""Execute the query on the catalog"""
# Combine any keywords into the namespace mapping
--- 170,176 ----
if hasattr(self, '_v_all_keys'):
del self._v_all_keys
! def __call__(self, md={}, used=None, uniqueValuesFor=None, uv2=None, uv3=None, **kw):
"""Execute the query on the catalog"""
# Combine any keywords into the namespace mapping
***************
*** 186,191 ****
--- 189,237 ----
w, rs = weightedIntersection(rs, r)
elif block.logic == 'or':
w, rs = weightedUnion(rs, r)
+
+ if uniqueValuesFor:
+ inverse_rs=IOBTree.IOSet(difference(self._getAllKeys(), rs))
+ index=self._catalog.indexes[uniqueValuesFor]
+ results=IOBTree.difference(index._unindex, inverse_rs)
+ resultset=OISet()
+ for l in results.values():
+ resultset.update(l)
+ self._cleanUp()
+ return resultset.keys()
+ if uv2:
+ index=self._catalog.indexes[uv2]
+ doc_ids=intersection(rs, IISet(index._unindex.keys()))
+ d={} # could use an OISet here for automatic sorted order
+ for k,v in index._index.items():
+ if isinstance(v, IntType):
+ if v in doc_ids:
+ d[k]=None
+ else:
+ for i in v.keys():
+ if i in doc_ids:
+ d[k]=None
+ continue
+ resultset=d.keys()
+ resultset.sort()
+ self._cleanUp()
+ return resultset
+ if uv3:
+ index=self._catalog.indexes[uv3]
+ doc_ids=intersection(rs, IISet(index._unindex.keys()))
+ resultset=OISet()
+ for k,v in index._index.items():
+ if isinstance(v, IntType):
+ if v in doc_ids:
+ resultset.insert(k)
+ else:
+ for i in v.keys():
+ if i in doc_ids:
+ resultset.insert(k)
+ continue
+ self._cleanUp()
+ return resultset.keys()
+
# Create a lazy result map of the results
if hasattr(rs, 'keys'): rs=rs.keys()
--------------000004000806080806040102--