Hi folks, I've been playing around with adding a uniqueValuesFor capability to Casey Duncan's CatalogQuery product. This is more of a proof-of-concept, so it isn't straightforward to use just yet. There are probably bugs that I've overlooked, especially due to assumptions about the types of things that BTrees and Buckets return. Patches and explanations gratefully accepted!

Interestingly, I came up with two algorithms for uniqueValuesFor. One of them exploits the fact that the difference method of IMerge returns items, not just keys. The other algorithm is less tricky and more laborious.

I decided to test the algorithms using a Catalog I have lying around. It isn't all that full: maybe a few hundred records, and a few tens of unique values for the index I'm interested in. I found that the first algorithm runs about ten times as fast as the second. I also tried it on a larger Catalog (one with 2000 records), and there the first algorithm ran 30 times as fast as the second.

Attached are the diffs and the test script I used. The test script is specific to a particular application I have, but it should be easy to run it against another catalog.

Oh, by the way, Casey's CatalogQuery is a MonkeyPatch (those run-time patches formerly known as hotfixes) on the ZCatalog product. I did, for just a second, consider releasing this as a MonkeyPatch product that patches CatalogQuery. So, it would be a MonkeyPatch of a MonkeyPatch, or perhaps a MonkeyMonkeyPatch.
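To make the two strategies concrete, here is a rough sketch using plain dicts and sets in place of the IOBTree / IISet structures, with made-up data. "Algorithm 1" subtracts the non-matching document ids from the unindex while keeping the stored values (which is what a mapping-vs-set difference gives you for free); "algorithm 2" walks the whole forward index and tests each posting list against the result set.

```python
# Made-up stand-ins for the catalog structures:
unindex = {1: 'red', 2: 'blue', 3: 'red', 4: 'green', 5: 'blue'}  # docid -> value
index = {'red': {1, 3}, 'blue': {2, 5}, 'green': {4}}             # value -> docids
rs = {1, 2, 3}   # document ids matched by the query

# Algorithm 1: difference the unindex against the ids NOT in the result
# set, keeping the values of whatever survives, then dedupe.
inverse_rs = set(unindex) - rs
uniques1 = sorted({v for k, v in unindex.items() if k not in inverse_rs})

# Algorithm 2: scan every value in the forward index and keep it if its
# posting list intersects the result set.
uniques2 = sorted(value for value, docids in index.items() if docids & rs)

print(uniques1)   # ['blue', 'red']
print(uniques2)   # ['blue', 'red']
```

Algorithm 1 only touches the documents themselves, while algorithm 2 visits every distinct value in the index whether it matched or not, which is consistent with the timings below growing much faster for uv2/uv3 as the catalog gets bigger.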
--
Steve Alexander
Software Engineer
Cat-Box limited


import Zope, ZPublisher
#c=Zope.app().school.Instructors.Catalog
c=Zope.app().school.Pupils.Catalog
from Products.CatalogQuery import CatalogQuery
cq=CatalogQuery(c._catalog, 'current==1')
field="area_upper"
a1=cq(uniqueValuesFor=field)
a2=cq(uv2=field)
a3=cq(uv3=field)

from time import time
m1=time()
a1=cq(uniqueValuesFor=field)
m2=time()
a2=cq(uv2=field)
m3=time()
a3=cq(uv3=field)
m4=time()

print "a1 took", m2-m1
print "a2 took", m3-m2
print "a3 took", m4-m3

# ---- Catalog with a few hundred records
# a1 took 0.00448000431061
# a2 took 0.0652350187302
# a3 took 0.0657860040665

# ---- Catalog with 2000 records
# a1 took 0.0895169973373
# a2 took 2.71825802326
# a3 took 2.75240898132


*** original CatalogQuery.py
--- new CatalogQuery.py
***************
*** 43,48 ****
--- 43,51 ----
  from Acquisition import Implicit
  from Products.ZCatalog.Lazy import LazyMap
  from BTrees.IIBTree import intersection, weightedIntersection, union, weightedUnion, difference, IISet
+ from BTrees import IOBTree
+ from BTrees.OIBTree import OISet
+ from types import IntType
  
  CQL_re = re.compile(
      ('\s*([_\-a-zA-Z0-9]+)\s*'
***************
*** 167,173 ****
          if hasattr(self, '_v_all_keys'):
              del self._v_all_keys
  
!     def __call__(self, md={}, used=None, **kw):
          """Execute the query on the catalog"""
  
          # Combine any keywords into the namespace mapping
--- 170,176 ----
          if hasattr(self, '_v_all_keys'):
              del self._v_all_keys
  
!     def __call__(self, md={}, used=None, uniqueValuesFor=None, uv2=None, uv3=None, **kw):
          """Execute the query on the catalog"""
  
          # Combine any keywords into the namespace mapping
***************
*** 186,191 ****
--- 189,237 ----
                  w, rs = weightedIntersection(rs, r)
              elif block.logic == 'or':
                  w, rs = weightedUnion(rs, r)
+ 
+         if uniqueValuesFor:
+             inverse_rs=IOBTree.IOSet(difference(self._getAllKeys(), rs))
+             index=self._catalog.indexes[uniqueValuesFor]
+             results=IOBTree.difference(index._unindex, inverse_rs)
+             resultset=OISet()
+             for l in results.values():
+                 resultset.update(l)
+             self._cleanUp()
+             return resultset.keys()
+ 
+         if uv2:
+             index=self._catalog.indexes[uv2]
+             doc_ids=intersection(rs, IISet(index._unindex.keys()))
+             d={}  # could use an OISet here for automatic sorted order
+             for k,v in index._index.items():
+                 if isinstance(v, IntType):
+                     if v in doc_ids:
+                         d[k]=None
+                 else:
+                     for i in v.keys():
+                         if i in doc_ids:
+                             d[k]=None
+                             continue
+             resultset=d.keys()
+             resultset.sort()
+             self._cleanUp()
+             return resultset
+ 
+         if uv3:
+             index=self._catalog.indexes[uv3]
+             doc_ids=intersection(rs, IISet(index._unindex.keys()))
+             resultset=OISet()
+             for k,v in index._index.items():
+                 if isinstance(v, IntType):
+                     if v in doc_ids:
+                         resultset.insert(k)
+                 else:
+                     for i in v.keys():
+                         if i in doc_ids:
+                             resultset.insert(k)
+                             continue
+ 
+             self._cleanUp()
+             return resultset.keys()
+ 
          # Create a lazy result map of the results
          if hasattr(rs, 'keys'):
              rs=rs.keys()
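One wrinkle the uv2/uv3 loops have to deal with, and the reason for the isinstance(v, IntType) check in the patch: a ZCatalog field index's _index maps each value either to a bare integer docid (when a single document has that value) or to a set-like posting list of docids. A small sketch of that branch, with made-up data and an illustrative helper name:

```python
# Hypothetical stand-in for index._index: value -> docid (bare int)
# when one document carries the value, or a set of docids otherwise.
index = {
    'maths': 7,            # single posting, stored as a bare int
    'art': {3, 9, 12},     # several postings, stored as a set-like object
}

def docids_for(value):
    # Mirrors the patch's IntType check: normalise both shapes to a set.
    v = index[value]
    return {v} if isinstance(v, int) else set(v)

print(docids_for('maths'))        # {7}
print(sorted(docids_for('art')))  # [3, 9, 12]
```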