Re: [Zope] Intersection/Union of ZCatalog result sets

24 Sep 2004

      From: "+lupa+" <lupa@zurven.com>
...
Hello Jonathan,
    I'm on the digest and not the regular Zope list, so I'm sending you
this off list (feel free to post it if you find it useful).  My CalendarX
product for Plone has some examples in it of making separate queries to
the catalog, combining them, and removing duplicates ("unique-ing" them).
In short, you can add two catalog query results together with a + sign.
Or, if you have one query (q1) and want to add another (q2) to it, you can
use:
    q1 += q2
That's all it takes.
If the result sets overlap, you'll have duplicates, so you should run them
through a "unique" routine.  There was a good one in the CMFCalendar
CalendarTool.py routines, which I extracted and put into a Python script
in CalendarX.  Here's the code, in its entirety:
# Unique the results of a query (query), keeping the first
# occurrence of each catalog record (identified by its RID)
results = []
rids = []
for item in query:
    rid = item.getRID()
    if rid not in rids:
        results.append(item)
        rids.append(rid)
return results
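The loop above tests membership against the list `rids`, so its cost grows roughly with the square of the result size.  Here is a minimal plain-Python sketch of the same union step (concatenate the result sets, then keep the first occurrence of each RID) using a dict for constant-time lookups.  It assumes only that each result has a `getRID()` method, as catalog brains do, and that all result sets come from the same catalog; `FakeBrain` and `union_results` are hypothetical names so the example runs outside Zope.

```python
class FakeBrain:
    """Hypothetical stand-in for a ZCatalog brain; only getRID() is used."""

    def __init__(self, rid, title):
        self._rid = rid
        self.title = title

    def getRID(self):
        return self._rid


def union_results(*queries):
    """Concatenate result sets and drop duplicates, keeping first occurrences.

    Membership is checked against a dict (constant time per lookup)
    instead of a list, so the whole pass is linear in the total size.
    """
    seen = {}
    results = []
    for query in queries:
        for item in query:
            rid = item.getRID()
            if rid not in seen:
                seen[rid] = True
                results.append(item)
    return results


q1 = [FakeBrain(1, "a"), FakeBrain(2, "b")]
q2 = [FakeBrain(2, "b"), FakeBrain(3, "c")]
print([b.getRID() for b in union_results(q1, q2)])  # [1, 2, 3]
```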
Finally, related to intersection, here is a little routine for subtracting
one query from another.  In CalendarX, this is my queriesSubtract.py
routine for subtracting any events in q2 from those in q1 (i.e., q1 not
in q2):
# make a list of RIDs in q2, using a list comprehension:
q2rids = [item.getRID() for item in q2]
# make a new q1 that contains only items whose RIDs are NOT in q2rids
q1new = [item for item in q1 if item.getRID() not in q2rids]
return q1new
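The subtraction script's membership test against the list `q2rids` also scans the whole list for every item in q1.  Since the thread is about intersections, here is a sketch of intersection alongside subtraction, both using a dict of RIDs for constant-time lookups.  It assumes both result sets come from the same catalog (so their RIDs are comparable); `FakeBrain`, `rid_table`, `intersect_results` and `subtract_results` are hypothetical names for illustration, not part of CalendarX or ZCatalog.  The helper names avoid leading underscores, which Zope's restricted Python scripts do not allow.

```python
class FakeBrain:
    """Hypothetical stand-in for a ZCatalog brain; only getRID() is used."""

    def __init__(self, rid):
        self._rid = rid

    def getRID(self):
        return self._rid


def rid_table(query):
    """Build a dict of the RIDs in a result set for fast membership tests."""
    table = {}
    for item in query:
        table[item.getRID()] = True
    return table


def intersect_results(q1, q2):
    """q1 AND q2: items of q1 whose RID also appears in q2, in q1's order."""
    q2rids = rid_table(q2)
    return [item for item in q1 if item.getRID() in q2rids]


def subtract_results(q1, q2):
    """q1 NOT q2: items of q1 whose RID does not appear in q2."""
    q2rids = rid_table(q2)
    return [item for item in q1 if item.getRID() not in q2rids]


q1 = [FakeBrain(r) for r in (1, 2, 3, 4)]
q2 = [FakeBrain(r) for r in (3, 4, 5)]
print([b.getRID() for b in intersect_results(q1, q2)])  # [3, 4]
print([b.getRID() for b in subtract_results(q1, q2)])   # [1, 2]
```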
Both these routines use the RID (record ID) of objects in the ZCatalog,
which is central to this type of job.  I think with these basic examples
you should be able to implement just about anything you need.  But I also
agree with Andreas that Dieter's AdvancedQuery product is terrifically
useful for creating truly sensible queries on the ZCatalog.
Thanks for the response! (All ideas, comments and suggestions gratefully
received!)

In my particular case I am trying to combine the result sets from two
different searches on two different ZCatalogs (i.e. not two different
searches of the same ZCatalog).  In this case the RIDs do not match (each
ZCatalog maintains its own RIDs).
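Because each ZCatalog numbers its own records, RIDs cannot be compared across catalogs.  One option, assuming both catalogs index the same objects at the same paths, is to join on the object path, which catalog brains expose through `getPath()`.  A minimal plain-Python sketch follows; `FakeBrain` and `join_by_path` are hypothetical names, and the RIDs in the sample data are deliberately different to mimic two separate catalogs.

```python
class FakeBrain:
    """Hypothetical stand-in for a ZCatalog brain with getRID() and getPath()."""

    def __init__(self, rid, path):
        self._rid = rid
        self._path = path

    def getRID(self):
        return self._rid

    def getPath(self):
        return self._path


def join_by_path(text_hits, attr_hits):
    """Keep text-index hits whose object path also appears in attr_hits.

    RIDs are ignored because each catalog assigns its own.
    """
    paths = {}
    for brain in attr_hits:
        paths[brain.getPath()] = True
    return [brain for brain in text_hits if brain.getPath() in paths]


text_hits = [FakeBrain(10, "/site/doc1"), FakeBrain(11, "/site/doc2")]
attr_hits = [FakeBrain(501, "/site/doc2"), FakeBrain(502, "/site/doc3")]
print([b.getPath() for b in join_by_path(text_hits, attr_hits)])  # ['/site/doc2']
```

Both result sets are fully iterated here, which forces every lazy result to be materialized; on a catalog of about a million documents that may be exactly the cost being avoided, so merging below the lazy layer could well be the better route.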

I am doing this to try to squeeze some performance improvements out of a
ZCTextIndex.  We have a ZCatalog with about 1 million documents that we are
full-text indexing, and it no longer fits into memory (so retrieval now
requires many disk I/O operations, which is seriously degrading
performance).

Our ZCatalog currently has 5 indexes: 4 minor indexes and one major index
(the main ZCTextIndex).  I am attempting to split the ZCatalog into two
separate ZCatalogs: one containing the 4 minor indexes and one containing
the ZCTextIndex.  The hope is that the ZCatalog containing only the
ZCTextIndex will be small enough to fit into memory again.

The only difficulty is combining the results from searches of two separate
ZCatalogs efficiently.  My best guess at this point is that I will have to
patch the 'search' routine in ZCTextIndex to stop it from 'lazifying' the
result sets, so that I can join/intersect the result sets on OIDs instead
of RIDs.  That should be doable, since the result sets prior to
'lazifying' are xxBTrees, and the BTrees package comes with union and
intersection functions.  I can then 'lazify' the final result set and
return it.  At least that's the theory!
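The merge step in that plan can be sketched in plain Python, with dicts and sets standing in for the BTrees structures.  The sketch assumes the un-lazified text-index result is a mapping from a shared key (such as the OID, as proposed) to a relevance score, and that the other catalog's result has been reduced to a set of the same keys; `merge_by_key` is a hypothetical name, and this is not ZCTextIndex's own code.  In Zope, the BTrees package's C-implemented set operations (such as `intersection` and `weightedIntersection`) would do this work instead of Python loops.

```python
def merge_by_key(text_scores, filter_keys):
    """Intersect a {key: score} mapping with a set of allowed keys.

    Iterates over whichever side is smaller, then returns (key, score)
    pairs sorted by descending score, with ties broken by key.
    """
    if len(text_scores) <= len(filter_keys):
        hits = [(k, s) for k, s in text_scores.items() if k in filter_keys]
    else:
        hits = [(k, text_scores[k]) for k in filter_keys if k in text_scores]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits


# Hypothetical keys and scores, purely for illustration.
text_scores = {101: 0.9, 102: 0.4, 103: 0.7, 104: 0.2}
allowed = {102, 103, 105}
print(merge_by_key(text_scores, allowed))  # [(103, 0.7), (102, 0.4)]
```

Iterating over the smaller side keeps the work proportional to the smaller result, which matters when one search (for example a selective minor index) narrows things far more than the full-text query does.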

Thanks again for your response,

Jonathan

Jonathan Hobbs