From: "+lupa+" <lupa@zurven.com>
Hello Jonathan, I'm on the digest and not the regular zope list, so I'm sending you this off list (feel free to post it if you find it useful). My CalendarX product for Plone has some examples in it of making separate queries to the catalog and combining them and unique-ing them.
In short, you can add two catalog query results together with a + sign. Or if you have one query (q1) and want to add another (q2) to it, you can use:
q1 += q2
That's all it takes.
If there is overlap between the result sets, you'll have duplicates, so you should run them through a "unique" routine -- there was a good one in the CMFCalendar.CalendarTool.py routines, which I extracted and put into a Python script in CalendarX.
Here's the code, in its entirety:

# Unique the results of a query (query)
results = []
rids = []
for item in query:
    rid = item.getRID()
    if not rid in rids:
        results.append(item)
        rids.append(rid)
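The membership test against a list gets slower as the list of RIDs grows, so for large result sets a set may be a better fit. The following is only a sketch of the same idea, assuming nothing about the results beyond the getRID() method used above. FakeBrain is a hypothetical stand-in so the example runs outside Zope, and unique_by_rid is an illustrative name, not something from CalendarX:

```python
class FakeBrain:
    """Hypothetical stand-in for a catalog result; not a Zope class."""
    def __init__(self, rid):
        self._rid = rid

    def getRID(self):
        return self._rid


def unique_by_rid(query):
    """Return the items of query in order, keeping the first item per RID."""
    seen = set()      # RIDs already emitted; set lookup avoids a list scan
    results = []
    for item in query:
        rid = item.getRID()
        if rid not in seen:
            seen.add(rid)
            results.append(item)
    return results


# Two overlapping "queries" against the same catalog (RID 2 is in both).
q1 = [FakeBrain(1), FakeBrain(2)]
q2 = [FakeBrain(2), FakeBrain(3)]
combined = unique_by_rid(q1 + q2)
combined_rids = [brain.getRID() for brain in combined]
```

The order of the first occurrences is preserved, so any sorting done by the original queries survives the de-duplication.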
Finally, related to an intersection, is a little routine for subtracting one query from another. In CalendarX, this is my queriesSubtract.py routine for subtracting any events in q2 from those in q1 (so this is q1 not in q2):
# make a list of RIDs in q2, using a list comprehension:
q2rids = [item.getRID() for item in q2]
# make a new q1 that contains only items with RIDs NOT in q2rids
q1new = [item for item in q1 if not item.getRID() in q2rids]
return q1new
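The same pattern gives an intersection by flipping the test. Here is a rough sketch of both operations side by side, again assuming only getRID() on the results. FakeBrain, queries_subtract and queries_intersect are hypothetical names used for illustration, and a set is swapped in for the list of RIDs to keep lookups cheap:

```python
class FakeBrain:
    """Hypothetical stand-in for a catalog result; not a Zope class."""
    def __init__(self, rid):
        self._rid = rid

    def getRID(self):
        return self._rid


def queries_subtract(q1, q2):
    """Items of q1 whose RID does NOT appear in q2 (q1 not in q2)."""
    q2rids = set(item.getRID() for item in q2)
    return [item for item in q1 if item.getRID() not in q2rids]


def queries_intersect(q1, q2):
    """Items of q1 whose RID also appears in q2 (q1 and q2)."""
    q2rids = set(item.getRID() for item in q2)
    return [item for item in q1 if item.getRID() in q2rids]


q1 = [FakeBrain(1), FakeBrain(2), FakeBrain(3)]
q2 = [FakeBrain(2), FakeBrain(3), FakeBrain(4)]
only_in_q1 = [b.getRID() for b in queries_subtract(q1, q2)]
in_both = [b.getRID() for b in queries_intersect(q1, q2)]
```

Both functions only make sense when q1 and q2 come from the same ZCatalog, since RIDs are assigned per catalog.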
Both these routines use the RID (record ID) of objects in the ZCatalog, which is fairly key to this type of job. I think with these basic examples you should be able to implement just about anything you need. But I also agree with Andreas that Dieter's AdvancedQuery product is terrifically useful for creating truly sensible queries on the ZCatalog.
Thanks for the response! (All ideas, comments and suggestions gratefully received!)

In my particular case I am trying to combine the result sets from two different searches on two different zcatalogs (i.e. not two different searches of the same zcatalog). In this case the RIDs do not match (each zcatalog maintains its own RIDs).

I am doing this to try to squeeze some performance improvements out of a ZCTextIndex. We have a zcatalog with about 1 million documents that we are full-text indexing, and it no longer fits into memory (therefore requiring many disk I/Os during retrieval, which is seriously degrading performance). Our zcatalog currently has 5 indexes: 4 minor indexes and one major index (the main ZCTextIndex). I am attempting to split the zcatalog into two separate zcatalogs: one containing the 4 minor indexes and one containing the ZCTextIndex. The hope is that the zcatalog containing only the ZCTextIndex will be smaller and will again fit into memory.

The only difficulty is in combining the results from searches of two separate zcatalogs in an efficient manner. My best guess at this point is that I will have to patch the 'search' routine in ZCTextIndex to stop it from 'Lazifying' the result sets, so that I can join/intersect the result sets based on OIDs (instead of RIDs - which should be doable, as the result sets prior to 'lazifying' are xxBTrees and the BTrees product comes with methods for join/intersection). I can then 'Lazify' the final result set and return it. At least that's the theory!

Thanks again for your response,
Jonathan
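For reference, the join logic for two catalogs can be sketched without RIDs by keying on something both catalogs share. The example below assumes, and this is only an assumption, that the same objects are cataloged in both zcatalogs under the same path, and it uses the getPath() method of catalog results as the join key. FakeBrain and intersect_by_path are hypothetical names for illustration. It walks both result sets in plain Python, so it shows the logic only and is not the BTree-level intersection described above:

```python
class FakeBrain:
    """Hypothetical stand-in for a catalog result; only getPath() is used."""
    def __init__(self, path):
        self._path = path

    def getPath(self):
        return self._path


def intersect_by_path(results_a, results_b):
    """Items of results_a whose path also appears in results_b."""
    paths_b = set(item.getPath() for item in results_b)
    return [item for item in results_a if item.getPath() in paths_b]


# Results from the catalog holding the 4 minor indexes ...
minor_hits = [FakeBrain("/docs/a"), FakeBrain("/docs/b"), FakeBrain("/docs/c")]
# ... and from the catalog holding only the ZCTextIndex.
text_hits = [FakeBrain("/docs/b"), FakeBrain("/docs/c"), FakeBrain("/docs/d")]

common = intersect_by_path(minor_hits, text_hits)
common_paths = [brain.getPath() for brain in common]
```

Iterating a full result set this way touches every result, which may be too slow for a catalog of about 1 million documents; it is mainly useful for checking that a faster OID-based intersection returns the same answer.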