[Zope] Combining hits from searching 2 ZCatalogs at once

Casey Duncan casey@zope.com
Wed, 30 Apr 2003 22:58:40 -0400


On Wednesday 30 April 2003 08:38 pm, Gordon Lai wrote:
> Hi,
>
> I'm writing a Python script that searches 2 ZCatalogs
> at once with a form-provided query. The first ZCat
> contains scanned images of text documents plus their
> metadata. The second ZCat contains the OCR text of the
> same documents. The searches are written like this:
>
>   textresults = context.Text_Catalog({'PrincipiaSearchSource': query})
>   ocrresults = context.OCR_Catalog({'PrincipiaSearchSource': query})
>
> Text_Catalog is the primary catalog; it contains more
> metadata than in OCR_Catalog and hits on this catalog
> are preferred. OCR_Catalog is the secondary catalog;
> if I get a hit here (and assuming there's no hit on
> its matching document in Text_Catalog), then I want to
> find its matching document and add its metadata to
> textresults as a new hit. I'll then return the
> modified textresults to the calling form.
>
> My question is: How do I add the new hit to
> textresults? I tried textresults.append( newhit ), but
> I found out that textresults isn't a sequence, it's a
> LazyCat class instance. How do I append new items to
> this instance?

You don't, but you can get the Catalog to return "raw" result sets,
which can be manipulated or combined with those from other catalogs.
To do this, you must use an external method to call methods of the
underlying Catalog object directly. Catalog.py has a function,
mergeResults, which turns the raw sets back into lazy catalog results
as usual.

Here is a simple example (external method):

from Products.ZCatalog.Catalog import mergeResults

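def queryMultipleCatalogs(request, *zcatalogs):
    # Collect the raw (unmerged) result set from each catalog.
    results = []
    for zcat in zcatalogs:
        results.append(zcat._catalog.searchResults(request, _merge=0))
    # The raw sets carry sort keys only if the request asked for sorting.
    has_sort_keys = request.has_key('sort-on') or request.has_key('sort_on')
    reverse = ((request.get('sort-order', '') or
                request.get('sort_order', '')).lower()
               in ('reverse', 'descending'))
    # Turn the raw sets back into ordinary lazy catalog results.
    return mergeResults(results, has_sort_keys, reverse)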

The key is passing _merge=0 to searchResults, which then returns a raw
result set. For a sorted query this should be a standard Python list of
three-tuples of the form (sort_key, docid/rid, catalog.__getitem__).
These can be manipulated however you like, and mergeResults can turn
them back into standard catalog results.
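
To make the shape of those sorted raw sets concrete, here is a rough
plain-Python sketch that runs outside Zope. Everything in it is a
stand-in: the dicts play the role of catalogs, the 'doc_id' field is a
hypothetical identifier shared by matching documents in both catalogs,
and combine_preferring_first is an illustrative helper, not part of
Catalog.py. It keeps every hit from the primary set and adds secondary
hits only for documents the primary set missed.

```python
# Stand-in "catalogs": plain dicts mapping a record id (rid) to a record.
# In Zope the third tuple element would be the catalog's __getitem__,
# and the records would be catalog brains rather than dicts.
text_records = {
    1: {"doc_id": "a", "title": "Deed"},
    2: {"doc_id": "b", "title": "Will"},
}
ocr_records = {
    1: {"doc_id": "b", "title": "Will (OCR)"},
    2: {"doc_id": "c", "title": "Map (OCR)"},
}

# Raw sorted result sets shaped as (sort_key, rid, getitem) three-tuples.
text_hits = [
    ("Deed", 1, text_records.__getitem__),
    ("Will", 2, text_records.__getitem__),
]
ocr_hits = [
    ("Map (OCR)", 2, ocr_records.__getitem__),
    ("Will (OCR)", 1, ocr_records.__getitem__),
]


def combine_preferring_first(primary, secondary, doc_key="doc_id"):
    """Keep every primary hit; add secondary hits only for unseen documents.

    doc_key names the (assumed) field that identifies the same document
    in both catalogs.
    """
    seen = set(getitem(rid)[doc_key] for _, rid, getitem in primary)
    extra = [hit for hit in secondary if hit[2](hit[1])[doc_key] not in seen]
    combined = list(primary) + extra
    combined.sort(key=lambda hit: hit[0])  # order by sort key only
    return combined


combined = combine_preferring_first(text_hits, ocr_hits)
# Resolve each tuple to its record by calling getitem(rid).
titles = [getitem(rid)["title"] for _, rid, getitem in combined]
print(titles)
```

Note that the sketch resolves every hit up front to read 'doc_id',
which gives up the laziness of normal catalog results; with large
result sets you would probably want to compare on something cheaper.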

To learn more, your best bet is to read the Catalog.py sources and use
some of the methods in there as I have above.

hth,

-Casey