[Zope] Combining hits from searching 2 ZCatalogs at once

Mon, 5 May 2003 16:48:47 -0700 (PDT)

Thanks a lot, your suggestions worked. However, I have
another question: To get the metadata of a document
that matches a hit in OCR_Catalog, I'm currently doing
another search on absolute_url in Text_Catalog. This
works fine, but if there are lots of OCR_Catalog hits,
this really slows down the overall search. My strategy
to speed things up is to avoid the extra search, read
the document directly from disk, extract its metadata,
and add it to textresults. However, I'm not sure how
to create the object that's returned from
searchResults() so that I can assign the metadata to
it and then add it to textresults. I've looked through
Catalog.py and found that searchResults() calls
search(), which calls _apply_index() in TextIndex.py,
which calls query(), which calls evaluate(). I then
get lost in here because evaluate() doesn't seem to be
evaluating anything; it reduces operators in the query
but then doesn't seem to use the query to search an
index. How do I create a searchResults() object?
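
To make the goal concrete, here is the merge I'm after, sketched with plain dicts instead of real catalog results. The 'url' key, combine_hits, and fetch_metadata are placeholders made up for illustration, not ZCatalog API; fetch_metadata stands for whatever ends up supplying the metadata (the extra search now, a read from disk later).

```python
# Plain dicts stand in for catalog result objects. The 'url' key and the
# fetch_metadata callable are hypothetical, not part of ZCatalog.

def combine_hits(text_hits, ocr_hits, fetch_metadata):
    """Return the primary hits plus one entry per OCR-only document."""
    combined = list(text_hits)
    seen = set(hit['url'] for hit in text_hits)
    for hit in ocr_hits:
        url = hit['url']
        if url in seen:
            continue  # already present, so the primary hit wins
        seen.add(url)
        combined.append(fetch_metadata(url))
    return combined
```

Because the metadata lookup is passed in as a callable, the slow per-hit search could be swapped for something cheaper without touching the merge itself.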

Thanks,
Gordon

--- Casey Duncan <casey@zope.com> wrote:
> On Wednesday 30 April 2003 08:38 pm, Gordon Lai wrote:
> > Hi,
> > 
> > I'm writing a Python script that searches 2 ZCatalogs
> > at once with a form-provided query. The first ZCat
> > contains scanned images of text documents plus their
> > metadata. The second ZCat contains the OCR text of the
> > same documents. The searches are written like this:
> > 
> >   textresults = context.Text_Catalog({'PrincipiaSearchSource': query})
> >   ocrresults = context.OCR_Catalog({'PrincipiaSearchSource': query})
> > 
> > Text_Catalog is the primary catalog; it contains more
> > metadata than OCR_Catalog, and hits on this catalog
> > are preferred. OCR_Catalog is the secondary catalog;
> > if I get a hit here (and assuming there's no hit on
> > its matching document in Text_Catalog), then I want to
> > find its matching document and add its metadata to
> > textresults as a new hit. I'll then return the
> > modified textresults to the calling form.
> > 
> > My question is: How do I add the new hit to
> > textresults? I tried textresults.append( newhit ), but
> > I found out that textresults isn't a sequence, it's a
> > LazyCat class instance. How do I append new items to
> > this instance?
> 
> You don't, but you can get the Catalog to return you
> "raw" sets which can be manipulated or added together
> with other catalogs. To do this, you must use an
> external method to call methods of the underlying
> Catalog object directly. Catalog.py has a function
> mergeResults which can be used to turn the raw sets
> into lazy catalog results like usual.
> 
> Here is a simple example (external method):
> 
> from Products.ZCatalog.Catalog import mergeResults
> 
> def queryMultipleCatalogs(request, *zcatalogs):
>     # Collect one raw (unmerged) result set per catalog
>     results = []
>     for zcat in zcatalogs:
>         results.append(zcat._catalog.searchResults(request, _merge=0))
>     # Accept either spelling of the sort parameters
>     sorted = request.has_key('sort-on') or request.has_key('sort_on')
>     reverse = ((request.get('sort-order', '') or
>                 request.get('sort_order', '')).lower()
>                in ('reverse', 'descending'))
>     return mergeResults(results, sorted, reverse)
> 
> The key is passing _merge=0 to searchResults. It then
> returns a raw result set. In the case of a sorted set
> this should be a standard Python list containing
> 3-tuples of (sort_key, docid/rid, catalog.__getitem__).
> These could be manipulated however you like.
> mergeResults can turn them back into standard catalog
> results.
> 
> To learn more, your best bet is to read the Catalog.py
> sources and use some of the methods in there as I have
> above.
> 
> hth,
> 
> -Casey
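
As a rough illustration of the sorted raw sets Casey describes, here is how lists of (sort_key, docid/rid, getitem) tuples from two catalogs could be combined by hand. Everything here is a stand-in: the dicts play the role of catalogs, combine_sorted is a hypothetical helper, and this is not a claim about what mergeResults does internally.

```python
# Each raw set is assumed to be a list of (sort_key, record_id, getitem)
# tuples, where getitem looks the record up in the catalog it came from.

def combine_sorted(raw_sets, reverse=False):
    merged = []
    for raw in raw_sets:
        merged.extend(raw)
    # Sort on the key only, so record ids and callables are never compared.
    merged.sort(key=lambda item: item[0], reverse=reverse)
    # Resolve every record through the lookup that belongs to its own catalog.
    return [getitem(rid) for (sort_key, rid, getitem) in merged]


# Two fake "catalogs" whose record ids overlap, as they might in practice.
text_records = {1: 'text doc 1', 2: 'text doc 2'}
ocr_records = {1: 'ocr doc 1'}
raw_text = [('apple', 1, text_records.__getitem__),
            ('cherry', 2, text_records.__getitem__)]
raw_ocr = [('banana', 1, ocr_records.__getitem__)]
combined = combine_sorted([raw_text, raw_ocr])
```

Carrying the lookup inside each tuple is what lets record ids from different catalogs coexist in one list without colliding.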
