[Zope] Weighing catalog searches per index ?

Casey Duncan casey at zope.com
Fri Jan 9 23:02:30 EST 2004


On Fri, 9 Jan 2004 18:00:11 -0500 
Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:

> Casey,
> 
> Ahhh, so it just multiplies the score.  Which also means the scoring
> is applied to each field, instead of merging the fields and THEN
> scoring.  But doesn't that mean that even if not restricting the
> search to specific fields, the score coming out of one of our indexes
> could be different than a pure ZCTextIndex which scores on just one
> big "blob" of textual content, instead of several small ones ... At
> least with Okapi that would presumably make a difference since part of
> the scoring is based on the total number of words in the document ?

FieldedTextIndex actually still stores the word=>doc=>score mapping the
same way ZCTextIndex does. It keeps a separate word=>field=>doc mapping
(unscored). When you do a search without selecting any fields it only
uses the first mapping, so the scores work out the same. In fact the
amount of work is exactly the same. 
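
To make the two mappings concrete, here is a toy sketch in modern Python.
It is not the actual FieldedTextIndex code: the class name, the method
names and the raw term-count "score" are all assumptions made for
illustration (the real index uses ZCTextIndex's scoring).

```python
from collections import defaultdict


class ToyFieldedIndex:
    """Toy model of the two mappings (hypothetical, not the real product)."""

    def __init__(self):
        # word -> doc id -> score (scored, same shape as in ZCTextIndex)
        self.word_doc_score = defaultdict(dict)
        # word -> field name -> set of doc ids (unscored)
        self.word_field_docs = defaultdict(lambda: defaultdict(set))

    def index_doc(self, docid, fields):
        """Index a document given as a dict of field name -> text."""
        counts = defaultdict(int)
        for field, text in fields.items():
            for word in text.lower().split():
                counts[word] += 1
                self.word_field_docs[word][field].add(docid)
        for word, n in counts.items():
            # Placeholder score: term count over the whole document,
            # standing in for the real scoring function.
            self.word_doc_score[word][docid] = n

    def search(self, word):
        """Search with no fields selected: only the scored mapping is read."""
        return dict(self.word_doc_score.get(word, {}))


index = ToyFieldedIndex()
index.index_doc(1, {"title": "zope catalog", "body": "catalog search in zope zope"})
result = index.search("zope")
```

Because the score is computed over the whole document, an unfielded
search never touches the per-field mapping.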

When you do specify fields, it intersects the scored results from all
documents with the union of the documents found for each selected field.
At present this intersection does not affect the scores. This is where I
will add the per-field weighting (currently every field has a weight of
1).
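
A rough sketch of that fielded search over plain dicts, in modern Python.
The function name and arguments are hypothetical. The optional weights
argument shows one possible way weighting could slot in: how weights
combine when a document matches in several fields (here, the largest
weight wins) is purely an assumption, since that part is not written yet.

```python
def fielded_search(word_doc_score, word_field_docs, word, fields, weights=None):
    """Intersect scored hits with the union of per-field hits.

    word_doc_score:  word -> doc id -> score
    word_field_docs: word -> field name -> set of doc ids
    weights:         optional field name -> multiplier (defaults to 1)
    """
    weights = weights or {}
    scored = word_doc_score.get(word, {})
    by_field = word_field_docs.get(word, {})

    # Union of docs found in any selected field, remembering the largest
    # weight among the fields each doc matched in (assumed combination rule).
    doc_weight = {}
    for field in fields:
        w = weights.get(field, 1)
        for doc in by_field.get(field, ()):
            doc_weight[doc] = max(doc_weight.get(doc, 0), w)

    # Intersection: keep only scored docs that appear in the union.
    return {doc: score * doc_weight[doc]
            for doc, score in scored.items() if doc in doc_weight}


word_doc_score = {"zope": {1: 3, 2: 1}}
word_field_docs = {"zope": {"title": {1}, "body": {1, 2}}}

title_only = fielded_search(word_doc_score, word_field_docs, "zope", ["title"])
boosted = fielded_search(word_doc_score, word_field_docs, "zope",
                         ["title", "body"], weights={"title": 2})
```

With every weight left at 1 the scores pass through unchanged, which
matches the current behaviour.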
 
> As for the syntax of the querying, I'm really indifferent, so long as
> it works :) I guess your suggestion does have advantages over mine
> indeed though !

Yeah, it's just easier to see which scores go with which field.
 
> Thanks for getting this done ! Let me know as soon as you've got it
> and I'll gladly try it out.

Sure. I'm glad to have victims^H^H^H^H^H^H^Husers to try it out on ;^)
 
> Since this can be made into a transparent extension of ZCTextIndex,
> I'd really suggest that if/when this is deemed mature enough, it
> replace the current ZCTextIndex.  This searching functionality is kind
> of invaluable and extremely powerful, and I'm sure would be of great
> use to many once they find out about it !

If it is generally deemed useful and enough people use it, I would
definitely propose putting it in the Zope core. For now I'm happy to
shake out the details as a separately distributed product.

I don't think it will be able to fully replace ZCTextIndex though,
mainly because the input data structure is different (a dict vs. a
string or list of strings). Most applications (like CMF) define
SearchableText to return a string. That is the way all TextIndexes have
worked up until now. OTOH it would not be out of the question to make
FieldedTextIndex understand a simple string input (and store as a single
field named "SearchableText" or "body").
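
Accepting the simpler input could look roughly like this (modern Python;
the helper name and its default_field argument are hypothetical, and
joining a list of strings with spaces is an assumption):

```python
def normalize_fields(value, default_field="SearchableText"):
    """Coerce index input to a dict of field name -> text."""
    if isinstance(value, dict):
        # Already fielded input: copy it so the caller's dict is untouched.
        return dict(value)
    if isinstance(value, str):
        # A plain string becomes a single default field.
        return {default_field: value}
    # Otherwise assume an iterable of strings and join it into one field.
    return {default_field: " ".join(value)}


from_string = normalize_fields("some searchable text")
from_list = normalize_fields(["some", "searchable text"])
from_dict = normalize_fields({"title": "Zope", "body": "catalog search"})
```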

Another, perhaps less compelling, argument against replacing ZCTextIndex
wholesale is that FieldedTextIndex is a more expensive data structure
when you only need a single text blob indexed. That quickly changes,
though, when you start replacing a bunch of ZCTextIndexes with a single
FieldedTextIndex.

-Casey


