On Fri, 9 Jan 2004 18:00:11 -0500 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
Casey,
Ahhh, so it just multiplies the score. Which also means the scoring is applied to each field, instead of merging the fields and THEN scoring. But doesn't that mean that even when not restricting the search to specific fields, the score coming out of one of our indexes could be different from that of a pure ZCTextIndex, which scores on just one big "blob" of textual content instead of several small ones? At least with Okapi that would presumably make a difference, since part of the scoring is based on the total number of words in the document?
FieldedTextIndex actually still stores the word=>doc=>score mapping the same way ZCTextIndex does. It keeps a separate word=>field=>doc mapping (unscored). When you do a search without selecting any fields it only uses the first mapping, so the scores work out the same. In fact the amount of work is exactly the same. When you do specify fields, it intersects the scored results from all documents with a union of the documents found for each selected field. This intersection does not currently affect the scores, however. This is where I will add the per-field weighting (currently all fields have a weight of 1).
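To make the two mappings concrete, here is a rough Python sketch of the idea. It is only an illustration: the class and method names (TinyFieldedIndex, index_doc, search) are made up, and the "score" is a plain word count standing in for the real Okapi score, so none of this is FieldedTextIndex's actual code or API.

```python
from collections import defaultdict


class TinyFieldedIndex:
    """Toy model of the two mappings; NOT the real FieldedTextIndex."""

    def __init__(self):
        # word => doc => score, computed over all fields combined
        self.word_doc_score = defaultdict(dict)
        # word => field => set of docs (unscored)
        self.word_field_docs = defaultdict(lambda: defaultdict(set))

    def index_doc(self, docid, fields):
        """Index one document; `fields` is a dict of field name => text."""
        for field, text in fields.items():
            for word in text.lower().split():
                scores = self.word_doc_score[word]
                # Placeholder score (assumption): occurrences in the whole doc.
                scores[docid] = scores.get(docid, 0) + 1
                self.word_field_docs[word][field].add(docid)

    def search(self, word, fields=None):
        """Return a doc => score dict for a single word."""
        word = word.lower()
        scored = self.word_doc_score.get(word, {})
        if not fields:
            # No field restriction: only the scored mapping is consulted.
            return dict(scored)
        # Union of the docs that contain the word in any selected field.
        by_field = self.word_field_docs.get(word, {})
        allowed = set()
        for field in fields:
            allowed |= by_field.get(field, set())
        # Intersect with the scored results; the scores stay unchanged.
        return {doc: s for doc, s in scored.items() if doc in allowed}


idx = TinyFieldedIndex()
idx.index_doc(1, {"title": "zope catalog", "body": "the catalog indexes text"})
idx.index_doc(2, {"title": "text index", "body": "zope zope"})

unrestricted = idx.search("zope")
title_only = idx.search("zope", fields=["title"])
body_only = idx.search("catalog", fields=["body"])
```

In this toy, restricting a search to a field only filters which documents come back. The score of a surviving document is the same as in the unrestricted search, which is where a per-field weight could later be applied.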
As for the syntax of the querying, I'm really indifferent, so long as it works :) I guess your suggestion does have advantages over mine indeed though !
Yeah, it's just easier to see which scores go with which field.
Thanks for getting this done ! Let me know as soon as you've got it and I'll gladly try it out.
Sure. I'm glad to have victims^H^H^H^H^H^H^Husers to try it out on ;^)
Since this can be made into a transparent extension of ZCTextIndex, I'd really suggest that if/when this is deemed mature enough, it replace the current ZCTextIndex. This searching functionality is invaluable and extremely powerful, and I'm sure it would be of great use to many once they find out about it !
If it is generally deemed useful and enough people use it, I would definitely propose putting it in the Zope core. For now I'm happy to shake out the details as a separately distributed product. I don't think it will be able to fully replace ZCTextIndex though, mainly because the input data structure is different (a dict vs. a string or list of strings). Most applications (like CMF) define SearchableText to return a string. That is the way all TextIndexes have worked up until now. OTOH it would not be out of the question to make FieldedTextIndex understand a simple string input (and store it as a single field named "SearchableText" or "body"). Another, perhaps less compelling, argument against replacing ZCTextIndex wholesale is that FieldedTextIndex is a more expensive data structure when you only need a single text blob indexed. That quickly changes, though, when you start replacing a bunch of ZCTextIndexes with a single FieldedTextIndex.
-Casey