[Zope] Weighing catalog searches per index ?

Jean-Francois.Doyon at CCRS.NRCan.gc.ca Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Thu Jan 8 16:24:58 EST 2004


Casey,

Thanks for pointing out this product, I'll have to give it a try, as I can
foresee many useful applications for it !

I'm working on the next generation of our site.  Currently we use a regular
TextIndex, which is obviously oversimplistic and insufficient.

So right now I've been using ZCTextIndex through development, and it seems
to give decent results (Hard to tell without getting some mass usage).

The problem is that some co-workers using different technologies have
"weighing", and it sounds interesting, at least from the user's perspective.
Notably, we'd like to experiment with giving the Title more priority
than the rest, so that when someone views the search results with the
titles, the results are perceived as relevant.  Also, if we have
weighing, content could possibly be tweaked/adjusted to take that into
account (notably with Keywords).

Your product seems to have a good base to start with.  The problem now, and
the one that stopped me in my tracks, is how to define/calculate/configure
this "weighing" concept.  You suggest there's some underlying functionality
for weighing already; maybe it'd just be a matter of taking advantage of it
and documenting how to use it ? The big question would be what does a weight
of "1" MEAN versus a weight of "2" or "5" ?

The other is how it actually gets implemented.  Does the weight need to be
known at indexing time, or can it be provided at search time ? My hunch is
that the weighing should be applied at search time, so your product could be
modified to take as input the weights to apply to each index that is being
searched ?

Something like:

result = catalog(dc_fields={"query": "Some search string",
                            "fields": ["Title", "Description"]})

could become:

result = catalog(dc_fields={"query": "Some search string",
                            "fields": ["Title", "Description"],
                            "weights": [5, 1]})

Meaning apply a weight of 5 to Title, and 1 to Description.  Which I would
in turn interpret as meaning Title is 5 times more important than
Description (Not knowing any better right now).
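
To make that interpretation concrete, here is a minimal sketch in plain
Python.  It assumes each field yields a record id -> score mapping and that a
weight simply multiplies a field's score before the scores are added up.  The
helper names (field_weights, combine) and the sample scores are made up for
illustration; none of this is the existing FieldedTextIndex API.

```python
def field_weights(spec):
    """Pair each field with its weight; fields default to a weight of 1."""
    fields = spec["fields"]
    weights = spec.get("weights", [1] * len(fields))
    if len(weights) != len(fields):
        raise ValueError("need exactly one weight per field")
    return dict(zip(fields, weights))


def combine(per_field_scores, weights):
    """Sum weight * score per record id across fields, best match first."""
    totals = {}
    for field, scores in per_field_scores.items():
        w = weights.get(field, 1)
        for rid, score in scores.items():
            totals[rid] = totals.get(rid, 0) + w * score
    # Highest total first; ties broken by record id for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


spec = {"query": "Some search string",
        "fields": ["Title", "Description"],
        "weights": [5, 1]}

# Made-up per-field scores (record id -> score), for illustration only.
hits = {"Title": {10: 2, 11: 1}, "Description": {11: 3, 12: 4}}

ranked = combine(hits, field_weights(spec))
# ranked == [(10, 10), (11, 8), (12, 4)]
```

With equal weights the same hits would rank as [(11, 4), (12, 4), (10, 2)],
so under this reading a weight of 5 lets a Title-only match (record 10)
overtake records that only match well in Description.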

Personally I'm using the Okapi algorithm.  When I started investigating
this, I came to the (admittedly uneducated) conclusion that to do proper,
fast weighing, the Okapi implementation would have to be modified to
support this feature (maybe it does already ??), which is over my head,
especially with the okascore module being Python/C.  Doing it in Python
would mean doing a second pass over results that have already been
scored once, which seems inefficient and computationally intensive
(especially as I envision that really, really nice weighing
algorithms would need to have all content in memory in order to do
relational work between records).

Anyways, that's what I've been thinking about ... But the benefits of having
such a beast seem really tantalizing, so I thought I'd ask anyways ...
Besides, maybe I'm way out in left field on this and it's easier than I make
it out to be ?! :)

Thoughts ?

Thanks,
J.F.

-----Original Message-----
From: Casey Duncan [mailto:casey at zope.com]
Sent: Thursday, January 08, 2004 2:54 PM
To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Cc: zope at zope.org
Subject: Re: [Zope] Weighing catalog searches per index ?


On Thu, 8 Jan 2004 13:43:43 -0500 
Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:

> Hello,
> 
> Does anybody know of a decent implementation of a scoring algorithm
> that does "weighing" of results, presumably based on the indexes used
> ?

Low-level support for this already exists via the weightedIntersection
and weightedUnion set operations.

ZCatalog currently gives all indexes a weight of 1 however.
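
As a rough illustration of what those operations compute for two
rid -> score mappings, here is a plain-dict sketch.  It is not the BTrees
code: the function names and sample scores below are made up, and as far as
I know the real weightedUnion and weightedIntersection (in the BTrees
modules, e.g. BTrees.IIBTree) also return a weight alongside the result.

```python
def weighted_union(m1, m2, w1=1, w2=1):
    """Keys from either mapping; a missing score counts as 0."""
    return {rid: w1 * m1.get(rid, 0) + w2 * m2.get(rid, 0)
            for rid in set(m1) | set(m2)}


def weighted_intersection(m1, m2, w1=1, w2=1):
    """Only keys present in both mappings, with the weighted scores summed."""
    return {rid: w1 * m1[rid] + w2 * m2[rid]
            for rid in set(m1) & set(m2)}


# Made-up scores (record id -> score), for illustration only.
title = {10: 2, 11: 1}
description = {11: 3, 12: 4}

equal = weighted_intersection(title, description)          # {11: 4}
boosted = weighted_intersection(title, description, 5, 1)  # {11: 8}
either = weighted_union(title, description, 5, 1)  # {10: 10, 11: 8, 12: 4}
```

The first call corresponds to every index having a weight of 1; the other
two show how a per-index weight would shift the combined scores.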
 
> I'd like to explore the possibility of searching the catalog, but
> giving results from certain indexes priority over others.

It is possible to implement an index whose results are scored. This is
used by TextIndexes to implement relevance ranking for instance. The
index just needs to return a mapping (usually a BTree) of rid->score
where rid is the record id of the catalog record. ZCatalog automatically
adds these scores when intersecting results across indexes.
 
> So in the case of the CMF, saying that if search terms are found in
> the Title or Description, they are more "important" than if they're
> found somewhere else and so on ...

This might be an interesting addition to my FieldedTextIndex product.
Currently all indexed fields are weighted the same, but it would be
straightforward to make this configurable per field.
 
> I know this is a common concept in more advanced search engines (Such
> as Oracle's InterMedia), but I'm wondering if anyone has done
> something like this in Zope ...

Let me know what your specific use case is and maybe I'll add it to the
FieldedTextIndex product if it fits its usage.

-Casey


