[Zope] Re: FieldedTextIndex weighting implemented

15 Jan 2004

      [CC'd the list in case anybody else is interested]

On Mon, 12 Jan 2004 17:44:52 -0500
Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
...
Casey,
I created the Index, and that works fine, using the only ZCLexicon I
have which is the same I use for ZCTextIndexes ...
Now if I try to view the management page for the Index by clicking on
it in the ZMI, I get :
Traceback (innermost last):
  Module ZPublisher.Publish, line 89, in publish
  Module ZPublisher.BaseRequest, line 299, in traverse
  Module Products.ZCatalog.ZCatalogIndexes, line 109, in __bobo_traverse__
AttributeError: _catalog
Hmm, I'm not sure how the index could be causing that. Pretty weird.
...
After navigating the portal_catalog interface a bit (This is in CMF
BTW), the problem seems to go away ...
Once I've reached this point, I can go ahead and index content.
I'd suggest adding to the ZMI interface of the index the name of the
attribute/method being indexed, it's always handy.
Good point. I thought I'd done that. I definitely will.
...
I'll let you know how the actual searching goes !
Some thoughts I'm having: It'd be nice if I could store default
weights in the index, which it would use transparently unless
specific weights are provided with the query itself.  That would also
give an easy-to-use interface for setting weights, instead of having to
change code (possibly often, when one tweaks this stuff).  AND, because
(in my case at least) 99% of searches are on ALL fields anyways, it
would save me from having to supply a list of fields and weights each
and every time.
Yes that makes sense. I'll need to implement weighting when searching
all fields which should be straightforward and useful.
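
A rough sketch of the default-weights idea, just to make the behaviour
concrete. None of these names exist in FieldedTextIndex; the helper
functions, the rule that query weights replace the stored defaults
wholesale, and the fallback weight of 1 for unlisted fields are all
assumptions for illustration:

```python
# Hypothetical helpers; not part of FieldedTextIndex.

def effective_weights(default_weights, query_weights=None):
    """Pick the per-field weights to use for one query.

    default_weights: weights stored on the index (field name -> weight).
    query_weights: optional weights passed with the query. Assumption:
    when given, they replace the stored defaults entirely.
    """
    if query_weights:
        return dict(query_weights)
    return dict(default_weights)


def combine_scores(field_scores, weights):
    """Weighted sum of per-field scores for a single document.

    Assumption: a field with no explicit weight counts with weight 1.
    """
    return sum(score * weights.get(field, 1)
               for field, score in field_scores.items())


defaults = {"Title": 3, "Description": 2}
# A basic "search everything" query supplies no weights, so the
# stored defaults apply.
weights = effective_weights(defaults)
total = combine_scores({"Title": 1.0, "Description": 0.5}, weights)
```

With something like this, the simple search form never has to mention
fields or weights, and an advanced search can still pass its own.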
...
The current setup is good for advanced searching and so on, but for
basic "enter text/hit search" queries, it's more work.
Yes, that makes sense.
...
Not a big deal, but an idea for 1.0 I guess ? :)
I'll probably shoot for 0.3...
...
Also, and this is muddy in my mind also, but would having different
weights per object type make some kind of sense ?  Since Okapi's
scoring is sensitive to the proportion of terms vs total number of
words, the weighting might need to change based on the type of object,
and whether one attribute's textual content is "wordier" than another.
I don't know if that makes sense ?
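
To illustrate the length sensitivity mentioned above, here is the
textbook Okapi BM25 term-frequency factor. This is a sketch of the
general formula, not code taken from ZCTextIndex or FieldedTextIndex;
the function name is made up, and the k1 and b values are just commonly
quoted defaults:

```python
def okapi_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of the standard BM25 score.

    tf: occurrences of the term in the document (or field).
    doc_len / avg_doc_len: length relative to the collection average.
    k1, b: usual BM25 tuning constants (assumed values).
    """
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)


# Same number of hits, but one text is much wordier than the other.
short_text = okapi_tf(tf=2, doc_len=50, avg_doc_len=100)
long_text = okapi_tf(tf=2, doc_len=400, avg_doc_len=100)
```

The wordier text gets the lower value for the same number of hits,
which is why a fixed weight may not suit every attribute or type.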
It could be done, but it would require changing the data structures
since the weights would be applied at query time but would need to be
stored at index time. Actually being able to weight objects arbitrarily
might be pretty interesting. It would allow you to pull tricks like
Google and make certain results always rate high for certain queries.

I'm not sure if that's what you were thinking but it could be a pretty
cool trick.
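
A toy sketch of what "stored at index time, applied at query time"
could look like. This is not how FieldedTextIndex is structured; the
class, its methods, and the plain dict are hypothetical stand-ins for
whatever data structure would really hold the per-object weights:

```python
class BoostedIndex:
    """Hypothetical index wrapper that keeps one weight per object."""

    def __init__(self):
        # docid -> weight, recorded when the object is indexed
        self._boosts = {}

    def index_doc(self, docid, boost=1.0):
        """Remember the object's weight at index time."""
        self._boosts[docid] = boost

    def apply_boosts(self, scores):
        """Scale raw query scores by the stored weights.

        Objects with no stored weight keep their raw score.
        """
        return {docid: score * self._boosts.get(docid, 1.0)
                for docid, score in scores.items()}


idx = BoostedIndex()
idx.index_doc(1, boost=2.0)  # always rate this object higher
idx.index_doc(2)
ranked = idx.apply_boosts({1: 0.5, 2: 0.8})
```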
...
Oh, and I'm noticing FieldedTextIndex uses Okapi ONLY ? That's fine
for me at least, but might want to document that, for those that might
use the other algorithm ?
For simplicity's sake I chose to do Okapi only. Doing both would be pretty
complex and I didn't think it was worth the effort since I always use
Okapi myself. I'm going to call YAGNI on this for now.
...
Phew, sorry, I'm babbling ...
No, the input is always appreciated.

-Casey
...
-----Original Message-----
From: Casey Duncan [mailto:casey@zope.com]
Sent: Monday, January 12, 2004 1:15 AM
To: Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Cc: zope@zope.org
Subject: FieldedTextIndex weighting implemented
I have an initial implementation of the per-field weighting checked
into CVS. I haven't released 0.2 yet, but I expect to sometime in the
next week or so. The new feature is unit-tested and documented in the
README, so feel free to check it out in the meantime.
You can download it directly from CVS at:
http://cvs.zope.org/Products/FieldedTextIndex/
There is a link to generate a tarball at the bottom, or you can check
it out using anonymous CVS if you want to track changes.
Let me know what you think.
-Casey