[CC'd the list in case anybody else is interested]
On Mon, 12 Jan 2004 17:44:52 -0500 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
Casey,
I created the Index, and that works fine, using the only ZCLexicon I have which is the same I use for ZCTextIndexes ...
Now if I try to view the management page for the Index by clicking on it in the ZMI, I get :
Traceback (innermost last):
  Module ZPublisher.Publish, line 89, in publish
  Module ZPublisher.BaseRequest, line 299, in traverse
  Module Products.ZCatalog.ZCatalogIndexes, line 109, in __bobo_traverse__
AttributeError: _catalog
Hmm, I'm not sure how the index could be causing that. Pretty weird.
After navigating the portal_catalog interface a bit (This is in CMF BTW), the problem seems to go away ...
Once I've reached this point, I can go ahead and index content.
I'd suggest adding to the ZMI interface of the index the name of the attribute/method being indexed, it's always handy.
Good point. I thought I'd done that. I definitely will.
I'll let you know how the actual searching goes !
Some thoughts I'm having: It'd be nice if I could store default weights in the index, which it would use transparently unless specific weights are provided with the query itself. That would also provide an easy-to-use interface for setting weights, instead of having to change code (possibly often, when one tweaks this stuff). And because (in my case at least) 99% of searches are on ALL fields anyway, it would save me from having to supply a list of fields and weights each and every time.
Yes, that makes sense. I'll need to implement weighting when searching all fields, which should be straightforward and useful.
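A minimal sketch of that fallback, assuming a hypothetical `default_weights` mapping stored on the index and a hypothetical `resolve_weights` helper. Neither name is part of FieldedTextIndex; this only illustrates the "stored defaults unless the query says otherwise" idea:

```python
def resolve_weights(default_weights, query_weights=None, fields=None):
    """Return the per-field weights to apply for one query.

    default_weights: field name -> weight, stored on the index (hypothetical).
    query_weights:   optional field name -> weight supplied with the query;
                     these override the stored defaults.
    fields:          optional list of field names to search; None means
                     search all fields that have a known weight.
    """
    # Start from the stored defaults, then let the query override them.
    weights = dict(default_weights)
    if query_weights:
        weights.update(query_weights)
    # Restrict to the requested fields; unknown fields get a neutral weight of 1.
    if fields is not None:
        weights = {name: weights.get(name, 1) for name in fields}
    return weights


# Hypothetical defaults configured once, e.g. through the ZMI.
defaults = {"Title": 3, "Description": 1}

basic_search = resolve_weights(defaults)                  # uses stored defaults
tweaked_search = resolve_weights(defaults, {"Title": 5})  # query overrides Title
```

With something like this, a basic "enter text, hit search" query would not need to pass any fields or weights at all.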
The current setup is good for advanced searching and so on, but for basic "enter text/hit search" queries, it's more work.
Yes, that makes sense.
Not a big deal, but an idea for 1.0 I guess ? :)
I'll probably shoot for 0.3...
Also, and this is muddy in my mind too, but would having different weights per object type make some kind of sense ? Since Okapi's scoring is sensitive to the proportion of terms vs total number of words, the weighting might need to change based on the type of object, and whether one attribute's textual content is "wordier" than another. I don't know if that makes sense ?
It could be done, but it would require changing the data structures since the weights would be applied at query time but would need to be stored at index time. Actually being able to weight objects arbitrarily might be pretty interesting. It would allow you to pull tricks like Google and make certain results always rate high for certain queries. I'm not sure if that's what you were thinking but it could be a pretty cool trick.
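To illustrate the length sensitivity mentioned above, here is a rough sketch of the textbook Okapi BM25 per-term score. It is not the index's actual code: the function name is hypothetical, and the `k1` and `b` values are just commonly used defaults assumed for the example.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score for one term in one document.

    tf:          number of times the term occurs in the document
    doc_len:     number of words in the document
    avg_doc_len: average document length across the index
    idf:         inverse document frequency of the term (computed elsewhere)
    k1, b:       tuning constants (assumed defaults, not taken from the product)
    """
    # Length normalization: wordier-than-average documents are penalized.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)


# Same term frequency, very different lengths (e.g. a title vs. a body).
short_score = bm25_term_score(tf=2, doc_len=10, avg_doc_len=100, idf=1.0)
long_score = bm25_term_score(tf=2, doc_len=500, avg_doc_len=100, idf=1.0)
```

The same two hits count for much less in the wordy text than in the short one, which is why a weight tuned for one kind of object may not suit another whose attributes are much longer.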
Oh, and I'm noticing FieldedTextIndex uses Okapi ONLY ? That's fine for me at least, but might want to document that, for those that might use the other algorithm ?
For simplicity's sake I chose to do Okapi only. Doing both would be pretty complex and I didn't think it was worth the effort since I always use Okapi myself. I'm going to call YAGNI on this for now.
Phew, sorry, I'm babbling ...
No, the input is always appreciated. -Casey
-----Original Message-----
From: Casey Duncan [mailto:casey@zope.com]
Sent: Monday, January 12, 2004 1:15 AM
To: Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Cc: zope@zope.org
Subject: FieldedTextIndex weighting implemented
I have an initial implementation of the per-field weighting checked into CVS. I haven't released 0.2 yet, but I expect to sometime in the next week or so. The new feature is unit-tested and documented in the README, so feel free to check it out in the meantime.
You can download it directly from CVS at:
http://cvs.zope.org/Products/FieldedTextIndex/
There is a link to generate a tarball at the bottom, or you can check it out using anonymous CVS if you want to track changes.
Let me know what you think.
-Casey