[Zope] Re: FieldedTextIndex weighting implemented
Casey Duncan
casey at zope.com
Thu Jan 15 00:59:22 EST 2004
[CC'd the list in case anybody else is interested]
On Mon, 12 Jan 2004 17:44:52 -0500
Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:
> Casey,
>
> I created the Index, and that works fine, using the only ZCLexicon I
> have which is the same I use for ZCTextIndexes ...
>
> Now if I try to view the management page for the Index by clicking on
> it in the ZMI, I get :
>
> Traceback (innermost last):
>
> Module ZPublisher.Publish, line 89, in publish
> Module ZPublisher.BaseRequest, line 299, in traverse
> Module Products.ZCatalog.ZCatalogIndexes, line 109, in __bobo_traverse__
> AttributeError: _catalog
Hmm, I'm not sure how the index could be causing that. Pretty weird.
> After navigating the portal_catalog interface a bit (This is in CMF
> BTW), the problem seems to go away ...
>
> Once I've reached this point, I can go ahead and index content.
>
> I'd suggest adding to the ZMI interface of the index the name of the
> attribute/method being indexed, it's always handy.
Good point. I thought I'd done that. I definitely will.
> I'll let you know how the actual searching goes !
>
> Some thoughts I'm having: It'd be nice if I could store default
> weights in the index, which it would use transparently unless
> specific weights are provided with the query itself. That would also
> give an easy-to-use interface for setting weights, instead of having
> to change code (possibly often, when one tweaks this stuff). And
> because (in my case at least) 99% of searches are on ALL fields
> anyway, it would save me from having to supply a list of fields and
> weights each and every time.
Yes, that makes sense. I'll need to implement weighting when searching
all fields, which should be straightforward and useful.
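A minimal sketch of that fallback, in plain Python rather than the actual FieldedTextIndex code. The function and argument names are hypothetical, and the default weight of 1 for fields without a stored weight is an assumption:

```python
def effective_weights(all_fields, default_weights, query_weights=None):
    """Pick the field weights to use for one search (hypothetical helper)."""
    if query_weights:
        # Weights supplied with the query override the stored defaults.
        return dict(query_weights)
    # Otherwise search every field with its stored default (1 if unset).
    return {field: default_weights.get(field, 1) for field in all_fields}


fields = ["title", "description", "body"]
defaults = {"title": 3, "description": 2}

print(effective_weights(fields, defaults))                # stored defaults
print(effective_weights(fields, defaults, {"title": 5}))  # query override
```

With something like this, a basic "enter text/hit search" query would pass no weights at all, and only advanced searches would need to spell them out.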
> The current setup is good for advanced searching and so on, but for
> basic "enter text/hit search" queries, it's more work.
Yes, that makes sense.
> Not a big deal, but an idea for 1.0 I guess ? :)
I'll probably shoot for 0.3...
> Also, and this is still muddy in my mind, would having different
> weights per object type make some kind of sense ? Since Okapi's
> scoring is sensitive to the proportion of terms vs total number of
> words, the weighting might need to change based on the type of object,
> and whether one attribute's textual content is "wordier" than another.
> I don't know if that makes sense ?
It could be done, but it would require changing the data structures
since the weights would be applied at query time but would need to be
stored at index time. Actually being able to weight objects arbitrarily
might be pretty interesting. It would allow you to pull tricks like
Google and make certain results always rate high for certain queries.
I'm not sure if that's what you were thinking but it could be a pretty
cool trick.
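Roughly what I have in mind, as a sketch only. Nothing like this exists in the index today; the function name and the docid-to-weight mapping are made up for illustration, and the mapping would have to be stored at index time:

```python
def apply_object_weights(scores, object_weights):
    """Scale raw relevance scores by a per-object weight (hypothetical).

    scores: mapping of docid -> score computed for the query
    object_weights: mapping of docid -> multiplier stored at index time
    """
    boosted = {
        docid: score * object_weights.get(docid, 1.0)  # unweighted docs keep their score
        for docid, score in scores.items()
    }
    # Highest boosted score first, as a result list would be ordered.
    return sorted(boosted.items(), key=lambda item: item[1], reverse=True)


ranked = apply_object_weights({1: 2.0, 2: 3.0}, {1: 2.0})
print(ranked)
```

The multiplier is applied at query time, so changing an object's weight would not require recomputing the term scores, only updating the stored mapping.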
> Oh, and I'm noticing FieldedTextIndex uses Okapi ONLY ? That's fine
> for me at least, but might want to document that, for those that might
> use the other algorithm ?
For simplicity's sake I chose to do Okapi only. Doing both would be pretty
complex and I didn't think it was worth the effort since I always use
Okapi myself. I'm going to call YAGNI on this for now.
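For anyone wondering why Okapi is sensitive to "wordier" fields, here is a sketch of the textbook BM25 term-frequency factor. It is not copied from the index source; the K1 and B values are the commonly used defaults and are assumptions here, and the function name is made up:

```python
K1 = 1.2   # term-frequency saturation (assumed common default)
B = 0.75   # strength of document-length normalization (assumed common default)


def okapi_tf(term_count, doc_len, mean_doc_len):
    """Textbook BM25 term-frequency factor for one term in one document."""
    # Documents longer than average get a larger normalizer, hence a lower score.
    length_norm = (1 - B) + B * (doc_len / mean_doc_len)
    return term_count * (K1 + 1) / (term_count + K1 * length_norm)


# Same number of matches, but the second document is four times longer.
print(okapi_tf(2, 100, 100))
print(okapi_tf(2, 400, 100))
```

The same two matches count for less in the longer text, which is why a fixed set of field weights can behave differently across object types whose fields vary a lot in length.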
> Phew, sorry, I'm babbling ...
No, the input is always appreciated.
-Casey
> -----Original Message-----
> From: Casey Duncan [mailto:casey at zope.com]
> Sent: Monday, January 12, 2004 1:15 AM
> To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
> Cc: zope at zope.org
> Subject: FieldedTextIndex weighting implemented
>
>
> I have an initial implementation of the per-field weighting checked
> into cvs. I haven't released 0.2 yet, but I expect to sometime in the
> next week or so. The new feature is unit-tested and documented in the
> README, so feel free to check it out in the meantime.
>
> You can download it directly from CVS at:
>
> http://cvs.zope.org/Products/FieldedTextIndex/
>
> There is a link to generate a tarball at the bottom, or you can check
> it out using anonymous cvs if you want to track changes.
>
> Let me know what you think.
>
> -Casey