I think I'd have to jump on the bandwagon and agree that numbers should not be stripped. I'll second the idea of a fish-bowl proposal.

In a full-text search of classified ads, for example, one wants to search for a 2000 Ford F150; in Zope 2.3.x, Splitter.c stripped out both 2000 and F150. The change was easy: just replace isalpha() with isalnum() in the relevant part of the code. I'm not sure what the story is in 2.4, but it sounds like people searching for a year 2000 truck are going to find ads for ones built in 1982.

I use a modified Splitter.so that allows numbers, as well as one-character words, so people can search for "c programmer" in the classified ads.

I'm curious about a few other things (that I really haven't tested):

- How does Zope's splitter handle hyphenated words?
- Is there a way to split words with period characters reliably, supposing I wanted to be able to search for terms like "yahoo.com" or "Splitter.so" or "Microsoft .NET" in text?

I would think that the appropriate default behavior for ZopeSplitter would be to be relaxed about stripping things out.

Sean

-----Original Message-----
From: Andreas Jung [mailto:andreas@zope.com]
Sent: Tuesday, November 13, 2001 11:08 AM
To: Casey Duncan; richard@bizarsoftware.com.au; zope@zope.org
Subject: Re: [Zope] Indexing: ZopeSplitter and numbers

Zope 2.4.X allows multiple splitters, so you can write your own splitter. The only disadvantage is that there is currently no official API (except monkeypatching) to add custom splitters, but there is already a proposal in the fishbowl to address this problem.

Andreas

----- Original Message -----
From: "Casey Duncan" <c.duncan@nlada.org>
To: "Andreas Jung" <andreas@andreas-jung.com>; <richard@bizarsoftware.com.au>; <zope@zope.org>
Sent: Tuesday, November 13, 2001 13:52
Subject: Re: [Zope] Indexing: ZopeSplitter and numbers

On Tuesday 13 November 2001 07:05 am, Andreas Jung allegedly wrote:
The answer is - as always - in the sources ;-) The splitting algorithm is pretty dumb. Roughly speaking, it splits the text into words but drops tokens that consist only of digits. To test the splitter, try this:

    from ZopeSplitter import ZopeSplitter
    print list(ZopeSplitter('abc 123 t353 nmj'))

gives ['abc', 't353', 'nmj']
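
For readers without a Zope 2.x install at hand, the behavior shown above can be approximated in plain Python 3. This is only a rough sketch and not the code in Splitter.c: the function name split_words, the keep_char and min_length parameters, and the rule "keep a token if at least one character passes the test" are all assumptions chosen to reproduce the output above and to illustrate the isalpha()/isalnum() swap mentioned in this thread.

```python
import re


def split_words(text, keep_char=str.isalpha, min_length=2):
    """Toy splitter (hypothetical, not the Zope implementation).

    A token is kept only if it is at least min_length characters long
    and at least one of its characters satisfies keep_char.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [
        t for t in tokens
        if len(t) >= min_length and any(keep_char(c) for c in t)
    ]


# Default: a token needs at least one letter, so '123' is dropped.
print(split_words('abc 123 t353 nmj'))
# -> ['abc', 't353', 'nmj']

# Swapping the test to isalnum keeps purely numeric tokens too.
print(split_words('abc 123 t353 nmj', keep_char=str.isalnum))
# -> ['abc', '123', 't353', 'nmj']

# Allowing one-character words makes "c programmer" searchable.
print(split_words('c programmer', keep_char=str.isalnum, min_length=1))
# -> ['c', 'programmer']
```

The point of the sketch is only that the character test decides whether "2000" survives indexing; the real splitter may differ in other details.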
Andreas
Has there been any thought of changing this behavior? I smell a fish bowl prop...
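
As a rough illustration of what a more relaxed splitter could look like, here is a regex-based sketch in Python 3. It is an assumption-laden example and not a proposed Zope API: the name relaxed_split and the token pattern are made up for illustration. It keeps numbers and one-character words, keeps periods and hyphens that sit between alphanumeric runs (so "yahoo.com" and "F-150" stay whole), allows one leading period (so ".NET" survives), and lowercases the result.

```python
import re

# Hypothetical token pattern: optional leading '.', an alphanumeric run,
# then any number of '.'- or '-'-joined alphanumeric runs.
TOKEN = re.compile(r"\.?[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*")


def relaxed_split(text):
    """Return lowercased tokens, keeping digits and inner '.' and '-'."""
    return [m.group(0).lower() for m in TOKEN.finditer(text)]


print(relaxed_split('2000 Ford F-150 at yahoo.com, Microsoft .NET.'))
# -> ['2000', 'ford', 'f-150', 'at', 'yahoo.com', 'microsoft', '.net']
```

A trailing period at the end of a sentence is not attached to the preceding word, because the pattern requires an alphanumeric character after every period or hyphen. Text with no space after a sentence-ending period (for example "end.Next") would be joined into one token, which is one reason period handling is hard to get right in general.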
/---------------------------------------------------\
  Casey Duncan, Sr. Web Developer
  National Legal Aid and Defender Association
  c.duncan@nlada.org
\---------------------------------------------------/