On Wednesday 14 November 2001 08:26, sean.upton@uniontrib.com wrote:
I think I'd have to jump on the bandwagon and agree that numbers should not be stripped. I'll second the idea of a fish-bowl proposal.
In a full text search of classified ads, for example, one wants to search for a 2000 Ford F150; in Zope 2.3.x, Splitter.c stripped out both 2000 and F150. The change was easy: just replace isalpha() with isalnum() in the relevant part of the code. I'm not sure what the story is in 2.4, but it sounds like people searching for a year 2000 truck are going to find ads for ones built in 1982.
This is the behaviour we want - have you experienced any negative side-effects from doing this?
I use a modified Splitter.so that allows numbers, as well as one-character words, so people can search for "c programmer" in the classified ads.
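For anyone following along, here is a rough Python sketch of the two changes being discussed: accepting digits (the isalpha() to isalnum() swap) and keeping one-character words. It is not the Splitter.c code. The function name `split_words` and the `keep_digits` and `min_length` parameters are made up for illustration, and treating the stock behaviour as "letters only, at least two characters" is an assumption based on the description above.

```python
def split_words(text, keep_digits=True, min_length=1):
    """Split text into lowercase terms (illustrative, not Zope's splitter)."""
    # Hypothetical switch: isalnum keeps digits, isalpha treats them as separators.
    is_word_char = str.isalnum if keep_digits else str.isalpha
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch.lower())
            continue
        # Any other character ends the current word.
        if len(current) >= max(min_length, 1):
            words.append("".join(current))
        current = []
    # Flush the last word if the text did not end on a separator.
    if len(current) >= max(min_length, 1):
        words.append("".join(current))
    return words


# Assumed "stock" behaviour: letters only, words of two or more characters.
print(split_words("2000 Ford F150", keep_digits=False, min_length=2))
# Modified behaviour: digits and one-character words are kept.
print(split_words("2000 Ford F150"))
print(split_words("c programmer"))
```

With the letters-only setting, "F150" breaks at the first digit and the leftover "f" is too short to keep, so only "ford" survives. With digits and single characters allowed, all three terms are kept and "c programmer" yields both words.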
I'm curious about a few other things (that I really haven't tested):
- How does Zope's splitter handle hyphenated words?
- Is there a way to split words with period characters reliably, supposing I wanted to be able to search for terms like "yahoo.com" or "Splitter.so" or "Microsoft .NET" in text?
... or e-mail addresses. We currently sub the "@" and "." chars in e-mail addresses with "_" so they are indexed usefully. In your more general case, I'm not sure that'd be appropriate. If you only have "keywords" in your TextIndex, I suppose the only stop chars you'd want are whitespace, and everything else is in.
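A minimal sketch of that kind of substitution, in Python, might look like the following. The pattern `EMAIL_RE` and the function `protect_emails` are hypothetical names, and the pattern is a deliberately loose approximation of an e-mail address, not a full validator. It also assumes the splitter treats "_" as a word character, and that the same substitution is applied to query strings so searches match the indexed form.

```python
import re

# Loose, illustrative pattern: local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def protect_emails(text):
    """Replace "@" and "." inside e-mail addresses with "_" before indexing."""
    def _flatten(match):
        return match.group(0).replace("@", "_").replace(".", "_")
    # Only characters inside a matched address are changed.
    return EMAIL_RE.sub(_flatten, text)


print(protect_emails("Contact richard@example.com today."))
```

Text outside a matched address is left alone, so the full stop at the end of the sentence is untouched.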
I would think that the appropriate default behavior for ZopeSplitter would be relaxed about stripping out things.
My concern is that there's _specific_ code in there that does this stuff, and I want to know if there'll be any negative consequences of changing its behaviour...

Richard