These are good ideas to improve the TextIndex. I already encouraged Erik to put it all together into a Fishbowl proposal.

Andreas

----- Original Message -----
From: "Dieter Maurer" <dieter@handshake.de>
To: "Rik Hoekstra" <rik.hoekstra@inghist.nl>
Cc: "Chris McDonough" <chrism@digicool.com>; "Erik Enge" <erik@thingamy.net>; <zope-dev@zope.org>
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase indexingrevisited)

Rik Hoekstra writes:
> This raises the question of how dependent the splitter is on the
> particularities of the document source. I do not really see how
> different splitters could be useful for one single document. This is
> perhaps less obvious than it appears, as you may want to use different
> splitters for documents in different languages. Taken as a whole, I
> would say choosing a splitter is a decision that has to be taken at
> indexing time anyway. But perhaps it's just my imagination that is
> lacking.

There are lots of things you may want to change based on experience with your index:
* change the set of token boundary characters. They define where words are broken apart.
* change the set of removed characters. They are removed from the words, usually for normalization.
In German, e.g., you can write both "Auto-Lackierer" and "Autolackierer". You want to normalize these different spellings.
* change the set of "composing" characters
German is very rich in composite terms. You may want to index under each component term. For this, you need the rules on how the composition is built. For text, it is usually '-'. But if you have computer sources, '_' or ':' may be relevant, too.
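
To make the three knobs above concrete, here is a minimal sketch of a configurable splitter. It is not the Zope splitter and does not use any ZCatalog API. The names SplitterConfig and split_text, the default boundary set, and the lowercasing step are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitterConfig:
    """Hypothetical configuration holding the three character sets."""
    boundary_chars: str = ".,;!?\"()"   # assumed default; whitespace always splits
    removed_chars: str = ""             # stripped from words for normalization
    composing_chars: str = ""           # join the components of composite terms


def split_text(text, config):
    """Return the index terms for text under the given configuration."""
    to_space = {ord(c): " " for c in config.boundary_chars}
    to_none = {ord(c): None for c in config.removed_chars}
    compose_to_space = {ord(c): " " for c in config.composing_chars}

    terms = []
    for raw in text.translate(to_space).split():
        word = raw.lower()  # assumption: case-insensitive index

        # Index a composite term under each of its components.
        parts = word.translate(compose_to_space).split()
        if len(parts) > 1:
            terms.extend(parts)

        # Index the normalized whole word as well.
        normalized = word.translate(to_none)
        if normalized:
            terms.append(normalized)
    return terms


config = SplitterConfig(removed_chars="-", composing_chars="-")
print(split_text("Auto-Lackierer", config))
print(split_text("Autolackierer", config))
```

With '-' configured as both a removed and a composing character, both spellings produce the term "autolackierer", and the hyphenated form is additionally indexed under "auto" and "lackierer".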
Of course, the search must follow the same splitting rules as the indexing did. Changing the rules (the splitter or its configuration) after indexing will make the index inconsistent.
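
The inconsistency is easy to reproduce with a toy inverted index. This is only a sketch under stated assumptions: TinyIndex and make_splitter are hypothetical names, and the splitter handles nothing but whitespace and a set of removed characters.

```python
def make_splitter(removed_chars):
    """Build a hypothetical splitter that lowercases, splits on whitespace
    and strips the given characters from each word."""
    table = {ord(c): None for c in removed_chars}

    def splitter(text):
        words = (w.translate(table) for w in text.lower().split())
        return [w for w in words if w]

    return splitter


class TinyIndex:
    """Toy inverted index that remembers the splitter it was built with."""

    def __init__(self, splitter):
        self.splitter = splitter
        self.postings = {}  # term -> set of document ids

    def index(self, doc_id, text):
        for term in self.splitter(text):
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query, splitter=None):
        # Passing a different splitter simulates changed rules at query time.
        terms = (splitter or self.splitter)(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


idx = TinyIndex(make_splitter("-"))
idx.index(1, "Auto-Lackierer gesucht")

print(idx.search("Auto-Lackierer"))                             # same rules
print(idx.search("Auto-Lackierer", splitter=make_splitter("")))  # changed rules
```

The first query finds document 1 because it is split with the rules used at indexing time. The second query keeps the hyphen, looks up a term that was never stored, and finds nothing. Keeping the splitter configuration with the index, and reindexing whenever it changes, avoids this.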
Dieter