PositionIndex (was: Re: [Zope-dev] ZCatalog phrase indexing revisited)
"Andreas Jung" <andreas@andreas-jung.com>
Mon, 18 Jun 2001 19:09:27 -0400
These are good ideas for improving the TextIndex. I have already encouraged
Erik to put it all together into a Fishbowl proposal.
Andreas
----- Original Message -----
From: "Dieter Maurer" <dieter@handshake.de>
To: "Rik Hoekstra" <rik.hoekstra@inghist.nl>
Cc: "Chris McDonough" <chrism@digicool.com>; "Erik Enge"
<erik@thingamy.net>; <zope-dev@zope.org>
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PositionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexing revisited)
> Rik Hoekstra writes:
> > This raises the question of how dependent the splitter is on the
> > particularities of the document source. I do not really see how
> > different splitters could be useful for one single document. This is
> > perhaps less obvious than it appears, as you may want to use different
> > splitters for documents in different languages. Taken as a whole, I
> > would say choosing a splitter is a decision that has to be taken at
> > indexing time anyway. But perhaps it's just my imagination that is
> > lacking.
> There are lots of things you may want to change based on
> experience with your index:
>
> * change the set of token boundary characters
>   They define where words are broken out.
>
> * change the set of removed characters
>   They are removed from the words, usually for
>   normalization.
>
>   In German, e.g., you can write both "Auto-Lackierer"
>   and "Autolackierer". You want to normalize
>   these different spellings.
>
> * change the set of "composing" characters
>
>   German is very rich in composite terms.
>   You may want to index under each component term.
>   For this, you need the rules on how the composition
>   is built.
>   For text, it is usually '-'. But if you have
>   computer sources, '_' or ':' may be relevant, too.
>
> Of course, the search must follow the same splitting rules
> as the indexing did. Changing the rules (the splitter
> or its configuration) after indexing will make the index
> inconsistent.
>
>
> Dieter
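
To make the three knobs Dieter lists concrete, here is a rough sketch in
plain Python. It is not the ZCatalog splitter and does not use any Zope
API; the names SplitterConfig, split, build_index and search are made up
for illustration, and the toy index is just a dict of token -> document
ids. It only shows how boundary, removed and composing characters could
interact, and why a query split under different rules than the index
stops matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitterConfig:
    """Hypothetical splitter settings (not a Zope class)."""
    # Characters that end a word. Whitespace always ends a word as well.
    # A character listed here wins over the two sets below.
    boundary_chars: str = ".,;!?()\"'"
    # Characters deleted from each word for normalization.
    removed_chars: str = ""
    # Characters that join the components of a composite term.
    composing_chars: str = ""


def split(text, config):
    """Return the list of index tokens for text under config."""
    to_space = str.maketrans({c: " " for c in config.boundary_chars})
    compose = str.maketrans({c: " " for c in config.composing_chars})
    strip = str.maketrans("", "", config.removed_chars)

    tokens = []
    for word in text.lower().translate(to_space).split():
        # The whole word, with removed characters deleted.
        whole = word.translate(strip)
        if whole:
            tokens.append(whole)
        # The component terms, if the word is a composite.
        parts = [p.translate(strip) for p in word.translate(compose).split()]
        parts = [p for p in parts if p]
        if len(parts) > 1:
            tokens.extend(parts)
    return tokens


def build_index(docs, config):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in split(text, config):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, config):
    """Return the ids of documents that contain every query token."""
    result = None
    for token in split(query, config):
        hits = index.get(token, set())
        result = hits if result is None else result & hits
    return result or set()


docs = {1: "Auto-Lackierer gesucht", 2: "Autolackierer gesucht"}

# Index with '-' removed, so both spellings normalize to one token.
index_config = SplitterConfig(removed_chars="-")
index = build_index(docs, index_config)

same_rules = search(index, "Auto-Lackierer", index_config)

# Query split with a different configuration: '-' is no longer removed.
changed_config = SplitterConfig(removed_chars="")
changed_rules = search(index, "Auto-Lackierer", changed_config)

# Composite handling: index under the whole term and each component.
components = split("Auto-Lackierer", SplitterConfig(composing_chars="-"))
```

With the same configuration on both sides the query finds both documents;
with the changed configuration the query token keeps its hyphen and finds
nothing, which is the inconsistency described above. The only way out in
this sketch is to rebuild the index under the new configuration.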