PositionIndex (was: Re: [Zope-dev] ZCatalog phrase indexing revisited)

Andreas Jung <andreas@andreas-jung.com>
Mon, 18 Jun 2001 19:09:27 -0400


These are good ideas for improving the TextIndex. I have already encouraged
Erik to put it all together into a Fishbowl proposal.

Andreas
----- Original Message -----
From: "Dieter Maurer" <dieter@handshake.de>
To: "Rik Hoekstra" <rik.hoekstra@inghist.nl>
Cc: "Chris McDonough" <chrism@digicool.com>; "Erik Enge"
<erik@thingamy.net>; <zope-dev@zope.org>
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PositionIndex (was: Re: [Zope-dev] ZCatalog phrase
indexing revisited)


> Rik Hoekstra writes:
>  > This raises the question of how dependent the splitter is on the
>  > particularities of the document source - I do not really see how
>  > different splitters could be useful for one single document. This is
>  > perhaps less obvious than it appears, as you may want to use different
>  > splitters for documents in different languages. Taken as a whole I would
>  > say choosing a splitter would be a decision that had to be taken at
>  > indexing time anyway. But perhaps it's just my imagination that is
>  > lacking.
> There are lots of things you may want to change based on
> experience with your index:
>
>   *  change the set of token boundary characters
>      they define where words are broken apart.
>
>   *  change the set of removed characters
>      they are removed from the words, usually for
>      normalization.
>
>      In German, e.g., you can write both "Auto-Lackierer"
>      and "Autolackierer". You want to normalize
>      these different spellings.
>
>   *  change the set of "composing" characters
>
>      German is very rich in composite terms.
>      You may want to index under each component term.
>      For this, you need the rules on how the composition
>      is built.
>      For text, it is usually '-'. But if you have
>      computer sources, '_' or ':' may be relevant, too.
>
> Of course, the search must follow the same splitting rules
> as the indexing did. Changing the rules (the splitter
> or its configuration) after indexing will make the index
> inconsistent.
>
>
> Dieter
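
To make the three knobs Dieter lists concrete, here is a rough sketch in
plain Python. It is not the Zope splitter and does not use any ZCatalog
API; SplitterConfig, tokenize, build_index and search are hypothetical
names invented for illustration, and lower-casing every token is an extra
assumption made so that "Auto-Lackierer" and "Autolackierer" normalize to
the same term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitterConfig:
    # Illustrative defaults only; not taken from any real Zope splitter.
    boundary_chars: str = ".,;!?()\"'"  # words are broken at these (and at whitespace)
    removed_chars: str = ""             # deleted from words for normalization
    composing_chars: str = ""           # characters that join composite terms


def tokenize(text, cfg):
    """Split text into lower-cased tokens according to cfg."""
    boundary = str.maketrans({c: " " for c in cfg.boundary_chars})
    removed = str.maketrans({c: None for c in cfg.removed_chars})
    composing = str.maketrans({c: " " for c in cfg.composing_chars})
    tokens = []
    for word in text.lower().translate(boundary).split():
        whole = word.translate(removed)
        if whole:
            tokens.append(whole)
        parts = word.translate(composing).split()
        if len(parts) > 1:
            # Composite term: also index under each component.
            for part in parts:
                part = part.translate(removed)
                if part:
                    tokens.append(part)
    return tokens


def build_index(docs, cfg):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text, cfg):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, cfg):
    """Return the ids of documents that contain every token of the query."""
    result = None
    for token in tokenize(query, cfg):
        ids = index.get(token, set())
        result = ids if result is None else result & ids
    return result if result is not None else set()


docs = {1: "Der Autolackierer kommt.", 2: "Ein Auto-Lackierer."}
normalizing = SplitterConfig(removed_chars="-")
plain = SplitterConfig()

index = build_index(docs, normalizing)
same_rules = search(index, "Auto-Lackierer", normalizing)   # finds both documents
changed_rules = search(index, "Auto-Lackierer", plain)      # finds nothing
```

The last two lines show the inconsistency: the index was built with '-'
removed, so a query split without that rule looks up "auto-lackierer",
a term the index never stored. With composing_chars="-_" instead, a word
such as "text_index" would be indexed as "text_index", "text" and "index".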