[Zope] [REQ] Support for multi-lingual components of TextIndexNG wanted
Andreas Jung
andreas@andreas-jung.com
Mon, 17 Jun 2002 08:51:26 -0400
Hi folks,
the next version of TextIndexNG will focus on multi-lingual issues
(and has full Unicode support).
I need some support from the community for components
that are language-dependent:
- stopwords
Stopwords are words that are removed during the indexing
process because they are very common, e.g. 'a', 'the', 'for'
in English.
- normalization
Normalization means the translation of special characters
or a sequence of characters to a simpler form, e.g.
'Ä' -> 'Ae', 'ä' -> 'ae', 'ß' -> 'ss' or a more radical
reduction like 'Ä' -> 'A', 'ä' -> 'a', 'ß' -> 's'.
Such a reduction allows more fault-tolerant searching.
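
To illustrate the two components, here is a rough sketch in Python.
It is not the actual TextIndexNG code: all names (STOPWORDS_EN,
NORMALIZE_DE, NORMALIZE_DE_RADICAL, normalize, remove_stopwords) are
made up for this example, and the tables contain only the entries
mentioned above.

```python
# Hypothetical sketch only; not TextIndexNG's real API or data files.

# Stopword list for English (just the three examples from the text).
STOPWORDS_EN = frozenset(["a", "the", "for"])

# Normalization rules for German: character -> replacement string.
NORMALIZE_DE = {"Ä": "Ae", "ä": "ae", "ß": "ss"}

# The more radical reduction mentioned in the text.
NORMALIZE_DE_RADICAL = {"Ä": "A", "ä": "a", "ß": "s"}


def normalize(word, table):
    """Replace every character that has a rule in `table`; keep the rest."""
    return "".join(table.get(ch, ch) for ch in word)


def remove_stopwords(words, stopwords):
    """Drop words whose lowercase form appears in `stopwords`."""
    return [w for w in words if w.lower() not in stopwords]


if __name__ == "__main__":
    print(normalize("Straße", NORMALIZE_DE))
    print(normalize("Ärger", NORMALIZE_DE_RADICAL))
    print(remove_stopwords("a search for the index".split(), STOPWORDS_EN))
```

A contribution for a new language would then boil down to one
stopword list and one or two such replacement tables.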
At the moment TextIndexNG supports only German and English.
If you would like to see more languages supported by TextIndexNG,
feel free to contribute stopword lists for your language
and/or translation rules for the normalization step.
Thanks,
Andreas