[Zope-dev] Re: What catalog/index to use ...

Joachim Werner joe@iuveno-net.de
Sat, 9 Nov 2002 13:17:28 +0100


Hi!

> Please note that former Zope versions already include a dedicated
> unicode-aware splitter that is already usable with the old TextIndex
> and maybe with ZCTextIndex.
> TextIndexNG resolves all these issues by doing the complete internal
> processing by converting the data into unicode. Every single
> processing step only handles unicode data.

> Most older browsers should be able to handle at least UTF-8 as
> character set. This is sufficient for most cases.

The problem seems to be that ZCTextIndex indeed does not split words
correctly when German umlauts are used. There is no "Unicode-aware
splitter" option. Instead of a Vocabulary it uses a Lexicon, which offers
just two options: "HTML aware splitter" and "Whitespace splitter". I haven't
tested the whitespace splitter yet, but without the patch the HTML aware
splitter did not handle umlauts correctly, i.e. it treated umlauts as
splitting characters ...
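
To illustrate the symptom (this is only a sketch in plain Python, not the
actual ZCTextIndex splitter code; the function names and the sample text are
made up for the example): a splitter whose word pattern only knows ASCII
letters breaks a word at every umlaut, while a Unicode-aware pattern keeps
the word whole.

```python
import re

# Hypothetical ASCII-only splitter: every character outside
# A-Z, a-z and 0-9 is treated as a word boundary, umlauts included.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# Unicode-aware variant: for str patterns in Python 3, \w also
# matches non-ASCII letters such as the German umlauts.
UNICODE_WORD = re.compile(r"\w+")


def split_ascii(text):
    """Split text into lowercase words using the ASCII-only pattern."""
    return ASCII_WORD.findall(text.lower())


def split_unicode(text):
    """Split text into lowercase words using the Unicode-aware pattern."""
    return UNICODE_WORD.findall(text.lower())


if __name__ == "__main__":
    sample = "Die Bücher über Zope"
    print(split_ascii(sample))    # "Bücher" is cut into fragments
    print(split_unicode(sample))  # "bücher" stays one word
```

With the ASCII-only pattern a search for "Bücher" cannot match, because only
the fragments around the umlaut end up in the index.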

So there is a bug ...

Joachim