Hi!
Please note that earlier Zope versions already include a dedicated Unicode-aware splitter that is usable with the old TextIndex and maybe with ZCTextIndex. TextIndexNG resolves all these issues by converting the data to Unicode first and doing its complete internal processing on that. Every single processing step handles only Unicode data.
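To illustrate why Unicode-only processing matters for splitting, here is a minimal Python sketch. It is not the actual TextIndexNG or ZCTextIndex code; the names split_ascii and split_unicode are made up for this example, and UTF-8 input is an assumption. An ASCII-only word pattern treats umlauts as separators, while decoding to Unicode first and splitting with a Unicode-aware pattern keeps the words intact.

```python
import re

# ASCII-only word pattern: only [a-zA-Z0-9_] count as word characters,
# so every umlaut acts as a separator.
ASCII_WORD = re.compile(r"\w+", re.ASCII)

# In Python 3, str patterns are Unicode-aware by default,
# so umlauts count as word characters.
UNICODE_WORD = re.compile(r"\w+")


def split_ascii(text):
    """Hypothetical splitter that mimics the umlaut problem."""
    return ASCII_WORD.findall(text)


def split_unicode(data, encoding="utf-8"):
    """Hypothetical splitter: decode to Unicode first, then split.

    The default encoding (UTF-8) is an assumption for this sketch.
    """
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return UNICODE_WORD.findall(data)


if __name__ == "__main__":
    sample = "Müller kauft Äpfel"
    print(split_ascii(sample))                    # umlauts split the words
    print(split_unicode(sample.encode("utf-8")))  # words stay intact
```

With the ASCII-only pattern, "Müller" falls apart into "M" and "ller", which is the kind of behaviour that makes searching for German words fail.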
Even most older browsers should be able to handle at least UTF-8 as a character set. This is sufficient for most cases.
The problem seems to be that ZCTextIndex indeed does not split text correctly when German umlauts are used. There is no "Unicode-aware splitter" option. Instead of a Vocabulary it uses a Lexicon, which offers just two options: "HTML aware splitter" and "Whitespace splitter". I haven't tested the whitespace splitter yet, but without the patch the HTML aware splitter did not handle umlauts correctly, i.e. it treated umlauts as splitting characters. So there is a bug.

Joachim