9 Nov 2002, 1:17 p.m.
The problem seems to be that ZCTextIndex indeed does not split text correctly when German umlauts are used. There is no "Unicode-aware splitter" option. Instead of a Vocabulary it uses a Lexicon, which offers just two options: "HTML aware splitter" and "Whitespace splitter". I haven't tested the whitespace splitter yet, but without the patch the HTML aware splitter did not handle umlauts correctly, i.e. it treated them as splitting characters ...
That's just what the default ZMI interface for ZCTextIndex offers. It's easy to add your own splitter by writing a few lines of Python code. RTSL.

--Guido van Rossum (home page: http://www.python.org/~guido/)
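
For illustration only, here is a rough sketch of what such a "few lines of Python" might look like. It assumes that a Lexicon pipeline element is simply an object with a `process()` method that takes a list of strings and returns a list of words; that convention should be checked against the ZCTextIndex source. The class name `UnicodeWordSplitter` is made up for this example, the code is written for present-day Python 3 rather than the Python of 2002, and how the class gets registered with the Lexicon is not shown.

```python
import re


class UnicodeWordSplitter:
    """Hypothetical splitter that keeps umlauts inside words.

    Assumption: a pipeline element only needs a process() method that
    maps a list of strings to a list of words.
    """

    # For str patterns in Python 3, \w is Unicode-aware by default, so
    # letters such as ä, ö, ü and ß count as word characters.
    _word = re.compile(r"\w+")

    def process(self, lst):
        result = []
        for text in lst:
            # Collect every run of word characters; punctuation and
            # whitespace act as separators.
            result.extend(self._word.findall(text))
        return result


if __name__ == "__main__":
    splitter = UnicodeWordSplitter()
    print(splitter.process(["Die Bären möchten Äpfel, bitte."]))
```

Note that `\w` also matches digits and underscores, so a real splitter may want a narrower pattern, plus lowercasing or stop-word removal as separate pipeline steps.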