-----Original Message----- From: LEE, Kwan Soo [mailto:kslee@plaza1.snu.ac.kr]
1. How about a dumb kludge Splitter.c which treats the characters in a user-specifiable/configurable list as white space, treats all other characters up to char(255) as meaningful characters, and splits the text accordingly?
This could fix your problem, but won't work for multi-byte char strings.
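To make the idea concrete, here is a rough sketch in Python rather than C. It is only an illustration of the kludge described above, not the Splitter that ships with Zope; the names `split_bytes` and `DEFAULT_SEPARATORS` are made up for this example. It works byte by byte, so it knows nothing about multi-byte characters.

```python
# Hypothetical sketch of the proposed kludge; not Zope's actual Splitter.
# DEFAULT_SEPARATORS and split_bytes are illustrative names only.
DEFAULT_SEPARATORS = b" \t\r\n.,;:!?()[]\"'"


def split_bytes(text, separators=DEFAULT_SEPARATORS):
    """Split a single-byte-encoded string into words.

    Every byte listed in `separators` is treated as white space;
    every other byte value (0-255) counts as part of a word.
    """
    seps = frozenset(separators)
    words = []
    current = bytearray()
    for byte in text:
        if byte in seps:
            # A separator ends the word collected so far, if any.
            if current:
                words.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    # Flush the last word when the text does not end in a separator.
    if current:
        words.append(bytes(current))
    return words
```

A site could then pass its own separator list, for example `split_bytes(data, separators=b"-_ ")`, without touching the rest of the indexing code.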
In Korean, the current approach based on 'stem' words and 'stop' words will simply not work, because we have quite different writing conventions. I guess many other (small) languages have similar problems. Still, Full Text Search capabilities are too valuable to live without.
Currently, the stemming and stopping of words in ZCatalog is English-language dependent.
Furthermore, what if a Zope site contains documents in many languages? I guess an approach based on _ONE_ locale will not work well. Does one need several personalities of Splitter?
Possibly, or a new approach to the whole problem.
Before the "Full I18N/Localization Support" (I'm not sure what that means ...) of Python & Zope arrives, a kludge Splitter module (maybe unsupported, or community supported) with adequate warnings may ease the lives of lots of non-English/European-language Zopistas.
I18N means 'internationalization': 'I' followed by 18 characters followed by 'N'.
2. Can anyone explain (or give a clue about) the difference between SearchIndex/ZCatalog in Zope 2.0.x and 2.1.x? Especially the role of subindex in TextIndex.py and UnTextIndex.py? My Splitter.py gets errors whenever subindex is involved.
If your splitter works identically to the one that comes with Zope, there should be no problem. -Michel