RE: [Zope-dev] [Petition] Kludge for Splitter.c (long)

17 Jan 2000


      ...
-----Original Message-----
From: LEE, Kwan Soo [mailto:kslee@plaza1.snu.ac.kr]
...
1. How about a dumb kludge Splitter.c that treats the
characters in a user-specifiable/configurable list as white
space, treats all other characters up to char(255) as meaningful
characters, and splits the text accordingly?
This could fix your problem, but won't work for multi-byte char strings.
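Here is a minimal Python sketch of the kludge described in point 1. It assumes the input is a byte string in a single-byte encoding such as Latin-1. The names `split_text` and `DEFAULT_SEPARATORS` are hypothetical, the default separator set is only an illustrative choice, and this is not Zope's Splitter.c.

```python
from typing import List

# Hypothetical default: these byte values count as white space; every other
# byte value (0-255) is treated as part of a word.
DEFAULT_SEPARATORS = b" \t\r\n.,;:!?\"'()[]{}<>"


def split_text(data: bytes, separators: bytes = DEFAULT_SEPARATORS) -> List[bytes]:
    """Split a byte string into words using a configurable separator set.

    Decides one byte at a time, so it only makes sense for single-byte
    encodings.
    """
    sep = set(separators)  # iterating over bytes yields ints 0..255
    words: List[bytes] = []
    current = bytearray()
    for byte in data:
        if byte in sep:
            if current:  # close the word in progress, skip empty runs
                words.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words
```

For example, `split_text(b"caf\xe9, na\xefve;test")` gives `[b"caf\xe9", b"na\xefve", b"test"]`, so Latin-1 letters above 127 stay inside words. Because the splitter decides byte by byte, it has the limitation noted above. In a multi-byte encoding, one byte of a character can equal a separator value and cut that character in two.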
...
In Korean, the current approach based on 'stem' words and
'stop' words will simply not work, because we have quite
different writing conventions. I guess many other (small)
languages have similar problems. Still, full-text search
capabilities are too valuable to live without.
Currently, stemming and stop-word removal in ZCatalog are
English-language dependent.
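This sketch illustrates the point about language dependence by making the stop-word list and the stemmer parameters instead of built-in behavior. `normalize`, `EN_STOP_WORDS`, `crude_english_stem` and `no_stem` are hypothetical toy stand-ins, not ZCatalog's actual rules. A language with different conventions could pass its own list and stemmer, or none at all.

```python
from typing import Callable, Iterable, List, Set


def normalize(
    words: Iterable[str],
    stop_words: Set[str],
    stem: Callable[[str], str],
) -> List[str]:
    """Lowercase words, drop stop words, and stem whatever is left."""
    result: List[str] = []
    for word in words:
        word = word.lower()
        if word in stop_words:
            continue  # stop words are never indexed
        result.append(stem(word))
    return result


# Hypothetical toy settings for English; real stop lists and stemmers are larger.
EN_STOP_WORDS = {"a", "an", "and", "of", "the"}


def crude_english_stem(word: str) -> str:
    """Strip one common English suffix (a toy, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def no_stem(word: str) -> str:
    """Leave words untouched, for languages where English-style stemming does not apply."""
    return word
```

With the English settings, `normalize(["The", "indexes", "of", "words"], EN_STOP_WORDS, crude_english_stem)` gives `["index", "word"]`. With `normalize(["한국어", "텍스트"], set(), no_stem)` the words pass through unchanged.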
...
Furthermore, what if a Zope site contains documents in many
languages? I guess an approach based on _ONE_ locale will
not work well. Does one need several personalities of Splitter?
Possibly, or a new approach to the whole problem.
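One reading of "several personalities of Splitter" is a registry that chooses a splitter from each document's language tag. The sketch below works under that assumption. `SPLITTERS`, `split_for_language` and both splitter functions are hypothetical. Whitespace splitting for Korean is only a placeholder and is not a suitable Korean word splitter.

```python
import re
from typing import Callable, Dict, List

Splitter = Callable[[str], List[str]]


def word_splitter(text: str) -> List[str]:
    # For str patterns, \w matches Unicode word characters.
    return re.findall(r"\w+", text)


def whitespace_splitter(text: str) -> List[str]:
    # Placeholder: split on white space only, keeping punctuation attached.
    return text.split()


# Hypothetical registry: primary language subtag -> splitter "personality".
SPLITTERS: Dict[str, Splitter] = {
    "en": word_splitter,
    "ko": whitespace_splitter,
}


def split_for_language(text: str, lang: str, default: Splitter = word_splitter) -> List[str]:
    """Pick a splitter by the document's language tag, e.g. 'en-US' -> 'en'."""
    primary = lang.split("-")[0].lower()
    return SPLITTERS.get(primary, default)(text)
```

The language is chosen per document rather than once for the whole site. This avoids the single-locale problem raised above, at the cost of having to know each document's language.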
...
Before the "Full I18N/Localization Support" (I'm not sure what
that means ...) of Python & Zope, a (maybe unsupported or
community-supported) kludge Splitter module with an adequate
warning may make life easier for lots of
non-English/European language Zopistas.
I18N means 'internationalization': 'I' followed by 18 chars followed
by 'N'.
...
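The abbreviation scheme works as follows: keep the first letter, write the count of letters in between, then keep the last letter. It fits in a tiny function. `numeronym` is a hypothetical name used only for illustration. The same rule produces L10N, the usual short form of "localization".

```python
def numeronym(word: str) -> str:
    """First letter + number of letters in between + last letter."""
    if len(word) <= 3:
        return word  # too short to abbreviate usefully
    return f"{word[0]}{len(word) - 2}{word[-1]}"
```

For example, `numeronym("internationalization").upper()` gives `"I18N"`.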
2. Can anyone explain (or give a clue about) the difference in
SearchIndex/ZCatalog between Zope 2.0.x and 2.1.x? In particular, what is
the role of subindex in TextIndex.py and UnTextIndex.py? My
Splitter.py gets errors whenever subindex is involved.
If your splitter works identically to the one that comes with Zope,
there should be no problem.
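Michel's answer suggests a practical check: feed the same texts to the stock splitter and to the replacement, then compare the results. This is a minimal sketch that assumes both splitters can be called as functions returning a list of words. `find_mismatches` is a hypothetical helper. If the stock Splitter exposes more than a word list (word positions, for example), those parts would also need comparing.

```python
from typing import Callable, Iterable, List, Tuple

Splitter = Callable[[str], List[str]]


def find_mismatches(
    reference: Splitter,
    candidate: Splitter,
    samples: Iterable[str],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, expected, actual) for every sample where the outputs differ."""
    mismatches = []
    for text in samples:
        expected = reference(text)
        actual = candidate(text)
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches
```

For example, comparing `str.split` with `lambda t: t.split(" ")` on `["a b", "a  b"]` reports only `"a  b"`. There the candidate yields an empty word between the two spaces.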

-Michel

Michel Pelletier