[Zope-dev] [Petition] Kludge for Splitter.c (long)
Michel Pelletier
michel@digicool.com
Mon, 17 Jan 2000 16:29:53 -0500
> -----Original Message-----
> From: LEE, Kwan Soo [mailto:kslee@plaza1.snu.ac.kr]
> 1. How about a dumb kludge Splitter.c which treats the
> characters in a user-specifiable/configurable list as white
> space and all other characters up to char(255) as meaningful
> characters, and splits the text accordingly?
This could fix your problem, but it won't work for multi-byte character
strings.
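
To make the proposal concrete, here is a minimal sketch in Python rather
than C. It is not the code of Zope's Splitter.c; the function name
split_bytes and the default separator list are assumptions made for
illustration only. Every byte in the configurable list counts as white
space, and every other byte value up to 255 counts as part of a word.

```python
def split_bytes(text, separators=b" \t\r\n.,;:!?"):
    """Split a byte string into words.

    Every byte listed in `separators` is treated as white space; all
    other byte values (0-255) are treated as word characters.
    Hypothetical helper, not part of Zope.
    """
    words = []
    current = bytearray()
    for byte in text:
        if byte in separators:
            # A separator ends the current word, if there is one.
            if current:
                words.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words


# Single-byte text splits as expected.
print(split_bytes(b"full-text search, in Zope!"))

# The multi-byte caveat: in Shift_JIS one katakana character is encoded
# as the two bytes 0x83 0x5C, and 0x5C is also the ASCII backslash. If
# the backslash is configured as a separator, the character is cut in half.
print(split_bytes(b"a\x83\x5cb", separators=b"\\"))
```

Because the splitter looks at one byte at a time, it cannot tell a
separator byte from the second byte of a two-byte character, which is
the limitation noted above.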
> In Korean, the current approach based on 'stem' words and
> 'stop' words will simply not work, for we have quite
> different writing conventions. I guess many other (small)
> languages have similar problems. Still, Full Text Search
> capabilities are too valuable to live without.
Currently, the stemming and stopping of words in ZCatalog is
English-language dependent.
> Furthermore, what if a Zope site contains documents in many
> languages? I guess an approach based on _ONE_ locale will
> not work well. Does one need several personalities of Splitter?
Possibly, or a new approach to the whole problem.
> Before the "Full I18N/Localization Support" (I'm not sure what
> that means ...) of Python & ZOPE, a (maybe unsupported or
I18N means 'internationalization': an 'I' followed by 18 letters followed
by an 'N'.
> community supported) kludge Splitter module with adequate
> warning may ease the lives of lots of
> non-English/European-language Zopistas.
>
> 2. Can anyone explain (or give a clue about) the difference between
> SearchIndex/ZCatalog in Zope 2.0.x and 2.1.x? Especially the
> role of subindex in TextIndex.py and UnTextIndex.py? My
> Splitter.py gets errors whenever subindex is involved.
If your splitter works identically to the one that comes with Zope,
there should be no problem.
-Michel