[Zope-dev] Splitting text
Michel Pelletier
michel@digicool.com
Thu, 06 Apr 2000 14:00:20 -0700
I'm trying to think of some good ways to improve the way ZCatalog splits
text. The component that does this is called the Splitter, and it is
currently written in C. In CVS there is a new mechanism for
abstracting out language-dependent features, like splitting, into
various Vocabulary objects.
I want to make it as trivial as possible for an average programmer to
create Vocabulary objects for their language of choice. Right now there
are some prototypes doing Japanese, and the guys at Digital Garage and I
have done some research on the similar Chinese problem.
I am contemplating checking in Brian Hooper's ExtensionClass patch to
the Splitter, so that the Splitter can be subclassed in Python. While
that would make it more useful, I don't think it will gain much: the
Splitter is very English-centric and not very extensible (to solve, for
example, the Japanese problem), and subclassers would need to
re-implement most of it in Python anyway, so there is no big win. I
think that the existing C splitter should be internationalized. I would
like to hear how people solve the lexical analysis problem for their
particular language, and I'd like to discuss how to generalize the
Splitter on the Interfaces Wiki:
http://www.zope.org/Members/michel/Projects/Interfaces/Splitter
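
To give the discussion something concrete to poke at, here is a rough
sketch of what a language-pluggable splitter could look like in Python.
This is not the existing C Splitter or the Vocabulary API in CVS; the
class names (Splitter, WhitespaceSplitter, BigramSplitter) and the
split() method are made up for illustration. The bigram approach is just
one common way to index text that has no spaces between words, not
necessarily what the Japanese prototypes do.

```python
import re


class Splitter:
    """Hypothetical base class: turn a string into a list of index terms."""

    def split(self, text):
        raise NotImplementedError


class WhitespaceSplitter(Splitter):
    """Roughly the English-centric behaviour: lowercase the text,
    break it on non-word characters, and drop stop words."""

    _word = re.compile(r"\w+")

    def __init__(self, stop_words=()):
        self.stop_words = frozenset(stop_words)

    def split(self, text):
        words = self._word.findall(text.lower())
        return [w for w in words if w not in self.stop_words]


class BigramSplitter(Splitter):
    """Assumed strategy for languages written without spaces:
    emit overlapping two-character terms."""

    def split(self, text):
        chars = [c for c in text if not c.isspace()]
        if len(chars) < 2:
            return ["".join(chars)] if chars else []
        return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]
```

With something like this, a catalog would only ever call split() and
would not need to know which language it is dealing with; the open
question is how much of that can stay in C.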
Can someone post this to the ZIP list? I'm not on it at the moment...
-Michel