---------- Forwarded message ----------
Date: Thu, 06 Apr 2000 14:00:20 -0700
From: Michel Pelletier <michel@digicool.com>
To: zope-dev@zope.org
Subject: [Zope-dev] Spliting text

I'm trying to think of some good ways to improve the way ZCatalog splits text. The component that does this is called the Splitter, and it is currently written in C. In CVS there is a new mechanism for abstracting out language-dependent features, like splitting, into various Vocabulary objects. I want to make it as trivial as possible for an average programmer to create Vocabulary objects for their language of choice. Right now there are some prototypes doing Japanese, and the guys at Digital Garage and I have done some research on the similar Chinese problem.

I am contemplating checking in Brian Hooper's ExtensionClass patch to the Splitter, so that the Splitter can be subclassed in Python. While that would make it more useful, I don't think it will gain much: the Splitter is very English-centric and not very extensible (to solve, for example, the Japanese problem), and subclassers would need to re-implement most of it in Python anyway, so there is no big win. I think that the existing C Splitter should be internationalized.

I would like some suggestions on how people think the lexical analysis problem is solved for their particular language, and I'd like to discuss how to generalize the Splitter on the Interfaces Wiki:

http://www.zope.org/Members/michel/Projects/Interfaces/Splitter

Can someone post this to the ZIP list? I'm not on it at the moment...

-Michel
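
As a rough illustration of the per-language splitting idea discussed above, here is a minimal Python sketch. It is not the ZCatalog Splitter or the Vocabulary API: every name in it (`BaseSplitter`, `WhitespaceSplitter`, `BigramSplitter`, `SPLITTERS`, `split_text`) is hypothetical, and the character-bigram approach for Japanese is only a crude stand-in for real lexical analysis, not what the prototypes mentioned in the message do.

```python
import re


class BaseSplitter:
    """Hypothetical base class: turns a text into a list of index terms."""

    def split(self, text):
        raise NotImplementedError


class WhitespaceSplitter(BaseSplitter):
    """For languages that separate words with spaces or punctuation."""

    _word = re.compile(r"\w+")

    def split(self, text):
        # Lowercase each run of word characters so lookups are case-insensitive.
        return [word.lower() for word in self._word.findall(text)]


class BigramSplitter(BaseSplitter):
    """Crude fallback for scripts written without word separators:
    emit overlapping pairs of non-whitespace characters."""

    def split(self, text):
        chars = [c for c in text if not c.isspace()]
        if len(chars) < 2:
            return ["".join(chars)] if chars else []
        return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


# Hypothetical registry mapping a language code to its splitter.
SPLITTERS = {
    "en": WhitespaceSplitter(),
    "ja": BigramSplitter(),
}


def split_text(text, language="en"):
    """Split text with the splitter registered for the given language."""
    return SPLITTERS[language].split(text)
```

Under these assumptions, `split_text("Splitting text, the Zope way")` yields lowercase English words, while `split_text("東京都", "ja")` yields the overlapping pairs `東京` and `京都`. Supporting another language would then mean writing one small class and registering it, rather than re-implementing the whole splitter.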