I'm going to be a bit more assertive about this and forward it to the
zope list...

-Michel

-------- Original Message --------
Subject: [Zope-dev] Spliting text
Date: Thu, 06 Apr 2000 14:00:20 -0700
From: Michel Pelletier <michel@digicool.com>
To: zope-dev@zope.org

I'm trying to think of some good ways to improve the way ZCatalog
splits text. The component that does this is called the Splitter, and
it is currently written in C. In the CVS there is a new mechanism for
abstracting out language-dependent features, like splitting, into
various Vocabulary objects.

I want to make it as trivial as possible for an average programmer to
create Vocabulary objects for their language of choice. Right now there
are some prototypes doing Japanese, and I and the guys at Digital
Garage have done some research on the similar Chinese problem.

I am contemplating checking in Brian Hooper's ExtensionClass patch to
the Splitter, so that the Splitter can be subclassed in Python. While
that would be more useful, I don't think it will gain much: the
Splitter is very English-centric and not very extensible (to solve, for
example, the Japanese problem), and subclassers would need to
re-implement most of it in Python anyway, so there is no big win.

I think that the existing C Splitter should be internationalized. I
would like some suggestions on how people think the lexical analysis
problem is solved for their particular language, and I'd like to
discuss how to generalize the Splitter on the Interfaces Wiki:

http://www.zope.org/Members/michel/Projects/Interfaces/Splitter

Can someone post this to the ZIP list? I'm not on it at the moment...

-Michel

_______________________________________________
Zope-Dev maillist  -  Zope-Dev@zope.org
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
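
As a starting point for the discussion, here is a rough Python sketch of
what a per-language splitter might look like. Every name in it
(Splitter, WhitespaceSplitter, BigramSplitter, split) is hypothetical;
it is not the existing C Splitter's API and not the Vocabulary mechanism
in CVS. The bigram class is only a naive stand-in for text without word
delimiters, since real Japanese or Chinese segmentation generally needs
a dictionary or a morphological analyzer.

```python
import re


class Splitter:
    """Hypothetical base class: turn a text into a list of index terms."""

    def split(self, text):
        raise NotImplementedError


class WhitespaceSplitter(Splitter):
    """For languages that delimit words with spaces and punctuation."""

    _word = re.compile(r"\w+")

    def split(self, text):
        # Collect runs of word characters and lowercase them.
        return [word.lower() for word in self._word.findall(text)]


class BigramSplitter(Splitter):
    """Naive fallback for text without word delimiters (assumption:
    overlapping two-character terms are acceptable for indexing)."""

    def split(self, text):
        # Drop whitespace, then emit every adjacent pair of characters.
        chars = [c for c in text if not c.isspace()]
        if len(chars) < 2:
            return chars
        return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]
```

A catalog could then be handed whichever splitter fits the language,
e.g. WhitespaceSplitter().split("Hello, Zope world!") or
BigramSplitter().split("東京都"), without the indexing code needing to
know which one it received.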