ZCatalog cannot support Chinese?
Hi all,

In my project, I want to use ZCatalog to build a search interface, but it does not support Chinese. Can someone give me some advice?

Victor Zhai
Victor.Zhai@ogilvy.com wrote:
Hi all, In my project, I want to use ZCatalog to build a search interface, but it does not support Chinese. Can someone give me some advice?
ZCatalog does not currently support Chinese for several reasons:

1) I've never seen or worked with Chinese, and I have no environment to debug it.
2) Python itself is still working on complete internationalization.
3) ZCatalog is very English-centric.

However, I am working on several enhancements to ZCatalog which will help you here. First, ZCatalog now supports the notion of Vocabularies. Vocabularies are objects separate from ZCatalogs, and they isolate all of the language-specific features from ZCatalog. Therefore, if you subclass and create your own kind of Vocabulary (say, ChineseVocabulary), you can:

1) Create your own kind of 'Splitter', which is the object that splits documents into words. Currently Zope's splitter is very simple: it only knows how to split English (and some European) text into words on spaces. Splitting Chinese probably requires a much different algorithm.
2) Control stop words and synonyms. Right now, Zope has hard-coded stop words that are English only, and no synonym support. In 2.2, Zope Vocabularies will allow you to control these stop words and synonyms in a language-neutral fashion.

These features are in the current CVS, but they are still quite raw. What would help is the currently unreleased ZCatalog User's Guide, the latest version of which is on a Zip disk packed in a box somewhere here in my apartment. I should really dig that up.

But for Chinese support, you're going to have to roll up your sleeves a little and subclass your own kind of Vocabulary object. This is not really so hard to do; it's just hard to understand without documentation.

-Michel
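To make the Splitter idea above concrete, here is a minimal sketch in plain Python 3. It is not Zope's Splitter or Vocabulary API: the class name BigramSplitter, its split() method, and the stop_words and synonyms arguments are all hypothetical names chosen for illustration. It assumes one common dictionary-free approach for Chinese, which is to index overlapping two-character pairs (bigrams) of each run of CJK characters, while still splitting other text on non-alphanumeric characters.

```python
# Hypothetical sketch only: none of these names come from Zope or ZCatalog.

def is_cjk(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return '\u4e00' <= ch <= '\u9fff'


class BigramSplitter:
    """Split text into index terms.

    Non-CJK text is split into lowercased alphanumeric words.
    Runs of CJK characters are split into overlapping bigrams.
    """

    def __init__(self, stop_words=(), synonyms=None):
        self.stop_words = set(stop_words)      # terms to drop entirely
        self.synonyms = dict(synonyms or {})   # term -> canonical term

    @staticmethod
    def _bigrams(run):
        # A single isolated CJK character is kept as its own term.
        if len(run) == 1:
            return [run[0]]
        return [run[i] + run[i + 1] for i in range(len(run) - 1)]

    def _raw_terms(self, text):
        run = []    # current run of CJK characters
        word = []   # current non-CJK word
        for ch in text:
            if is_cjk(ch):
                if word:
                    yield ''.join(word).lower()
                    word = []
                run.append(ch)
                continue
            if run:
                yield from self._bigrams(run)
                run = []
            if ch.isalnum():
                word.append(ch)
            elif word:
                yield ''.join(word).lower()
                word = []
        # Flush whatever is left at the end of the text.
        if word:
            yield ''.join(word).lower()
        if run:
            yield from self._bigrams(run)

    def split(self, text):
        terms = []
        for term in self._raw_terms(text):
            term = self.synonyms.get(term, term)   # map to canonical form
            if term not in self.stop_words:
                terms.append(term)
        return terms


splitter = BigramSplitter(stop_words=['the'])
terms = splitter.split('the ZCatalog 支持中文搜索')
```

Bigrams trade index size for simplicity: they need no dictionary, but they also produce terms that are not real words, so a query has to be split the same way before it is looked up.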
Hi Victor,

I made a Splitter for Japanese that also turns Splitter into an ExtensionClass, so it can be subclassed to build Splitters for other languages. Once you can split Chinese documents into separate words, ZCatalog can pretty much handle the rest on its own.

I would be happy to help you create Chinese support, since I'm pretty interested in that myself - what I'll need your help finding is something that does lexical analysis of Chinese text, preferably something free. For Japanese I used ChaSen, which is a Japanese text analysis library from the Nara Institute of Science and Technology - I can feed it a Japanese document and it checks a dictionary and comes back and tells me what all the words are, what part of speech each word is, etc. The ChaSen home page is all in Japanese, so I probably couldn't have found it without a Japanese-capable environment (and Japanese language skills)...

Could you try to find a similar library for Chinese? If so, I will help you make it work for searching...

--Brian Hooper

On Thu, 24 Feb 2000 18:14:57 -0800 Michel Pelletier <michel@digicool.com> wrote:
Victor.Zhai@ogilvy.com wrote:
Hi all, In my project, I want to use ZCatalog to build a search interface, but it does not support Chinese. Can someone give me some advice?
ZCatalog does not currently support Chinese for several reasons:
1) I've never seen or worked with Chinese, and I have no environment to debug it.
2) Python itself is still working on complete internationalization
3) ZCatalog is very English-centric
However, I am working on several enhancements to ZCatalog which will help you here. First, ZCatalog now supports the notion of Vocabularies. Vocabularies are objects separate from ZCatalogs, and they isolate all of the language-specific features from ZCatalog. Therefore, if you subclass and create your own kind of Vocabulary (say, ChineseVocabulary), you can:
1) Create your own kind of 'Splitter', which is the object that splits documents into words. Currently Zope's splitter is very simple: it only knows how to split English (and some European) text into words on spaces. Splitting Chinese probably requires a much different algorithm.
2) Control stop words and synonyms. Right now, Zope has hard-coded stop words that are English only, and no synonym support. In 2.2, Zope Vocabularies will allow you to control these stop words and synonyms in a language-neutral fashion.
These features are in the current CVS, but they are still quite raw. What would help is the currently unreleased ZCatalog User's Guide, the latest version of which is on a Zip disk packed in a box somewhere here in my apartment. I should really dig that up.
But for Chinese support, you're going to have to roll up your sleeves a little and subclass your own kind of Vocabulary object. This is not really so hard to do; it's just hard to understand without documentation.
-Michel
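Brian's reply describes the other route: a lexical analyzer that checks a dictionary and reports the words it finds. The sketch below shows the simplest form of that idea, greedy forward maximum matching. It is not ChaSen's algorithm or API, and the function name max_match, the max_len parameter, and the three-word dictionary are hypothetical and only there to show the mechanics. A real analyzer would use a large dictionary and resolve ambiguous splits far more carefully.

```python
# Hypothetical sketch of dictionary-based segmentation (forward maximum
# matching). The dictionary below is a toy example, not real lexical data.

def max_match(text, dictionary, max_len=4):
    """Greedily take the longest dictionary word at each position.

    Characters that start no known word are emitted one at a time.
    """
    words = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback, so the
            # loop is guaranteed to advance.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


toy_dictionary = {'支持', '中文', '搜索'}
words = max_match('支持中文搜索', toy_dictionary)
```

The resulting word list is what a language-specific Splitter would hand to the catalog for indexing.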
_______________________________________________
Zope-Dev maillist - Zope-Dev@zope.org
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope )
participants (3)
- Brian Takashi Hooper
- Michel Pelletier
- Victor.Zhai@ogilvy.com