I cc'ed some other people on this, because there is some important locale information in this message.
-----Original Message-----
From: Martijn Faassen [mailto:m.faassen@vet.uu.nl]
Sent: Tuesday, September 14, 1999 7:33 AM
Cc: zope-dev@zope.org
Subject: Re: [Zope-dev] Re: [Zope] Need a list of words not indexed by Catalog
Rik Hoekstra wrote:
Terrel Shumway wrote:
near the end of lib/python/SearchIndex/TextIndex.py is a list called 'stop_words'
[Zope Dev] It would be good to move this out of the .py file into an editable, internationalizable resource file.
Agreed! And then there's the *multi* lingual issue too. What if I have Dutch and English on my site? [snip] It seems like you run into a _lot_ of complexities with multilingual issues, and still these are real issues for many of us.
Yes, very real issues. Suddenly ZCatalog isn't the almost-ready tool to add searchability to the website I'm building anymore.. Now I need to do quite a bit of extra work, I imagine..
I am thinking heavily about this very problem as we speak. You all correctly pointed out some of the toughest of the problems. Here are my ideas so far:

Have 'vocabulary objects' store the stopwords, synonyms, stemming rules, and lexicon (collection of uniquely indexed words) in a drop-in object for ZCatalog. This way, a 'French', 'Dutch' etc. vocabulary object could be developed by a third party. A TextIndex can then reference (or acquire) a vocabulary object through which it can stop, syn, stem and store words in its lexicon.

There are many other issues, like sharing lexicons between similar language indexes, and having multiple back-end 'index/vocabularies' that all look like one index, so you can search a 'document source' for either 'community' or 'communauté' or 'Gemeinschaft' and get only documents relevant to that language (my apologies if these words are wrong, I'm using babelfish). I think this problem could be intractable, though: if you search for 'walking' in English, the word would stem down into 'walk'; if you search for 'marche' en français, should it stem down to 'promenade'?

Anyway, there is some good news. For those of you tracking CVS, we have added the ability to set your locale in Zope. This means that, for example, the splitter/stemmer in the catalog will recognize all of those umlauts and accented letters and whatnots that English doesn't have. We would like a few people all over the place to try this out. If your locale has a different language or monetary system than the US (just about everywhere except some of Canada), this might make the catalog and other parts of Zope more useful for you.

Locale support can be activated from the z2.py command line with the '-L' option. "-L ''" (an empty string) will cause Zope to try to autodetect your locale from your environment variables (you must set the env variables yourself, see 'man 7 locale'). Alternatively, you can say "-L de" and set your locale to German.

Please folks, test this out for us. We don't really have the means to do it here.

-Michel
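
P.S. To make the 'vocabulary object' idea a bit more concrete, here is a rough sketch in plain Python. None of this is existing ZCatalog code: the class names Vocabulary and TextIndex, the method names, and the naive suffix-stripping "stemmer" are all hypothetical, made up purely for illustration. It only shows the shape of the idea: the index hands every word to a drop-in vocabulary, which stops, syns, stems and stores it in its lexicon.

```python
import re


class Vocabulary:
    """Hypothetical drop-in vocabulary for one language (not a Zope API)."""

    def __init__(self, stop_words=(), synonyms=None, suffixes=()):
        self.stop_words = set(stop_words)
        self.synonyms = dict(synonyms or {})  # word -> preferred word
        self.suffixes = tuple(suffixes)       # naive stemming rules
        self.lexicon = {}                     # unique word -> integer id

    def stem(self, word):
        # Strip the first matching suffix, keeping at least 3 letters.
        for suffix in self.suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[:-len(suffix)]
        return word

    def normalize(self, word):
        # Stop, syn, stem. Returns None for stop words.
        word = word.lower()
        if word in self.stop_words:
            return None
        word = self.synonyms.get(word, word)
        return self.stem(word)

    def word_id(self, word):
        # Store the word in the lexicon (if new) and return its id.
        return self.lexicon.setdefault(word, len(self.lexicon))


class TextIndex:
    """Hypothetical text index that delegates word handling to a vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.postings = {}  # word id -> set of document ids

    def _terms(self, text):
        for raw in re.findall(r"\w+", text):
            term = self.vocabulary.normalize(raw)
            if term is not None:
                yield term

    def index(self, doc_id, text):
        for term in self._terms(text):
            wid = self.vocabulary.word_id(term)
            self.postings.setdefault(wid, set()).add(doc_id)

    def search(self, query):
        # Documents containing every non-stop word of the query.
        results = None
        for term in self._terms(query):
            wid = self.vocabulary.lexicon.get(term)
            docs = self.postings.get(wid, set())
            results = docs if results is None else results & docs
        return set(results or ())


english = Vocabulary(
    stop_words={"the", "a", "is"},
    synonyms={"strolling": "walking"},
    suffixes=("ing", "ed", "s"),
)
catalog = TextIndex(english)
catalog.index(1, "The dog is walking")
catalog.index(2, "A cat walked")
```

With this toy setup, catalog.search("walks") and catalog.search("strolling") both find documents 1 and 2, because every form ends up as 'walk' in the lexicon. A 'Dutch' or 'French' vocabulary would just be another Vocabulary instance with its own word lists, which is the whole point of making it a separate object. The 'marche'/'promenade' question is not addressed here at all.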
Michel Pelletier