Michel Pelletier wrote:
<snip>
I am thinking heavily about this very problem as we speak. You all correctly pointed out some of the toughest of the problems. Here are my ideas so far:
Have 'vocabulary objects' store the stopwords, synonyms, stemming rules, and lexicon (collection of uniquely indexed words) in a drop-in object for ZCatalog. This way, a 'French', 'Dutch' etc. vocabulary object could be developed by a third party.
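To make the vocabulary-object idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Vocabulary` class, its method names, and the toy suffix-stripping "stemmer" are assumptions made for illustration and are not ZCatalog's actual API. It only shows how stopwords, synonyms, stemming rules, and a lexicon could live together in one drop-in object per language.

```python
# Hypothetical sketch only: none of these names come from ZCatalog.
class Vocabulary:
    """Bundles stopwords, synonyms, stemming rules and a lexicon for one language."""

    def __init__(self, stopwords=(), synonyms=None, suffixes=()):
        self.stopwords = set(stopwords)
        self.synonyms = dict(synonyms or {})
        # Try longer suffixes first so that "ing" is checked before "s".
        self.suffixes = sorted(suffixes, key=len, reverse=True)
        self.lexicon = {}  # normalized word -> unique integer id

    def stem(self, word):
        # Toy stemming rule (an assumption, not a real stemmer):
        # strip the first matching suffix, keeping at least 3 letters.
        for suffix in self.suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[:-len(suffix)]
        return word

    def normalize(self, word):
        # Lowercase, drop stopwords, map synonyms to one form, then stem.
        word = word.lower()
        if word in self.stopwords:
            return None
        word = self.synonyms.get(word, word)
        return self.stem(word)

    def word_id(self, word):
        # Look up the word's id in the lexicon, assigning a new one if needed.
        term = self.normalize(word)
        if term is None:
            return None
        return self.lexicon.setdefault(term, len(self.lexicon))

    def index_text(self, text):
        # Return the lexicon ids of all non-stopword tokens in the text.
        ids = []
        for token in text.split():
            wid = self.word_id(token)
            if wid is not None:
                ids.append(wid)
        return ids


# A third party could ship a different instance per language.
english = Vocabulary(
    stopwords={"the", "a", "of"},
    synonyms={"automobile": "car"},
    suffixes=("s", "ing"),
)
```

With this sketch, `english.index_text("The cars of automobile")` maps both "cars" and "automobile" to the same lexicon entry, which is the behaviour a 'French' or 'Dutch' vocabulary would supply with its own word lists and rules.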
Since this is a ZCatalog issue, and ZCatalog looks like it might be used a lot on large sites: how feasible would it be to use the WordNet project's data and data model to enable "smart searching" now and multilingual searching in the future? The stopwords would be up to Zope, and if the EuroWordNet data were available for a language, then that language could be searched just like English. The EuroWordNet project is creating wordnet databases in something like 21+ languages. (Their main site is not working...)

The WordNet project's site is: http://www.cogsci.princeton.edu/~wn/
There is a Python API for WordNet available at: http://www.cs.brandeis.edu/~steele/sources/wordnet-python.html

Just wanted to get the idea out and see what you think. (Since we're all asleep, I might not get any responses :)

David, tone.. <snip>