Hello Stephan,

One suggestion: just don't try it in Zope :-).

Sorry if I sound overly positive but as they say in the Jargon File this problem is "AI-complete" (i.e. just too difficult). Very complex algorithms are needed to implement even basic automated translation. And they still output gobbledygook.

So IMO we just need to implement a translation memory (i.e. a string repository where human-created translations are stored). A translation glossary + a block substitution system should be sufficient to support multilingual pages in Zope. This is what is needed in the short term.

Cheers.

Alexandre

PS: I don't want to sound finicky but you need to say "ma maison" in French :-).

At 15:22 24/03/2000 -0600, Stephan Richter wrote:
Another issue we all should be concerned with is grammar. I know that Chinese grammar is as simple as it gets (maybe Mandarin should be the world language). Furthermore, the European languages (especially the Latin-based ones) are similar. But there are small differences:
So for example (I will try my best with French):
I seek my red home.
Ich suche mein rotes zu Hause.
Je cherche mon maison rouge.
I see so many problems here. For example, the sentence structure of German and English is pretty much the same. BUT, in German you have to conjugate "suchen" (seek) as well as decline "rot" (red). The translator certainly should not attempt to translate the entire sentence. It should be smart about grammar. But it gets much worse than that. In French, adjectives are "usually" (geez, another issue) placed after the noun. Additionally, besides conjugating "chercher" (seek), you have to adjust "mon" (my), since that depends on whether you are a guy or a girl.
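To see why plain word-for-word substitution falls short, here is a minimal Python sketch of a naive glossary lookup. The GLOSSARY table and the gloss() function are hypothetical, made up for this illustration; each English word maps to exactly one dictionary form and English word order is kept.

```python
# Hypothetical toy glossaries: one dictionary form per English word, no grammar at all.
GLOSSARY = {
    "de": {"I": "Ich", "seek": "suchen", "my": "mein", "red": "rot", "home": "zu Hause"},
    "fr": {"I": "Je", "seek": "chercher", "my": "mon", "red": "rouge", "home": "maison"},
}


def gloss(sentence: str, lang: str) -> str:
    """Translate word by word, keeping English word order and dictionary forms."""
    words = sentence.rstrip(".").split()
    return " ".join(GLOSSARY[lang][w] for w in words) + "."


if __name__ == "__main__":
    for lang in ("de", "fr"):
        print(lang, gloss("I seek my red home.", lang))
```

This yields "Ich suchen mein rot zu Hause." and "Je chercher mon rouge maison.": the verb is not conjugated, the adjective is not inflected, and the French adjective sits in the English position, which are exactly the problems listed above.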
So here are some of my thoughts about the issue (they are not organized or well thought through):
- Evaluating a sentence should work like parsing an algebraic expression into reverse Polish notation (RPN) using stacks or a tree.
- Each subtree will automatically represent a phrase.
- Each word and phrase is an object that carries a lot of information, including grammar. For example, "suchen" should contain all its conjugations, and the adjective "rot" all its declined forms. I think an abstract base class called WORD should be written, and then derived classes called ADJECTIVE, VERB, NOUN, ADVERB ... These classes should also reference each other, since in German you can easily derive one kind of word from another (adverbs --> nouns --> verbs). Mmmh, that brings me to another point, specific to German: we have a lot of compound words... That will be hard...
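The algebraic side of the analogy in the first point can be made concrete. Below is a minimal Python sketch of Dijkstra's shunting-yard algorithm, which uses an operator stack to turn an infix expression into RPN. It handles only the four arithmetic operators and parentheses, does no error checking, and illustrates the arithmetic analogy only; it is not a sentence parser.

```python
from typing import List

# Precedence of the binary operators handled here (all left-associative).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(tokens: List[str]) -> List[str]:
    """Convert infix tokens to reverse Polish notation (shunting-yard algorithm)."""
    output: List[str] = []
    stack: List[str] = []  # pending operators and "("
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of greater or equal precedence (left associativity).
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            # Flush operators until the matching "(".
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return output


if __name__ == "__main__":
    print(to_rpn("3 + 4 * ( 2 - 1 )".split()))  # ['3', '4', '2', '1', '-', '*', '+']
```

In the RPN output every operator closes off one subtree of the expression, which is the property the second point relies on when it says each subtree represents a phrase.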
So here is an example for "I seek my red home.":

                I seek my red home.
                  /             \
             I seek          my red home
             /    \            /      \
            I    seek         my    red home
                                     /    \
                                  home    red
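One way to hold such a tree in code, sketched in Python under the assumption that a phrase is simply a tuple of its parts and a word is a string. SENTENCE, words() and phrases() are hypothetical names, and the leaves are stored in English word order. A post-order walk visits every subtree, i.e. every phrase, children first, much as RPN lists operands before their operator.

```python
from typing import List, Tuple, Union

# A parse tree is either a single word (leaf) or a tuple of subtrees (phrase).
Tree = Union[str, Tuple["Tree", ...]]

SENTENCE: Tree = (("I", "seek"), ("my", ("red", "home")))


def words(tree: Tree) -> List[str]:
    """The words covered by a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in words(sub)]


def phrases(tree: Tree) -> List[str]:
    """Post-order walk: each subtree's phrase is listed after its children."""
    if isinstance(tree, str):
        return []
    found: List[str] = []
    for sub in tree:
        found.extend(phrases(sub))
    found.append(" ".join(words(tree)))
    return found


if __name__ == "__main__":
    print(phrases(SENTENCE))  # ['I seek', 'red home', 'my red home', 'I seek my red home']
```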
So I would translate (walking the tree):
I --> Ich (Note: we know that it is first person singular)
seek --> suchen --> suche (since we have first person singular)
I seek --> Ich suche
my --> mein (signals possession: 4th case --> Akkusativ)
home --> zu Hause (Note: we know it is neuter, because: das zu Hause)
red --> rot --> rotes (because 4th case neuter)
red home --> rotes zu Hause
my red home --> mein rotes zu Hause
I seek my red home. --> Ich suche mein rotes zu Hause.
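The walk above can be sketched in Python with the proposed class hierarchy (a WORD base class with VERB, NOUN, ADJECTIVE and so on). Everything below is hypothetical: the classes, the toy lexicon DE and the functions cover only this one sentence shape. The noun is looked up first because it fixes the gender its modifiers must agree with, and the object phrase is put in the accusative because it is the direct object of "suchen". A masculine noun (Garten) is added only to show the determiner and adjective changing with the noun's gender.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Word:
    """Hypothetical base class: a dictionary form plus the inflected forms we need."""
    lemma: str
    forms: Dict[Tuple[str, ...], str]

    def inflect(self, *features: str) -> str:
        # Fall back to the dictionary form when no inflected form is listed.
        return self.forms.get(features, self.lemma)


@dataclass
class Pronoun(Word):
    person: str = "3"
    number: str = "sg"


@dataclass
class Noun(Word):
    gender: str = "n"  # "m", "f" or "n"; the noun hands its gender to its modifiers


class Verb(Word):
    """Forms keyed by (person, number)."""


class Adjective(Word):
    """Forms keyed by (case, gender); mixed declension, i.e. after mein/meine."""


class Determiner(Word):
    """Forms keyed by (case, gender)."""


# Toy German lexicon: only the forms this example needs.
DE = {
    "I": Pronoun("ich", {}, person="1", number="sg"),
    "seek": Verb("suchen", {("1", "sg"): "suche"}),
    "my": Determiner("mein", {("acc", "m"): "meinen", ("acc", "f"): "meine", ("acc", "n"): "mein"}),
    "red": Adjective("rot", {("acc", "m"): "roten", ("acc", "f"): "rote", ("acc", "n"): "rotes"}),
    "home": Noun("zu Hause", {}, gender="n"),
    "garden": Noun("Garten", {}, gender="m"),
}


def translate_np(det: str, adj: str, noun: str, case: str) -> str:
    head = DE[noun]  # evaluate the noun first: it decides the gender
    g = head.gender
    return f"{DE[det].inflect(case, g)} {DE[adj].inflect(case, g)} {head.lemma}"


def translate(tree) -> str:
    """Walk a tree of the fixed shape ((subject, verb), (det, (adj, noun)))."""
    (subj, verb), (det, (adj, noun)) = tree
    s = DE[subj]  # the subject fixes person and number for the verb
    v = DE[verb].inflect(s.person, s.number)
    obj = translate_np(det, adj, noun, case="acc")  # direct object -> accusative
    sentence = f"{s.lemma} {v} {obj}."
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    print(translate((("I", "seek"), ("my", ("red", "home")))))    # Ich suche mein rotes zu Hause.
    print(translate((("I", "seek"), ("my", ("red", "garden")))))  # Ich suche meinen roten Garten.
```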
Now that was built on German grammar rules. We certainly could do this with French and Spanish as well.
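In the same spirit, a minimal Python sketch of the French side, assuming hypothetical lookup tables for this one pattern only (first person singular subject, object made of possessive + noun + adjective). It places the adjective after the noun and makes both the possessive and the adjective agree with the noun's gender, which gives "ma maison rouge". It ignores, among much else, the rule that "mon" is used before feminine nouns starting with a vowel.

```python
# Hypothetical French lookup tables for this one sentence pattern.
FR_NOUNS = {"home": ("maison", "f"), "garden": ("jardin", "m")}
FR_POSSESSIVE_1SG = {"m": "mon", "f": "ma"}  # agrees with the possessed noun
FR_ADJ = {"red": {"m": "rouge", "f": "rouge"}, "green": {"m": "vert", "f": "verte"}}
FR_VERB_1SG = {"seek": "cherche"}


def translate_fr(verb: str, adj: str, noun: str) -> str:
    lemma, gender = FR_NOUNS[noun]  # noun first: it fixes the gender
    det = FR_POSSESSIVE_1SG[gender]
    adjective = FR_ADJ[adj][gender]
    # Adjective placed after the noun, as most French adjectives are.
    return f"Je {FR_VERB_1SG[verb]} {det} {lemma} {adjective}."


if __name__ == "__main__":
    print(translate_fr("seek", "red", "home"))      # Je cherche ma maison rouge.
    print(translate_fr("seek", "green", "garden"))  # Je cherche mon jardin vert.
```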
We probably need some language experts who can tell us which words are more important than others in defining the grammar. While I was doing the example, I noticed that nouns are more important than adjectives. Furthermore, we should consult a graph theorist who can help us create trees based on these rules. He might be able to use some math to optimize the algorithm.
As I said, these are just some ideas. Any comments?
Regards,
Stephan
--
Stephan Richter - (901) 573-3308 - srichter@cbu.edu
CBU - Physics & Chemistry; Framework Web - Web Design & Development
PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
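Alexandre's counter-proposal at the top of this thread (a translation memory of human-made translations plus block substitution) needs no grammar at all. A minimal Python sketch, not tied to any Zope API: the TranslationMemory class, the [[...]] block marker and render() are all hypothetical.

```python
import re
from typing import Dict, Tuple


class TranslationMemory:
    """Hypothetical store of human-made translations, keyed by (source text, language)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def add(self, source: str, lang: str, translation: str) -> None:
        self._entries[(source, lang)] = translation

    def lookup(self, source: str, lang: str) -> str:
        # Untranslated blocks fall back to the source text instead of failing.
        return self._entries.get((source, lang), source)


# Hypothetical block marker: translatable text is written as [[...]] in the page.
BLOCK = re.compile(r"\[\[(.+?)\]\]")


def render(page: str, memory: TranslationMemory, lang: str) -> str:
    """Replace every marked block with its stored translation for `lang`."""
    return BLOCK.sub(lambda m: memory.lookup(m.group(1), lang), page)


tm = TranslationMemory()
tm.add("I seek my red home.", "fr", "Je cherche ma maison rouge.")
tm.add("I seek my red home.", "de", "Ich suche mein rotes zu Hause.")
page = "<h1>[[I seek my red home.]]</h1><p>[[Welcome]]</p>"
print(render(page, tm, "fr"))  # <h1>Je cherche ma maison rouge.</h1><p>Welcome</p>
```

The design choice is the one argued for in the reply: translations are written once by people and looked up verbatim, so quality depends on the humans filling the memory rather than on any parsing.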