[Zope-dev] Re: Translator Rules...How should we attack the issue?

Alexandre Ratti alex@gabuzomeu.net
Sat, 25 Mar 2000 00:03:05 +0100


Hello Stephan,


One suggestion: just don't try it in Zope :-).

Sorry if I sound overly positive but as they say in the Jargon File this 
problem is "AI-complete" (i.e. just too difficult).

Very complex algorithms are needed to implement even basic automated 
translation. And they still output gobbledygook.

So IMO we just need to implement a translation memory (i.e. a string 
repository where human-created translations are stored). A translation 
glossary + a block substitution system should be sufficient to support 
multilingual pages in Zope. This is what is needed in the short term.
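
A minimal sketch of such a translation memory with block substitution, in Python. Everything here is an assumption made up for illustration: the class name TranslationMemory, its methods, and the [[message id]] placeholder syntax are hypothetical and are not an existing Zope API.

```python
import re


class TranslationMemory:
    """Hypothetical string repository holding human-created translations."""

    def __init__(self, default_lang="en"):
        self.default_lang = default_lang
        self._strings = {}  # maps (message id, language) -> translated text

    def add(self, msgid, lang, text):
        self._strings[(msgid, lang)] = text

    def translate(self, msgid, lang):
        # Try the requested language, then the default language,
        # and finally fall back to the message id itself.
        for candidate in (lang, self.default_lang):
            text = self._strings.get((msgid, candidate))
            if text is not None:
                return text
        return msgid

    def render(self, template, lang):
        """Block substitution: replace every [[msgid]] with its translation."""
        return re.sub(
            r"\[\[(.+?)\]\]",
            lambda match: self.translate(match.group(1), lang),
            template,
        )


tm = TranslationMemory()
tm.add("my house", "en", "my house")
tm.add("my house", "fr", "ma maison")
tm.add("welcome", "en", "Welcome")  # no French translation stored yet

page = "<h1>[[welcome]]</h1><p>[[my house]]</p>"
french = tm.render(page, "fr")
```

A string with no stored translation falls back to the default language, so a page can go live before every block has been translated.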


Cheers.

Alexandre


PS: I don't want to sound finicky but you need to say "ma maison" in French 
:-).


At 15:22 24/03/2000 -0600, Stephan Richter wrote:
>Another issue we all should be concerned with is grammar. I know that 
>Chinese grammar is as simple as it gets (maybe Mandarin should be the 
>world language). Furthermore, the European languages (especially the 
>Latin-based ones) are similar. But there are small differences:
>
>So for example (I will try my best with French):
>
>I seek my red home.
>Ich suche mein rotes zu Hause.
>Je cherche mon maison rouge.
>
>I see so many problems here. For example, the sentence structure in 
>German and English is pretty much the same. BUT, in German you have to 
>conjugate "suchen" (seek) as well as decline "rot" (red). The translator 
>certainly should not attempt to translate the entire sentence. It should 
>be smart about grammar.
>But it gets much worse than that. In French, adjectives are "usually" 
>(geez, another issue) placed behind the noun. Additionally, besides 
>conjugating "chercher" (seek), you have to inflect "mon" (my), since that 
>depends on whether you are a guy or a girl.
>
>So here are some of my thoughts about the issue (they are not organized or 
>well thought through):
>
>- Evaluating a sentence should work like parsing an algebraic expression 
>into reverse Polish notation (RPN) using stacks or a tree.
>- Each subtree will automatically represent a phrase.
>- Each word and phrase is an object that contains a lot of information, 
>including grammar.
>   So for example, "suchen" should contain all its conjugations. The same 
>   goes for the adjective "rot".
>   I think an abstract base class called WORD should be written, and then 
>   derived classes called ADJECTIVE, VERB, NOUN, ADVERB ...
>   These classes should also reference each other, since in German you can 
>   easily derive adverbs --> nouns --> verbs.
>   Mmmh, that brings me to another point, specific to German. We have a 
>   lot of compound words... That will be hard...
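
The class layout described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the constructor arguments, the inflect method, and the small hand-typed tables (present-tense singular forms of "suchen", accusative endings of "rot" after a possessive) are hypothetical and cover just enough grammar for the example sentence.

```python
class Word:
    """Abstract base class: a dictionary entry that carries its own grammar."""

    def __init__(self, lemma):
        self.lemma = lemma

    def inflect(self, **features):
        raise NotImplementedError("subclasses decide how they inflect")


class Verb(Word):
    def __init__(self, lemma, present):
        super().__init__(lemma)
        self.present = present  # maps (person, number) -> conjugated form

    def inflect(self, person, number):
        return self.present[(person, number)]


class Noun(Word):
    def __init__(self, lemma, gender):
        super().__init__(lemma)
        self.gender = gender

    def inflect(self, **features):
        # Sketch only: case and plural forms are left out.
        return self.lemma


class Adjective(Word):
    def __init__(self, lemma, endings):
        super().__init__(lemma)
        self.endings = endings  # maps (gender, case) -> ending

    def inflect(self, gender, case):
        return self.lemma + self.endings[(gender, case)]


suchen = Verb("suchen", {(1, "sg"): "suche", (2, "sg"): "suchst", (3, "sg"): "sucht"})
# Accusative endings as used after a possessive such as "mein".
rot = Adjective("rot", {("masculine", "acc"): "en",
                        ("feminine", "acc"): "e",
                        ("neuter", "acc"): "es"})

first_person = suchen.inflect(1, "sg")
neuter_accusative = rot.inflect("neuter", "acc")
```

Storing the forms as data on each object keeps the grammar with the word, which is the point of the proposal; a real lexicon would need far larger tables plus handling for irregular forms and compounds.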
>
>So here is an example for: I seek my red home.
>                            /               \
>                       I seek            my red home
>                       /    \             /       \
>                      I     seek        my      red home
>                                                   /    \
>                                                home    red
>
>So I would translate (walking the tree):
>
>I --> Ich  (Note: we know that it is first person singular)
>seek --> suchen ---> suche (since we have first person singular)
>I seek --> Ich suche
>
>my --> mein (signals possession: 4th case --> Akkusativ)
>home --> zu Hause (Note: we know it is neuter, because: das zu Hause)
>red --> rot --> rotes (because 4th case neuter)
>red home --> rotes zu Hause
>my red Home --> mein rotes zu Hause
>
>I seek my red home. --> Ich suche mein rotes zu Hause.
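
The walk above can be written out as a short Python sketch. It is hard-wired to this one sentence shape and relies on a tiny hand-typed lexicon; the table names, the nested-tuple tree, and the helper functions are all hypothetical, chosen only to mirror the steps in the walk.

```python
# Hypothetical mini-lexicon covering only the example sentence.
PRONOUNS = {"I": ("Ich", 1, "sg")}                      # form, person, number
VERBS = {"seek": {(1, "sg"): "suche", (3, "sg"): "sucht"}}
NOUNS = {"home": ("zu Hause", "neuter")}                # form, gender
# Accusative forms, selected by the gender of the noun they belong to.
POSSESSIVES = {"my": {"masculine": "meinen", "feminine": "meine", "neuter": "mein"}}
ADJECTIVES = {"red": {"masculine": "roten", "feminine": "rote", "neuter": "rotes"}}

# The parse tree as nested tuples: (subject phrase, object phrase).
tree = (("I", "seek"), ("my", ("red", "home")))


def translate_clause(subject, verb):
    # The subject fixes person and number, which select the verb form.
    form, person, number = PRONOUNS[subject]
    return [form, VERBS[verb][(person, number)]]


def translate_object(possessive, adjective, noun):
    # The noun is looked up first: its gender drives the other two words.
    noun_form, gender = NOUNS[noun]
    return [POSSESSIVES[possessive][gender],
            ADJECTIVES[adjective][gender],
            noun_form]


def translate(sentence_tree):
    (subject, verb), (possessive, (adjective, noun)) = sentence_tree
    words = translate_clause(subject, verb)
    words += translate_object(possessive, adjective, noun)
    return " ".join(words) + "."


german = translate(tree)
```

Even in this toy form the noun has to be resolved before the adjective and the possessive, which matches the observation that nouns matter more than adjectives when fixing the grammar.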
>
>Now that was built on German grammar rules. We certainly could do this 
>with French and Spanish as well.
>
>We probably need some language experts who can tell us which words are 
>more important in defining the grammar than others. While I was doing the 
>example, I noticed that nouns are more important than adjectives.
>Furthermore, we should consult a graph theorist who can help us with 
>creating trees, based on these rules. He might be able to use some math to 
>optimize the algorithm.
>
>As I said, these are just some ideas. Any comments?
>
>Regards,
>Stephan
>--
>Stephan Richter - (901) 573-3308 - srichter@cbu.edu
>CBU - Physics & Chemistry; Framework Web - Web Design & Development
>PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
>