[Zope-dev] Translator Rules...How should we attack the issue?

Stephan Richter srichter@cbu.edu
Fri, 24 Mar 2000 15:22:17 -0600


Another issue we should all be concerned with is grammar. I know that 
Chinese grammar is about as simple as it gets (maybe Mandarin should be the 
world language). Furthermore, the European languages (especially the 
Latin-based ones) are similar. But there are small differences:

So for example (I will try my best with French):

I seek my red home.
Ich suche mein rotes zu Hause.
Je cherche ma maison rouge.

I see so many problems here. For example, the sentence structure of 
German and English is pretty much the same. BUT, in German you have to 
conjugate "suchen" (seek) and decline "rot" (red). The translator certainly 
should not attempt to translate the entire sentence. It should be smart about 
grammar.
But it gets much worse than that. In French, adjectives are "usually" (geez, 
another issue) placed behind the noun. Additionally, besides conjugating 
"chercher" (seek), you have to inflect "mon" (my), since its form depends on 
the gender of the noun it goes with.

So here are some of my thoughts about the issue (they are not organized or 
well thought through):

- Evaluating a sentence should work like parsing an algebraic expression 
into reverse Polish notation (RPN) using stacks or a tree.
- Each subtree will automatically represent a phrase.
- Each word and phrase is an object that contains a lot of information, 
including grammar.
   So for example, "suchen" should contain all its conjugations. The same 
for the adjective "rot".
   I think an abstract base class called WORD should be written, and then 
derived classes called ADJECTIVE, VERB, NOUN, ADVERB ...
   These classes should also reference each other, since in German you can 
easily turn one kind of word into another (adverbs --> nouns --> verbs).
   Mmmh, that brings me to another point, specific to German. We have a 
lot of compound words... That will be hard...
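
To make the WORD idea a bit more concrete, here is a minimal Python sketch. 
The class names follow the list above; the inflect() method, the dictionary 
keys such as (1, "sg") and ("acc", "neuter"), and the tiny tables are all 
assumptions made up for illustration, not a worked-out design.

```python
from abc import ABC, abstractmethod


class Word(ABC):
    """Abstract base class for a dictionary entry (the WORD idea)."""

    def __init__(self, lemma):
        self.lemma = lemma

    @abstractmethod
    def inflect(self, **features):
        """Return the surface form for the given grammatical features."""


class Verb(Word):
    def __init__(self, lemma, conjugations):
        super().__init__(lemma)
        # Maps (person, number) to the conjugated form.
        self.conjugations = conjugations

    def inflect(self, person, number):
        return self.conjugations[(person, number)]


class Adjective(Word):
    def __init__(self, lemma, declensions):
        super().__init__(lemma)
        # Maps (case, gender) to the declined form.
        self.declensions = declensions

    def inflect(self, case, gender):
        return self.declensions[(case, gender)]


# Only a few forms are filled in; a real entry would hold all of them.
suchen = Verb("suchen", {(1, "sg"): "suche", (2, "sg"): "suchst", (3, "sg"): "sucht"})
rot = Adjective("rot", {("acc", "neuter"): "rotes"})

print(suchen.inflect(person=1, number="sg"))     # suche
print(rot.inflect(case="acc", gender="neuter"))  # rotes
```

NOUN and ADVERB would follow the same pattern. Cross-references between the 
classes and compound words are left out of this sketch.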

So here is an example for: I seek my red home.
                      /                    \
                 I seek                  my red home
                /      \                /           \
               I        seek          my          red home
                                                  /      \
                                               home      red
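
The tree for the example sentence can be written down and walked in a few 
lines of Python. This is only a sketch: the Phrase class and the postorder() 
function are hypothetical names, and the tree is built by hand here rather 
than produced by a parser. Walking it children-first gives the same order an 
RPN evaluator would process.

```python
class Phrase:
    """A node in the sentence tree; leaves are single words."""

    def __init__(self, text, left=None, right=None):
        self.text = text
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def postorder(node):
    """Yield phrases children-first, the order an RPN evaluator would use."""
    if node is None:
        return
    yield from postorder(node.left)
    yield from postorder(node.right)
    yield node.text


# Hand-built tree for "I seek my red home".
tree = Phrase(
    "I seek my red home",
    Phrase("I seek", Phrase("I"), Phrase("seek")),
    Phrase(
        "my red home",
        Phrase("my"),
        Phrase("red home", Phrase("home"), Phrase("red")),
    ),
)

order = list(postorder(tree))
print(order)
# ['I', 'seek', 'I seek', 'my', 'home', 'red', 'red home',
#  'my red home', 'I seek my red home']
```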

So I would translate (walking the tree):

I --> Ich  (Note: we know that it is first person singular)
seek --> suchen ---> suche (since we have first person singular)
I seek --> Ich suche

my --> mein (signals possession: 4th case --> Akkusativ)
home --> zu Hause (Note: we know it is neuter, because: das zu Hause)
red --> rot --> rotes (because 4th case, neuter)
red home --> rotes zu Hause
my red home --> mein rotes zu Hause

I seek my red home. --> Ich suche mein rotes zu Hause.
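
The steps above could be sketched in Python roughly as follows. Everything 
here is an assumption for illustration: the lexicon dictionaries, their keys, 
and the function names are made up, and they cover only this one sentence. 
The point is that the subject selects the verb form, while the noun's gender 
and the case select the forms of "my" and "red".

```python
# Toy lexicon: every entry is an assumption covering only this one sentence.
PRONOUNS = {"I": {"de": "Ich", "person": 1, "number": "sg"}}
VERBS = {"seek": {(1, "sg"): "suche"}}
NOUNS = {"home": {"de": "zu Hause", "gender": "neuter"}}
POSSESSIVES = {"my": {("acc", "neuter"): "mein"}}
ADJECTIVES = {"red": {("acc", "neuter"): "rotes"}}


def translate_clause(subject, verb):
    """The subject fixes person and number, which then select the verb form."""
    pron = PRONOUNS[subject]
    verb_form = VERBS[verb][(pron["person"], pron["number"])]
    return f"{pron['de']} {verb_form}"


def translate_object(possessive, adjective, noun, case="acc"):
    """The noun's gender, plus the case, select the other two forms."""
    entry = NOUNS[noun]
    key = (case, entry["gender"])
    return f"{POSSESSIVES[possessive][key]} {ADJECTIVES[adjective][key]} {entry['de']}"


def translate(subject, verb, possessive, adjective, noun):
    """Combine the two subtrees into the full sentence."""
    clause = translate_clause(subject, verb)
    obj = translate_object(possessive, adjective, noun)
    return f"{clause} {obj}."


result = translate("I", "seek", "my", "red", "home")
print(result)  # Ich suche mein rotes zu Hause.
```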

Now that was built on German grammar rules. We certainly could do this with 
French and Spanish as well.

We probably need some language experts who can tell us which words are 
more important in defining the grammar than others. While I was doing the 
example, I noticed that nouns are more important than adjectives.
Furthermore, we should consult a graph theorist who can help us with 
creating trees based on these rules. They might be able to use some math to 
optimize the algorithm.

As I said, these are just some ideas. Any comments?

Regards,
Stephan
--
Stephan Richter - (901) 573-3308 - srichter@cbu.edu
CBU - Physics & Chemistry; Framework Web - Web Design & Development
PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391