Hello David,

Just a quick note about your message. I haven't tested your product yet, but I will this weekend.

I have a lot of misgivings about any automated translation system based on words (I am a freelance translator and I turn green whenever I see the output of automated translators on the Web). It just doesn't work, because you almost never have a one-to-one mapping between words across languages. The example you described below could generate quite a mess in French, for instance, because of the complex syntax (e.g. "green" can be "vert", "verte", "verts" or "vertes" depending on the context).

IMO no translation should be performed unless there is a 100% match between the glossary table and the localizable string. Example:

Translation glossary:

    {'Squids are beautiful': {'fr': 'Les poulpes sont magnifiques'}, ...}

Localizable strings in a text:

    1st instance: <dtml-translate>Squids are beautiful</dtml-translate>
    2nd instance: <dtml-translate>Squids are beautiful in summer</dtml-translate>

The 1st instance should be translated because it matches. The 2nd shouldn't. If no match is found, the default (source) language is left as is.

If you want to create a glossary of common phrases (a so-called "translation memory"), you need to store individual sentences or groups of sentences. Then you can reuse translated sentences across texts. Automated sentence translation is a bit safer because sentences provide much more context than individual words.

To summarize, there is a trade-off: when you use longer translation units, you get much higher translation quality but less leverage (i.e. reuse) across texts.

This is a complex issue... if I get started on it I will talk you to sleep :-).

Cheers.

Alexandre

At 03:37 24/03/2000 -0800, you wrote:
Download the translator from
http://www.zope.org/Members/jdavid/translator
The translator product is just the old vocabulary product, renamed following Michell's advice, with a new translate tag that is a first implementation of Shane's proposal.
Suppose the data contained by our Translator instance is:
{'my': {'es': 'mi'}, 'house': {'es': 'casa'}, 'is': {'es': 'es'}, 'green': {'es': 'verde'} }
Then if you type:
"<dtml-translate>my house is green</dtml-translate>"
and "languages" is "['es']" then the result is "mi casa es verde" (and it's right!!).
It does it by:
- split the string
- translate each word
- join the string
Yes, this is only a bogus algorithm, but it shows how it would work. Future releases should allow plugging in more intelligent translators.
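To make the contrast concrete, here is a minimal Python sketch of the split/translate/join steps described above, next to the exact-match lookup proposed in the reply at the top of the thread. The function names and the two module-level glossaries are hypothetical and are not taken from the Translator product; only the glossary contents come from the messages.

```python
# Hypothetical glossaries; the data mirrors the examples in the thread,
# but the variable and function names are illustrative assumptions.
WORD_GLOSSARY = {
    "my": {"es": "mi"},
    "house": {"es": "casa"},
    "is": {"es": "es"},
    "green": {"es": "verde"},
}

PHRASE_GLOSSARY = {
    "Squids are beautiful": {"fr": "Les poulpes sont magnifiques"},
}


def translate_word_by_word(text, lang, glossary=WORD_GLOSSARY):
    """Split the string, translate each word, join the result.

    Words missing from the glossary are left in the source language.
    """
    words = text.split()
    translated = [glossary.get(word, {}).get(lang, word) for word in words]
    return " ".join(translated)


def translate_exact(text, lang, glossary=PHRASE_GLOSSARY):
    """Translate only on a 100% match; otherwise return the source string."""
    return glossary.get(text, {}).get(lang, text)
```

With these definitions, translate_word_by_word("my house is green", "es") gives "mi casa es verde", while translate_exact leaves "Squids are beautiful in summer" untouched because only the shorter string is in the glossary.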
Another issue we all should be concerned with is grammar. I know that Chinese grammar is about as simple as it gets (maybe Mandarin should be the world language). Furthermore, the European languages (especially the Latin-based ones) are similar. But there are small differences:

So for example (I will try my best with French):

I seek my red home.
Ich suche mein rotes zu Hause.
Je cherche mon maison rouge.

I see so many problems here. For example, the sentence structure of German and English is pretty much the same. BUT, in German you have to inflect both "suchen" (seek) and "rot" (red). The translator certainly should not attempt to translate the entire sentence. It should be smart about grammar. But it gets much worse than that. In French, adjectives are "usually" (geez, another issue) placed after the noun. Additionally, to conjugate "chercher" (seek), you have to inflect "mon" (my), since that depends on whether you are a guy or a girl.

So here are some of my thoughts about the issue (they are not organized or well thought through):

- Evaluating a sentence should work like parsing an algebraic expression into reverse Polish notation (RPN) using stacks or a tree.
- Each subtree will automatically represent a phrase.
- Each word and phrase is an object that contains a lot of information, including grammar. So for example, "suchen" should contain all its conjugations. The same for the adjective "rot". I think an abstract base class called WORD should be written, and then derived classes called ADJECTIVE, VERB, NOUN, ADVERB ... These classes should also reference each other, since in German you can easily turn adverbs --> nouns --> verbs.

Mmmh, that brings me to another point, specific to German. We have a lot of compound words... That will be hard...

So here is an example for:

                I seek my red home.
                   /           \
             I seek          my red home
             /    \           /       \
            I     seek      my      red home
                                     /    \
                                  home    red

So I would translate (walking the tree):

I --> Ich (Note: we know that it is first person singular)
seek --> suchen --> suche (since we have first person singular)
I seek --> Ich suche

my --> mein (signals possession: 4th case --> Akkusativ)
home --> zu Hause (Note: we know it is neuter, because: das zu Hause)
red --> rot --> rotes (because 4th case neuter)
red home --> rotes zu Hause
my red home --> mein rotes zu Hause

I seek my red home. --> Ich suche mein rotes zu Hause.

Now that was built on German grammar rules. We certainly could do this with French and Spanish as well.

We probably need some language experts who can tell us which words are more important in defining the grammar than others. While I was doing the example, I noticed that nouns are more important than adjectives. Furthermore, we should consult a graph theorist who can help us with creating trees based on these rules. He might be able to use some math to optimize the algorithm.

As I said, these are just some ideas. Any comments?

Regards,
Stephan

--
Stephan Richter - (901) 573-3308 - srichter@cbu.edu
CBU - Physics & Chemistry; Framework Web - Web Design & Development
PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
Hello Stephan,

One suggestion: just don't try it in Zope :-).

Sorry if I sound overly positive, but as they say in the Jargon File, this problem is "AI-complete" (i.e. just too difficult). Very complex algorithms are needed to implement even basic automated translation. And they still output gobbledygook.

So IMO we just need to implement a translation memory (i.e. a string repository where human-created translations are stored). A translation glossary + a block substitution system should be sufficient to support multilingual pages in Zope. This is what is needed in the short term.

Cheers.

Alexandre

PS: I don't want to sound finicky, but you need to say "ma maison" in French :-).

At 15:22 24/03/2000 -0600, Stephan Richter wrote:
Another issue we all should be concerned with is grammar. I know that Chinese grammar is about as simple as it gets (maybe Mandarin should be the world language). Furthermore, the European languages (especially the Latin-based ones) are similar. But there are small differences:
So for example (I will try my best with French):
I seek my red home.
Ich suche mein rotes zu Hause.
Je cherche mon maison rouge.

I see so many problems here. For example, the sentence structure of German and English is pretty much the same. BUT, in German you have to inflect both "suchen" (seek) and "rot" (red). The translator certainly should not attempt to translate the entire sentence. It should be smart about grammar. But it gets much worse than that. In French, adjectives are "usually" (geez, another issue) placed after the noun. Additionally, to conjugate "chercher" (seek), you have to inflect "mon" (my), since that depends on whether you are a guy or a girl.
So here are some of my thoughts about the issue (they are not organized or well thought through):
- Evaluating a sentence should work like parsing an algebraic expression into reverse Polish notation (RPN) using stacks or a tree.
- Each subtree will automatically represent a phrase.
- Each word and phrase is an object that contains a lot of information, including grammar. So for example, "suchen" should contain all its conjugations. The same for the adjective "rot". I think an abstract base class called WORD should be written, and then derived classes called ADJECTIVE, VERB, NOUN, ADVERB ... These classes should also reference each other, since in German you can easily turn adverbs --> nouns --> verbs.

Mmmh, that brings me to another point, specific to German. We have a lot of compound words... That will be hard...
So here is an example for:

                I seek my red home.
                   /           \
             I seek          my red home
             /    \           /       \
            I     seek      my      red home
                                     /    \
                                  home    red
So I would translate (walking the tree):
I --> Ich (Note: we know that it is first person singular)
seek --> suchen --> suche (since we have first person singular)
I seek --> Ich suche

my --> mein (signals possession: 4th case --> Akkusativ)
home --> zu Hause (Note: we know it is neuter, because: das zu Hause)
red --> rot --> rotes (because 4th case neuter)
red home --> rotes zu Hause
my red home --> mein rotes zu Hause
I seek my red home. --> Ich suche mein rotes zu Hause.
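The walk above can be sketched in Python as a toy, under heavy assumptions: the class names follow the WORD / VERB / ADJECTIVE idea from the message (Determiner is an extra class added for "my"), the lexicon holds only the forms needed for this one sentence, the leaves are listed in source word order, and the grammatical context (person, number, case, gender) is supplied by hand rather than derived from the tree. It illustrates the shape of the idea, not a working translator.

```python
class Word:
    """Base class: one source word and its target-language forms."""

    def __init__(self, forms):
        self.forms = forms  # maps a feature tuple to an inflected form

    def key(self, ctx):
        # Invariant words have a single form stored under the empty tuple.
        return ()

    def inflect(self, ctx):
        return self.forms[self.key(ctx)]


class Verb(Word):
    def key(self, ctx):
        return (ctx["person"], ctx["number"])


class Adjective(Word):
    def key(self, ctx):
        return (ctx["case"], ctx["gender"])


class Determiner(Adjective):
    """Hypothetical extra class: selected by case and gender like an adjective."""


# Toy lexicon: only the forms used in the example sentence are filled in.
LEXICON = {
    "I": Word({(): "Ich"}),
    "seek": Verb({(1, "sg"): "suche"}),
    "my": Determiner({("acc", "n"): "mein"}),
    "red": Adjective({("acc", "n"): "rotes"}),
    "home": Word({(): "zu Hause"}),
}


def translate(node, ctx):
    """Walk the tree depth-first: inflect leaves, join subtrees into phrases."""
    if isinstance(node, str):
        return LEXICON[node].inflect(ctx)
    return " ".join(translate(child, ctx) for child in node)


# Nested tuples stand in for the tree; each tuple is one phrase.
TREE = (("I", "seek"), ("my", ("red", "home")))
# Assumption: context is given up front instead of being inferred.
CTX = {"person": 1, "number": "sg", "case": "acc", "gender": "n"}
```

Calling translate(TREE, CTX) yields "Ich suche mein rotes zu Hause", and each subtree can be translated on its own, e.g. translate(("red", "home"), CTX) for "rotes zu Hause". Any word, form or context missing from the toy lexicon raises a KeyError, which is where the real difficulty starts.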
Now that was built on German grammar rules. We certainly could do this with French and Spanish as well.
We probably need some language experts who can tell us which words are more important in defining the grammar than others. While I was doing the example, I noticed that nouns are more important than adjectives. Furthermore, we should consult a graph theorist who can help us with creating trees based on these rules. He might be able to use some math to optimize the algorithm.
As I said, these are just some ideas. Any comments?
Regards,
Stephan

--
Stephan Richter - (901) 573-3308 - srichter@cbu.edu
CBU - Physics & Chemistry; Framework Web - Web Design & Development
PGP Key: 735E C61E 5C64 F430 4F9C 798E DCA2 07E3 E42B 5391
Participants (2): Alexandre Ratti, Stephan Richter