ISO-Splitter again: German Umlaute
hello,
i've tried the ISO-Splitter of Zope 2.4.3 for indexing German texts, but the splitter breaks up words with German 'Umlaute' like 'ä', 'ö', 'ü'...
example:
-Produktivitätstheorie
would be split into:
-Produktivit -tstheorie
that's really bad. what can i do? is there another splitter for German texts? some kind of config-magic?
thank you
maik.
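This symptom is what any splitter produces when it treats only the ASCII letters A-Z/a-z as word characters, so that every other character, including 'ä', acts as a separator. Here is a minimal reproduction as a hypothetical sketch in modern Python 3. It is not Zope's actual splitter code, and `naive_split` is an illustrative name:

```python
import re

# Only ASCII letters count as word characters; anything else, including
# 'ä' (U+00E4), is treated as a separator between words.
ASCII_WORD = re.compile(r"[A-Za-z]+")


def naive_split(text: str) -> list:
    """Return the runs of ASCII letters found in `text`."""
    return ASCII_WORD.findall(text)


print(naive_split("Produktivitätstheorie"))  # ['Produktivit', 'tstheorie']
```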
There was a bug that ignored the splitter setting when globbing was enabled and always used the default splitter (although it _looks_ like you selected a different one). This is fixed in CVS: http://cvs.zope.org/Products/PluginIndexes/TextIndex/GlobbingLexicon.py?only_with_tag=Zope-2_4-branch
Wolfram
----- Original Message ----- From: "Maik Jablonski" <maik.jablonski@uni-bielefeld.de> To: <zope@zope.org> Sent: Wednesday, January 09, 2002 10:31 AM Subject: [Zope] ISO-Splitter again: German Umlaute
Hi! Is there a complete how-to on getting ZCatalog to work correctly with non-US characters? If not, that would be a GOOD THING (TM). The current standard install in 2.4.x and 2.5 beta does NOT work correctly out of the box with German. I'd expect the standard install to work fine at least with all languages that use the common splitting characters (" ", ",", ";" etc.).
Joachim
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
The ZopeSplitter and the ISO_8859_1 splitter should be aware of your locale settings (that is, check the ZopeSplitter's behaviour with the correct locale settings). The splitting characters are more or less hardcoded in *all* splitters. I am currently working on some splitter enhancements (e.g. allowing numbers or single characters to be indexed). These changes will likely go into CVS. However, making these changes visible in the vocabulary (ZMI) requires some more work, and volunteers are welcome to help.
Andreas
----- Original Message ----- From: "Joachim Werner" <joe@iuveno-net.de> To: <zope@zope.org> Sent: Wednesday, January 09, 2002 07:26 Subject: Re: [Zope] ISO-Splitter again: German Umlaute
Sorry, I haven't had the time to look into the Splitter code yet, so I am asking again: I don't get the concept of having to specify the locale at Zope startup for the catalog to work properly. What happens if I WANT en_US locale settings in general, but the catalog should be able to handle French, German, or Spanish words? How can I build multi-lingual Zope systems with that concept? Shouldn't the catalog always split words correctly? I am not talking about languages like Japanese that have a different concept of splitting. Those need different splitter code, of course. But is there ANY reason why German Umlauts or other language-specific special characters are supposed to be splitting characters, other than that the programmers of the original splitter code might have taken the easy way out by making all characters that are not A-Z splitting characters?
Cheers
Joachim
----- Original Message ----- From: "Andreas Jung" <andreas@zope.com> To: "Joachim Werner" <joe@iuveno-net.de> Cc: <zope@zope.org> Sent: Wednesday, January 09, 2002 2:33 PM Subject: Re: [Zope] ISO-Splitter again: German Umlaute
----- Original Message ----- From: "Joachim Werner" <joe@iuveno-net.de> To: "Andreas Jung" <andreas@zope.com> Cc: <zope@zope.org> Sent: Wednesday, January 09, 2002 13:16 Subject: Re: [Zope] ISO-Splitter again: German Umlaute
Sorry, I haven't had the time to look into the Splitter code yet. So that's why I am asking again:
I don't get the concept of having to specify the locale at Zope startup for the catalog to work properly. What happens if I WANT en_US locale settings in general, but the catalog should be able to handle French, German, or Spanish words? How can I build multi-lingual Zope systems with that concept?
ZopeSplitter plus the right locale settings should fit your needs for all Western European languages, since they are all ISO-8859-1. The ISO-8859-1 splitter should fulfill your needs without changing your locale.
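One way to split ISO-8859-1 text "without changing your locale" is to hardcode the set of ISO-8859-1 letters instead of asking the C library's locale. The following is a hypothetical sketch of that idea in Python 3, not the code of Zope's ISO_8859_1 splitter. `split_latin1` is an illustrative name, and the letter table (ASCII letters plus bytes 0xC0-0xFF except 0xD7 '×' and 0xF7 '÷') is an assumption about what should count as a word character:

```python
import re

# Assumed hardcoded ISO-8859-1 letter table: ASCII letters plus the Latin-1
# letters 0xC0-0xFF, minus 0xD7 (multiplication sign) and 0xF7 (division sign).
# No locale lookup is involved, so the result is the same in every process.
LATIN1_WORD = re.compile(rb"[A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\xff]+")


def split_latin1(data: bytes) -> list:
    """Return the words found in ISO-8859-1 encoded bytes."""
    return LATIN1_WORD.findall(data)


raw = "Produktivitätstheorie, Größe; garçon".encode("iso-8859-1")
print([w.decode("iso-8859-1") for w in split_latin1(raw)])
# ['Produktivitätstheorie', 'Größe', 'garçon']
```

The trade-off is that the table is fixed at one character set: it works for Latin-1 text regardless of locale, but it cannot handle text in any other encoding.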
Shouldn't the catalog always split words correctly? I am not talking about languages like Japanese that have a different concept of splitting. Those need different splitter code, of course. But is there ANY reason why German Umlauts or other language-specific special characters are supposed to be splitting characters, other than that the programmers of the original splitter code might have taken the easy way out by making all characters that are not A-Z splitting characters?
A splitter is currently bound to a vocabulary, which means you cannot change the splitter during indexing. For a multilingual environment you should use Unicode together with the new UnicodeSplitter.
Andreas
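As a rough illustration of the Unicode approach, here is a hypothetical sketch in Python 3. It is not the interface of Zope's UnicodeSplitter. Each document is decoded with its own declared encoding and then split on Unicode word characters, so word boundaries come from the Unicode character database rather than the process locale, and German, French and Russian text can share one index. `unicode_split` is an illustrative name, and lower-casing is an assumed normalization step:

```python
import re

# For str patterns, \w is Unicode-aware by default in Python 3, so letters
# such as 'ä', 'é' or Cyrillic 'я' count as word characters regardless of locale.
UNICODE_WORD = re.compile(r"\w+")


def unicode_split(raw: bytes, encoding: str) -> list:
    """Decode `raw` using its declared `encoding` and return lower-cased words."""
    text = raw.decode(encoding)
    return [word.lower() for word in UNICODE_WORD.findall(text)]


# Documents arriving in different encodings end up as words in one index.
docs = [
    ("Produktivitätstheorie und Größe".encode("iso-8859-1"), "iso-8859-1"),
    ("Théorie de la productivité".encode("utf-8"), "utf-8"),
    ("Теория производительности".encode("koi8-r"), "koi8-r"),
]
for raw, enc in docs:
    print(unicode_split(raw, enc))
```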
On Wed, Jan 09, 2002 at 07:17:09PM +0100, Joachim Werner wrote:
Sorry, I haven't had the time to look into the Splitter code yet. So that's why I am asking again:
I don't get the concept of having to specify the locale at Zope startup for the catalog to work properly. What happens if I WANT en_US locale settings in general, but the catalog should be able to handle French, German, or Spanish words? How can I build multi-lingual Zope systems with that concept?
You cannot. Zope uses locale settings, and the locale is a process-wide, non-thread-safe thingie. Our only hope for multilingual databases and systems is Unicode.
Oleg.
-- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
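The "process-wide" point can be demonstrated directly. In the hypothetical sketch below (Python 3, not Zope code), `re.LOCALE` on a bytes pattern classifies each byte with the C library's current LC_CTYPE setting, and `locale.setlocale()` changes that setting for the whole process, so any other thread splitting text at the same moment would see the change too. `split_under_locale` is an illustrative name. The locale name `de_DE.ISO-8859-1` is an assumption and only works if that locale is installed. The expected output for the "C" locale assumes a glibc-based system such as Linux:

```python
import locale
import re

# \w with re.LOCALE (allowed for bytes patterns only) asks the C library whether
# each byte is alphanumeric under the LC_CTYPE in effect at match time.
WORD = re.compile(rb"\w+", re.LOCALE)


def split_under_locale(data: bytes, loc: str) -> list:
    """Split `data` under locale `loc`, then restore the previous LC_CTYPE.

    Not thread-safe: setlocale() changes LC_CTYPE for every thread in the
    process until it is restored.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, loc)  # process-wide change
        return WORD.findall(data)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


sample = "Produktivitätstheorie".encode("iso-8859-1")

# In glibc's "C" locale, byte 0xE4 ('ä') is not alphanumeric:
print(split_under_locale(sample, "C"))  # [b'Produktivit', b'tstheorie']

try:
    # Assumed locale name; it exists only if installed on this system.
    print(split_under_locale(sample, "de_DE.ISO-8859-1"))
except locale.Error:
    print("de_DE.ISO-8859-1 is not installed here")
```

Because the save/set/restore sequence is not atomic, two threads calling this concurrently can interfere with each other. That is why a per-catalog locale does not work in a multithreaded server.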
On Wednesday 09 January 2002 10:22 am, Oleg Broytmann wrote:
You cannot. Zope uses locale settings, and the locale is a process-wide, non-thread-safe thingie. Our only hope for multilingual databases and systems is Unicode.
No, you probably can. AFAIK French, German and Spanish are all ISO-8859-1, so en_US is suitable, unless I've missed something. But in any case, UTF-8 is a much better solution, especially if Iuveno wants to develop a Chinese portal. :-D
-- Bogdan M.Maryniuck
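Whether a given text fits into ISO-8859-1 at all is easy to check: if it can be encoded as ISO-8859-1, a Latin-1 based splitter can in principle handle it; otherwise you need Unicode. Here is a hypothetical helper in Python 3 (`fits_latin1` is an illustrative name):

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "German": "Produktivitätstheorie",
    "French": "garçon",
    "Spanish": "año",
    "Chinese": "中文",
}
for language, word in samples.items():
    print(language, fits_latin1(word))
```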
participants (6)
Andreas Jung
Bogdan M.Maryniuck
Joachim Werner
Maik Jablonski
Oleg Broytmann
Wolfram Kerber