Cataloging unicode content in PropertyObjects?
Hi all and happy new year,
I'm currently building an application with the PropertyObject product, with which I create objects that have unicode properties in their propertysheets. Unicode would not be necessary, since for now the content is in latin-1, but I wanted to keep open the possibility of translations into other language areas if necessary - plus to try out unicode ,-)
However, my first problem came with ZCatalog. Is there a way to index and search unicode content at the moment?
I tried to use ZCTextIndex with ZCTextIndexLexicon, but noticed that all the unicode characters were stripped out.
I did searches on the mailing lists, but the best I found was that TextIndexNG would support unicode later on... Any help?
-huima
On Thursday 02 January 2003 1:59 am, Heimo Laukkanen wrote:
However my first problem came with ZCatalog. Is there a way to index and search unicode content at the moment?
All except TextIndex should 'just work' with unicode. Note that you can't mix unicode and non-ascii plain strings in the same index.
-- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
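To illustrate the point about not mixing unicode and non-ascii plain strings in one index, here is a minimal sketch in modern Python 3 (the Zope 2.6 discussed in this thread ran on Python 2, where the failure looked different). It is not Zope code: `to_text` is a hypothetical helper, and latin-1 is assumed as the encoding of the plain strings because that is what the thread mentions.

```python
# Python 3 analogue: text (str) and encoded byte strings cannot be ordered
# against each other, so they cannot share one sorted index.

def to_text(value, encoding="latin-1"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# The same word once as text and once as a latin-1 encoded byte string.
mixed_keys = ["lähiviikot", "lähiviikot".encode("latin-1")]

try:
    sorted(mixed_keys)
    mixed_sort_ok = True
except TypeError:
    # str and bytes do not define an ordering between each other.
    mixed_sort_ok = False

# Normalizing every key to text before indexing avoids the problem.
normalized_keys = sorted({to_text(key) for key in mixed_keys})
```

The practical consequence is the same as in the advice above: decide on one string type per index and convert values before they are cataloged.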
Toby Dickenson wrote:
All except TextIndex should 'just work' with unicode. Note that you can't mix unicode and non-ascii plain strings in the same index.
It works, sort of. Let me describe the problem in a little more detail.
I have created a Catalog with a ZCTextIndex and a Lexicon, and indexed some documents into the ZCatalog. The documents are PropertyObjects that have some properties as unicode strings. When I look into the Lexicon, I see that the unicode characters have been stripped from all the words (these were the familiar Scandinavian characters - äöÄÖ - that are normally in latin-1).
I've created the standard search and report interfaces (Page Templates) and tried the searches, which seem to work - however, the ä, Ä, ö and Ö characters have been treated as separate common words. I.e. if I try to search for ö, I get an error message:
Error Value: Query contains only common words: '\xf6'
Or an example with a real word that is in the content: lähiviikot
I will not find it with: lähi* *viikot
but I do find it with, for example: hiviikot l hiviikot lähiviikot ..
So in some cases the search looks like it works, since content with those words is found. ,-)
Locale on my Zope is set to: fi_FI@EURO.ISO-8859-1
Any ideas on how to progress on this?
-huima
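The symptoms described above are consistent with a word splitter that only treats ASCII letters as word characters. The following is a minimal sketch of that mechanism in Python 3, not the actual ZCTextIndex splitter; the names `ASCII_WORD`, `UNICODE_WORD` and `split_words` are made up for illustration, and whether this is the real cause in the poster's setup is an assumption.

```python
import re

# \w restricted to ASCII letters, digits and underscore.
ASCII_WORD = re.compile(r"\w+", re.ASCII)
# \w with full Unicode semantics (the default for str patterns in Python 3).
UNICODE_WORD = re.compile(r"\w+")

def split_words(text, pattern):
    """Hypothetical splitter: lowercase the text and return the matched words."""
    return pattern.findall(text.lower())

text = "Lähiviikot"

# An ASCII-only splitter breaks the word at the umlaut and drops it.
ascii_terms = split_words(text, ASCII_WORD)

# A Unicode-aware splitter keeps the word whole.
unicode_terms = split_words(text, UNICODE_WORD)

# The '\xf6' in the error message is ö in latin-1.
decoded = b"\xf6".decode("latin-1")
```

With the ASCII-only pattern the lexicon would hold `l` and `hiviikot` instead of `lähiviikot`, which would explain why a query for `hiviikot` finds the document while `lähi*` does not.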
Message: 38 To: zope@zope.org From: Maik Jablonski <maik.jablonski@uni-bielefeld.de> Date: Thu, 02 Jan 2003 17:14:16 +0100 Subject: [Zope] Re: Catalogin unicode content in PropertyObjects ?
Please update your Zope2.6. This problem is fixed in the cvs:
Tried, but it did not solve the issue; the same thing continued.
TextIndexNG worked straight out of the box, so now I know I will get my application's searches to work. However, I would like to understand ZCTextIndex and get it to work too. ,-)
I also tested the old TextIndex with a Vocabulary, selecting the ISO splitter or the unicode splitter, both of which seemed to index the content correctly. However, the ISO splitter worked only if I had globbing disabled.
Now the big question is: what more could I do to try to fix it? What could cause the problem? Try another locale and see how it behaves? For example, Maik, if you have ZCTextIndex working with umlauts and all, send me info on what locale you are using and I will test it too.
Out of ideas... But TextIndexNG looks - from these first steps - really nice: http://www.zope.org/Members/ajung/TextIndexNG/
-huima
Heimo Laukkanen wrote:
Tried, but it did not solve the issue; the same thing continued. Now the big question is: what more could I do to try to fix it? What could cause the problem? Try another locale and see how it behaves? For example, Maik, if you have ZCTextIndex working with umlauts and all, send me info on what locale you are using and I will test it too.
Hi, I use -L de_DE and ZCTextIndex works with "umlaute" without any problems. -mj
Heimo Laukkanen wrote:
Hi all and happy new year,
I'm currently building an application with the PropertyObject product, with which I create objects that have unicode properties in their propertysheets. Unicode would not be necessary, since for now the content is in latin-1, but I wanted to keep open the possibility of translations into other language areas if necessary - plus to try out unicode ,-)
However, my first problem came with ZCatalog. Is there a way to index and search unicode content at the moment?
I tried to use ZCTextIndex with ZCTextIndexLexicon, but noticed that all the unicode characters were stripped out.
I did searches on the mailing lists, but the best I found was that TextIndexNG would support unicode later on... Any help?
Please update your Zope2.6. This problem is fixed in the cvs: http://collector.zope.org/Zope/597 -mj
participants (3)
- Heimo Laukkanen
- Maik Jablonski
- Toby Dickenson