[Zope] Cataloging unicode content in PropertyObjects?
Heimo Laukkanen
huima@fountainpark.org
Thu, 02 Jan 2003 14:03:22 +0200
Toby Dickenson wrote:
> All except TextIndex should 'just work' with unicode. Note that you cant mix
> unicode and non-ascii plain strings in the same index.
Well, it works only up to a point, so let me describe the problem in a little more detail.
I have created a Catalog with ZCTextIndex and Lexicon, and indexed some
documents into ZCatalog. The documents are PropertyObjects, and some of
their properties are unicode strings.
When I look into the Lexicon, I see that the unicode characters have been
stripped out of all the words (these are the familiar Scandinavian
characters, äöÄÖ, which are normally in latin-1).
I've created standard search and report interfaces (Page Templates) and
tried the searches, which seem to work. However, the characters ä, Ä, ö
and Ö are being treated as separate common words.
I.e. if I search for ö, I get this error message:
Error Value: Query contains only common words: '\xf6'
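
One way such an error could arise, sketched in Python under the assumption that the query is split with a word pattern whose \w is ASCII-only. The names WORD, query_terms and check_query are hypothetical and are not ZCTextIndex's API; the point is only that a query consisting of ö leaves no terms at all, which could then be reported like a query made only of stop words.

```python
import re

# Assumption: an ASCII-only word pattern standing in for the real splitter.
WORD = re.compile(r"\w+", re.ASCII)

def query_terms(query):
    """Return the words that survive splitting the query."""
    return WORD.findall(query)

def check_query(query):
    # Hypothetical check: a query with no surviving terms is rejected
    # with the same kind of message as an all-stop-word query.
    terms = query_terms(query)
    if not terms:
        raise ValueError("Query contains only common words: %r" % query)
    return terms
```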
Or take an example with a real word that is in the content: lähiviikot
I cannot find it with:
lähi*
*viikot
but I can find it with, for example:
hiviikot
l hiviikot
lähiviikot
So in some cases the search looks as if it works, since content
containing those words is found. ,-)
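
A minimal sketch of why those last three queries might all match, again assuming (not confirmed) that both the indexed text and the query are split with an ASCII-only \w pattern. build_lexicon and all_terms_found are hypothetical helpers, not part of ZCTextIndex, and wildcard handling is deliberately not modelled here.

```python
import re

# Assumption: ASCII-only splitting; compare with a Unicode-aware pattern.
ASCII_WORD = re.compile(r"\w+", re.ASCII)
UNICODE_WORD = re.compile(r"\w+")

def build_lexicon(texts, pattern=ASCII_WORD):
    """Collect the set of lower-cased words the splitter produces."""
    words = set()
    for text in texts:
        words.update(w.lower() for w in pattern.findall(text))
    return words

def all_terms_found(lexicon, query, pattern=ASCII_WORD):
    """True if every word of the (split) query is present in the lexicon."""
    terms = [w.lower() for w in pattern.findall(query)]
    return bool(terms) and all(t in lexicon for t in terms)

lexicon = build_lexicon(["lähiviikot"])
print(sorted(lexicon))                         # ['hiviikot', 'l']
print(all_terms_found(lexicon, "lähiviikot"))  # True: query is split the same way
print(sorted(build_lexicon(["lähiviikot"], UNICODE_WORD)))  # ['lähiviikot']
```

If this is what happens, the word is stored as the two fragments "l" and "hiviikot", and a query for the whole word only succeeds because it is broken in the same place.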
The locale on my Zope is set to: fi_FI@EURO.ISO-8859-1
Any ideas on how to progress on this?
-huima