[Zope] Cataloging unicode content in PropertyObjects ?

Heimo Laukkanen huima@fountainpark.org
Thu, 02 Jan 2003 14:03:22 +0200


Toby Dickenson wrote:

> All except TextIndex should 'just work' with unicode. Note that you can't mix 
> unicode and non-ascii plain strings in the same index.

It works, sort of. Let me describe the problem in a little more detail.

I have created a Catalog with a ZCTextIndex and a Lexicon, and indexed some 
documents into the ZCatalog. The documents are PropertyObjects, and some of 
their properties are unicode strings.

When I look into the Lexicon, I see that the non-ASCII characters have 
been stripped from all the words ( these were the familiar Scandinavian 
characters - äöÄÖ - that are normally in latin-1 ).

I've created standard search and report interfaces ( Page Templates ) and 
tried some searches, which seem to work. However, the ä, Ä, ö and Ö 
characters appear to be treated as separate common words.

I.e. if I try to search for ö, I get an error message:
Error Value: Query contains only common words: '\xf6'
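
For reference, the '\xf6' in that message is the latin-1 byte value ( and 
the Unicode code point ) of ö. A small sketch in present-day Python 3, 
which is an assumption and not what this Zope runs on, showing how the 
same character would look in latin-1 and in UTF-8:

```python
# The character from the failing query, written as an escape so the
# example does not depend on the encoding of this file.
o_umlaut = "\u00f6"  # ö

# One byte in latin-1, matching the '\xf6' in the error message.
latin1_bytes = o_umlaut.encode("latin-1")

# Two bytes in UTF-8, which is not what the error message shows.
utf8_bytes = o_umlaut.encode("utf-8")

print(latin1_bytes)  # b'\xf6'
print(utf8_bytes)    # b'\xc3\xb6'
```

So the query term does not look UTF-8 encoded; whether it reached the 
index as a latin-1 plain string or as a unicode string cannot be told 
from the message alone.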

Or an example with a real word that is in the content: lähiviikot

I cannot find it with:
lähi*
*viikot

but I can find it with, for example:
hiviikot
l hiviikot
lähiviikot

So in some cases the search looks as if it works, since content 
with those words is found. ,-)
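
The lähiviikot results are what one would expect if words were split 
using an ASCII-only idea of a word character, so that ä acts as a 
separator. A minimal sketch in present-day Python 3; both patterns are 
illustrative assumptions, not the Lexicon's actual splitter code:

```python
import re

# Hypothetical ASCII-only word pattern: \w matches only [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)

# Unicode-aware pattern for comparison (the default for str in Python 3).
UNICODE_WORD = re.compile(r"\w+")


def split_ascii(text):
    """Split text into words, treating non-ASCII letters as separators."""
    return ASCII_WORD.findall(text)


def split_unicode(text):
    """Split text into words, treating non-ASCII letters as word characters."""
    return UNICODE_WORD.findall(text)


word = "l\u00e4hiviikot"  # lähiviikot, with ä written as an escape

ascii_words = split_ascii(word)
unicode_words = split_unicode(word)

print(ascii_words)    # ['l', 'hiviikot']
print(unicode_words)  # ['lähiviikot']
```

With the ASCII-only pattern the word ends up as "l" and "hiviikot", which 
would explain why "hiviikot" and "l hiviikot" find the document.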

Locale on my Zope is set to: fi_FI@EURO.ISO-8859-1

Any ideas on how to progress on this?

-huima