I post this for the mail archives. Everything is pretty obvious, but I think it can save others some headaches.

ABSTRACT: I hit some unicode-related problems when testing a unicode-aware, multilingual XML repository Zope product. What follows is what I learned and what is working (which now is, luckily enough, almost everything).

The following has been tested on Zope 2.6.1/Py2.1.3 installed from binaries on win32 (but it should be platform independent).

THE QUICK ROUTE

1. Have a sys.setdefaultencoding call in sitecustomize.py.
2. Use <dtml-call "RESPONSE.setHeader('content-type','text/html;charset=<dtml-var your_preferred_encoding>')"> in _both_ your user _and_ ZMI pages.
3. Let python/Zope do the encoding/decoding for you (i.e. don't use .encode(your_preferred_encoding) unless you know what you are doing).
4. You don't have to start Zope with any locale to have ZCTextIndex work nicely with unicode content.

FOR THE BRAVE AND CURIOUS

***1. sys.setdefaultencoding

<quote from="Toby Dickenson">
<original msg from="Giuseppe Bonelli">
I have utf-8 as sys.defaultencoding and I do not load any locale when starting Zope.
</original>
That is old advice that predates Zope 2.6. It was never a particularly good idea, because it affects all of python's internals. You only need to encode your unicode as utf-8 (or other encoding) before sending it over the network, and ZPublisher is capable of doing that itself if you tell it the encoding in the header.
</quote>

That's true, but you will definitely need to set a default encoding if you are going to use python code. If it is not set, the default encoding is ascii and you will get the usual "encode error, ordinal not in range (128)" error when doing something as simple as

    print string_with_some_special_chars_inside

To set a default encoding, create (or edit) a sitecustomize.py file in your zope_install_dir\bin\lib (or in the python used by Zope) and use:

    import sys
    sys.setdefaultencoding(my_encoding)

The call has to live in sitecustomize.py because site.py removes sys.setdefaultencoding once startup is complete.

***2. 
content-type

This is trivial for your user interface pages: just add

    <dtml-call "RESPONSE.setHeader('content-type','text/html;charset=<dtml-var your_preferred_encoding>')">

in the <head> section of your standard_html_header.

I found it non-trivial for the ZMI pages, as I discovered that the default encoding in the ZMI pages is governed by a variable named management_page_charset, which has a default of iso-8859-1 (and I was using utf-8 for automatically generated title properties ...).

If you need to change this default you can use a property named management_page_charset in the top folder of your app. This works, but is not future proof (see the manage_page_header source under lib/python/App/dtml for details on this). The best option would probably be to use a

    <dtml-call "REQUEST.set('management_page_charset','your_preferred_encoding')">

Why REQUEST.set and not just a meta "http_equiv=content-type"?

<quote from="Tino Wildenhain">
<original msg from="Dieter Maurer">
I never understood why the meta "http_equiv=content-type" did not work, just recognized that it did not work reliably.
</original>
This influencing of HTTP-headers via HTML is very problematic because
1) there are often real HTTP-headers, and there seems to be no definition of which takes precedence over the other
2) downstream proxies cannot read HTML-embedded HTTP-headers, but base their caching strategy on the real headers. This will sometimes lead to confusing experiences.

In general, if you have control over the real HTTP headers, you should use it and not include something like that in HTML. With Zope we are in the happy position to have control, as opposed to a "web-business-card" where you just dump a couple of HTML files onto a hoster's server.

A patched ZPT could transport information from HTML meta to REQUEST... interesting idea.
</quote>

***3. 
ZCTextIndex

My original ZCTextIndex problems were due to a combination of the above and leftover words from indexes removed during testing (the Heisenberg Uncertainty Principle applied to software is at play here: during testing, if you change something, the testing itself influences the system).

If you still experience problems, delete the lexicon, recreate it and reindex. If problems persist, double check that you are not mixing unicode and non-unicode content in your indexes (if you followed quick route #3 above, this should not be the case).

(H)ACKNOWLEDGMENTS

Thanks to all who helped (Dieter, Tino, Toby, Hannu, Hugo).

END NOTE

As always, a debugging session is not fun, but you end up with some new python/Zope insights.

__peppo
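
A minimal sketch of the two points made in section 1: encoding non-ascii text as ascii fails, and text only needs to be encoded once, when it leaves the application, with the charset named in the content-type header. This is written for current Python rather than Py2.1 and is not Zope code. The helper names build_content_type and encode_for_response are hypothetical, and utf-8 is just an assumed choice of encoding.

```python
def build_content_type(encoding):
    """Return a content-type value naming the charset (hypothetical helper)."""
    return "text/html;charset=" + encoding


def encode_for_response(text, encoding):
    """Encode text once, at the point where it leaves the application."""
    return text.encode(encoding)


# Sample text with characters outside the ascii range.
title = "Citt\u00e0 e caff\u00e8"

# Encoding as ascii fails, which is the "ordinal not in range" situation.
try:
    title.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Keep text as unicode internally; encode only for the outgoing response.
header = build_content_type("utf-8")
body = encode_for_response(title, "utf-8")
```

Under these assumptions the header and the bytes always agree on the encoding, which is the same guarantee that setting the real HTTP header gives ZPublisher.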