strange unicode behaviour

Giuseppe Bonelli

23 Jul 2003

11:17 p.m.

I have spent the last 30 minutes going crazy with this:

in dtml:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    </head>
    <dtml-call "getText()">
    </html>

in python:

    def getText():
        s=u'a string with some accented chars'
        s=s.encode('utf-8')
        return s

The above works fine, but

    return s.lower()

does not!!! (The accented chars are badly rendered in the browser.)

Can someone please explain this to me?

I am on Zope 2.6.1 (installed from binaries under Windows).

From the Python console everything is OK, so it must be something to do with Zope. I have utf-8 as sys.defaultencoding and I do not load any locale when starting Zope.

Thanks for any help, --peppo


Hugo Filipe Ramos

23 Jul

11:25 p.m.

New subject: [Zope] strange unicode behaviour

What about using this line inside your <head> block?

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

and setting it to your zone encoding?

Regards

HR

_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )

Giuseppe Bonelli

11:36 p.m.

But why is everything OK without s.lower()?

Thanks for the reply, Hugo.

--peppo

PS: apologies for the double posting of the original message.


Hannu Krosing

24 Jul

6:37 a.m.

Giuseppe Bonelli wrote on Thu, 24.07.2003 at 02:17:

I have spent the last 30 minutes going crazy with this:

in dtml:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    </head>
    <dtml-call "getText()">
    </html>

in python:

    def getText():
        s=u'a string with some accented chars'
        s=s.encode('utf-8')
        return s

the above works fine, but

    return s.lower()

does not !!! (the accented chars are badly rendered in the browser).

You are calling s.lower() on a string that is in fact utf-8, and Python does not know that: for Python, the result of s.encode(x) is "just a string" of bytes. Try doing

    def getText():
        s=u'a string with some accented chars'
        s=s.lower()
        s=s.encode('utf-8')
        return s

i.e. lower() the unicode version of s.

--------------
Hannu
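The order of operations described above can be sketched in modern Python 3, where byte strings and unicode strings are separate types. This is only an illustration under stated assumptions: `latin1_lower` is a hypothetical stand-in for an 8-bit lower() that treats every byte as one Latin-1 character, which may or may not match what the Zope process on Windows actually did. The other function names are made up for the example too.

```python
def latin1_lower(raw: bytes) -> bytes:
    """Lower-case a byte string as if each byte were one Latin-1 character.

    Hypothetical stand-in for an 8-bit lower(); not Zope or Python 2 code.
    """
    return raw.decode("latin-1").lower().encode("latin-1")


def encode_then_lower(text: str) -> bytes:
    # Problematic order: lower() runs on the UTF-8 bytes, one byte at a time.
    return latin1_lower(text.encode("utf-8"))


def lower_then_encode(text: str) -> bytes:
    # Suggested order: lower() the unicode text first, then encode it.
    return text.lower().encode("utf-8")


sample = "Citt\u00e0"  # "Citta" with a grave accent on the final a

good = lower_then_encode(sample)
bad = encode_then_lower(sample)

print(good)  # b'citt\xc3\xa0'
print(bad)   # b'citt\xe3\xa0'
```

In UTF-8 the accented letter is the two bytes C3 A0. Read as Latin-1, the lead byte C3 is an upper-case letter, so the byte-wise lower() turns it into E3 and the pair no longer forms a valid UTF-8 sequence. That would be consistent with the badly rendered characters, if the assumption about the 8-bit lower() holds.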

Toby Dickenson

7:32 a.m.

On Thursday 24 July 2003 00:17, Giuseppe Bonelli wrote:

I have spent the last 30 minutes going crazy with this:

in dtml:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    </head>
    <dtml-call "getText()">
    </html>

in python:

    def getText():
        s=u'a string with some accented chars'
        s=s.encode('utf-8')
        return s

the above works fine, but

    return s.lower()

does not !!! (the accented chars are badly rendered in the browser).

Can someone, please, explain this to me??

I am on zope 2.6.1 (installed from binaries under win),

From the python console everything is OK, so there should be something with Zope.

You have had lots of advice about why this effect is happening, but so far no one has recommended the best approach.

If you remove the s=s.encode('utf-8') line, then getText will return a unicode string (with or without s.lower()), and your dtml method will also return a unicode string. Add to your dtml:

    <dtml-call "RESPONSE.setHeader('content-type','text/html;charset=utf-8')">

and ZPublisher will automatically encode the unicode response as utf-8 before sending it over http. The advantage of this approach is that your application code can work entirely in unicode.
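The idea of keeping application code in unicode and encoding only at the HTTP boundary can be sketched as follows. This is not ZPublisher's implementation: `charset_from_content_type` and `encode_for_wire` are hypothetical helper names, the iso-8859-1 fallback is an assumption made for the example, and the code is Python 3, using only the standard library's `email.message.Message` to parse the header value.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def encode_for_wire(body: str, content_type: str) -> bytes:
    """Encode a unicode body once, using the charset declared in the header."""
    return body.encode(charset_from_content_type(content_type))


page = "<p>citt\u00e0</p>".lower()  # application code works on unicode only
wire_bytes = encode_for_wire(page, "text/html;charset=utf-8")
```

With this shape, lower() and every other string operation run on unicode text, and the only place an encoding name appears is the header that the browser also sees.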

I have utf-8 as sys.defaultencoding and I do not load any locale when starting Zope.

That is old advice that predates Zope 2.6. It was never a particularly good idea, because it affects all of Python's internals. You only need to encode your unicode as utf-8 (or other encoding) before sending it over the network, and ZPublisher is capable of doing that itself if you tell it the encoding in the header.

--
Toby Dickenson - http://www.geminidataloggers.com/people/tdickenson
Want a job like mine? http://www.geminidataloggers.com/jobs for Software Engineering jobs at Gemini Data Loggers in Chichester, West Sussex, England

Giuseppe Bonelli

2:21 p.m.

New subject: unicode and ZCTextIndex, WAS:RE: [Zope] strange unicode behaviour

Thanks to all who responded to my original post, particularly to Toby, who pointed me in the right direction: I was mistakenly (and stupidly ...) using:

    <meta http-equiv="content-type" content="text/html;charset=&dtml-encoding;">

instead of:

    <dtml-call "RESPONSE.setHeader('content-type','text/html;charset=utf-8')">

in my standard_html_header, so I was setting the encoding for the browser, but not over http!!!

This solved everything, but an issue remains:

I started fiddling with encoding when I wanted to full-text index my utf-8 encoded unicode content with ZCTextIndex, and the lexicon gave me the usual "ordinal not in range" decoding error when building the index.

Now I have a clean unicode setup (i.e. no locale when starting Zope and no sys.setdefaultencoding when starting Python 2.1.3), and the lexicon has started to give me errors again, for example when indexing a string containing "isn't" (the errors are generated at line 133 in lexicon.py).

I searched the mailing list archives and found references to an old ZCTextIndex bug (597 in the collector), whose resolution seems to require starting Zope with a -L option.

Now I am a little bit confused, and I would like to ask whether someone has a firm understanding of the status of Zope find/search support for unicode strings containing high chars.

Specifically:

1. Does the standard ZCTextIndex coming with Zope 2.6.1 support this?
2. If yes, do I need to start Zope with a particular locale?
3. Regarding these issues, is the recently released TextIndexNG ver.2 a better solution?

NB: if this matters, I have utf-8 encoded content in various languages, so I would prefer not to have to use any -L setting when starting Zope, as I do not need to support TTW content editing.

TIA,

--peppo
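The "ordinal not in range" error mentioned above is what the ASCII codec reports when it is asked to decode a byte above 127. A minimal Python 3 sketch of that mechanism follows. It assumes, without confirmation from the thread, that the "isn't" in the indexed content contained a typographic apostrophe (U+2019) rather than the plain ASCII one. `implicit_ascii_decode` is a hypothetical helper that stands in for an implicit byte-string-to-unicode conversion; it is not the lexicon code.

```python
def implicit_ascii_decode(raw: bytes) -> str:
    """Mimic an implicit bytes-to-unicode conversion using the ASCII codec."""
    return raw.decode("ascii")


# Assumption: the apostrophe is U+2019, which takes three bytes in UTF-8.
word = "isn\u2019t".encode("utf-8")

try:
    implicit_ascii_decode(word)
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding explicitly with the real encoding works.
decoded = word.decode("utf-8")

# A plain ASCII apostrophe never triggers the problem.
plain = implicit_ascii_decode(b"isn't")
```

Under that assumption the error does not depend on any locale setting: it appears wherever utf-8 encoded bytes are mixed with unicode strings and something converts them with the ASCII default.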


Dieter Maurer

25 Jul

12:20 a.m.

Giuseppe Bonelli wrote at 2003-7-24 16:21 +0200:

Thanks to all who responded to my original post, particularly to Toby who pointed me in the right direction: I was mistakenly (and stupidly ...) using:

<meta http-equiv="content-type" content="text/html;charset=&dtml-encoding;"> instead of: <dtml-call "RESPONSE.setHeader('content-type','text/html;charset=utf-8')">

in my standard_html_header, so I was encoding on the browser, but not over http !!!

This solved everything, but an issue remains:

I hit this same problem earlier. I never understood why the meta "http_equiv=content-type" did not work, just recognized that it did not work reliably. Do you know why it does not work?

I started fiddling with encoding, when I wanted to full text index my utf-8 encoded unicode content with ZCTextIndex and the lexicon gave me the usual ordinal not in range decoding error when building the index.

I remember that (at least early) ZCTextIndex could not handle Unicode (see the mailing list archives). Andreas' TextIndexNG has been proposed as a fully Unicode-aware alternative.

Be careful, though: all indexes, independent of type, are built upon BTrees. BTrees require that their keys are persistently ordered. This usually implies that they must all have the same type. Mixing Unicode and non-Unicode keys can result in corrupted indexes (less likely) or implicit conversions (more likely) with potential "encoding errors".
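One way to follow the "same type for all keys" advice is to normalise every key to unicode before it reaches the ordered structure. The sketch below is Python 3 and uses a plain sorted list with `bisect` as a stand-in for a BTree. `to_text_key` and `SortedKeys` are hypothetical names, and the assumption that byte-string keys are utf-8 encoded is mine, not something the thread states about any particular index.

```python
import bisect


def to_text_key(key, encoding="utf-8"):
    """Return the key as unicode text, decoding byte strings explicitly."""
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


class SortedKeys:
    """Tiny ordered key set; a stand-in for a BTree, not the real thing."""

    def __init__(self):
        self._keys = []

    def add(self, key):
        key = to_text_key(key)
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def __contains__(self, key):
        key = to_text_key(key)
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key

    def keys(self):
        return list(self._keys)


index = SortedKeys()
index.add("zope")
index.add(b"caf\xc3\xa9")   # utf-8 bytes, decoded on the way in
index.add("caf\u00e9")      # same word as unicode: no duplicate entry
```

Because the decode is explicit and happens in one place, a key in the wrong encoding fails at insertion time instead of surfacing later as an implicit conversion inside a comparison.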

... Specifically: 1. Does the standard ZCTextIndex coming with Zope 2.6.1 support this ?

I do not know. The "cvs" (--> "cvs.zope.org") could tell you what changes were made to ZCTextIndex since the bug report.

2. If yes, do I need to start Zope with a particular locale ?

For "Unicode", no special locale should be necessary. However, this depends on the splitter. A splitter might decide to use locale information even for unicode strings.

3. Regarding these issues, is the recently released TextIndexNG ver.2 a better solution ?

Andreas is very confident that TextIndexNG handles unicode very well.

Dieter

Tino Wildenhain

7:50 a.m.

Dieter Maurer wrote:

I hit this same problem earlier. I never understood why the meta "http_equiv=content-type" did not work, just recognized that it did not work reliably.

Do you know why it does not work?

Influencing HTTP headers via HTML in this way is very problematic because:

1) there are often real HTTP headers, and there seems to be no definition of which takes precedence over the other
2) downstream proxies cannot read HTTP headers embedded in HTML, but base their caching strategy on the real headers. This will sometimes lead to confusing experiences.

In general, if you have control over the real HTTP headers, you should use it and not include something like that in HTML. With Zope we are in the happy position of having that control, as opposed to a "web business card" where you just dump a couple of HTML files onto a hoster's server.

A patched ZPT could transport information from HTML meta to REQUEST... interesting idea.

Regards

Tino
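For readers who want to see what "using the real HTTP headers" looks like outside Zope, here is a minimal sketch written as a WSGI application in Python 3. WSGI is a later, generic Python interface and not what Zope 2.6's ZPublisher used, so treat this purely as an illustration; the function name `app` and the page content are made up for the example.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the real HTTP header."""
    # Build the page as unicode and encode it exactly once, at the boundary.
    body = "<html><body>citt\u00e0</body></html>".encode("utf-8")
    headers = [
        # The charset travels in the HTTP header, where proxies can see it.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The declared charset and the encoding actually applied to the body sit next to each other in the same function, which makes it hard for the two to drift apart.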

Chris Withers

31 Jul

2:45 p.m.

Dieter Maurer wrote:

Mixing Unicode and non-Unicode keys can result in corrupted indexes (less likely)

Really? What experiences lead you to believe that?

or implicit conversions (more likely) with potential "encoding errors".

Can you give some examples of these?

cheers,

Chris

Toby Dickenson

2:58 p.m.

On Thursday 31 July 2003 15:45, Chris Withers wrote:

Dieter Maurer wrote:

Mixing Unicode and non-Unicode keys can result in corrupted indexes (less likely)

BTrees need their keys to have a total ordering.

--
Toby Dickenson - http://www.geminidataloggers.com/people/tdickenson

Chris Withers

3:07 p.m.

Toby Dickenson wrote:

BTrees need their keys to have a total ordering.

Why would they not have that if the keys are not all of the same type?

Chris

Toby Dickenson

3:33 p.m.

On Thursday 31 July 2003 16:07, Chris Withers wrote:

Why would they not have that if the keys are not all of the same type?

BTrees require that they can compare their keys for order. Historically, breaking this requirement could corrupt things severely and silently. Today it isn't silent. Demonstration is here:

http://mail.zope.org/pipermail/zodb-dev/2002-February/002301.html

--
Toby Dickenson - http://www.geminidataloggers.com/people/tdickenson
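The linked demonstration is not reproduced here. As a rough illustration of how a missing total ordering can silently break an ordered structure, the Python 3 sketch below uses a sorted list with `bisect` as a stand-in for a BTree, and NaN as a readily available key that does not order consistently against other keys. Neither choice comes from the thread; `insert_key` and `has_key` are hypothetical helper names.

```python
import bisect


def insert_key(keys, key):
    """Insert a key at the position a binary search says keeps the list sorted."""
    bisect.insort(keys, key)


def has_key(keys, key):
    """Look a key up by binary search, as an ordered index would."""
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key


keys = []
for k in (2.0, float("nan"), 1.0):
    insert_key(keys, k)

# Every comparison against NaN is False, so 1.0 ends up after 2.0.
found_by_scan = 1.0 in keys          # linear scan: True
found_by_search = has_key(keys, 1.0)  # binary search: False
```

No exception is raised at any point: the key is stored, yet the ordered lookup cannot find it. That is the kind of silent damage described above. In Python 3, mixing `bytes` and `str` keys in such a structure raises `TypeError` on comparison instead, which corresponds to the failure no longer being silent.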

Chris Withers

4:54 p.m.

Toby Dickenson wrote:

BTrees require that they can compare their keys for order. Historically, breaking this requirement could corrupt things severely and silently. Today it isn't silent.

Boo! Hiss! I thought you could always compare objects in Python and have some kind of meaningful result returned?

Chris
