RE: [Zope] Re: Zope iso-8859-1 to utf-8
I see... And what Python function would you use for conversion?

I made some tests and was surprised by the results... I switched ZMI to UTF-8 (management_page_charset) and edited some of my documents / properties and all went fine. The generated documents are still sent to browsers as iso-8859-1, and are not broken.

So my question would be: which valid UTF-8 characters (for typical Western languages like English, French, Spanish, ...) would be invalid in iso-8859-1?

Last thing, if ZMI is switched to UTF-8, then what is the difference between ustring/string, etc. properties?

Thanks.

Pascal

-----Original Message-----
From: zope-bounces@zope.org [mailto:zope-bounces@zope.org] On behalf of Max M
Sent: Tuesday, 13 September 2005 14:51
To: zope@zope.org
Subject: [Zope] Re: Zope iso-8859-1 to utf-8

Pascal Peregrina wrote:
Hi,
I have been running a Zope installation for 2 years, so there are now lots of objects, properties, etc...
I would like to know what issues I may have to face if I change the default encoding from iso-8859-1 to utf-8 in ZMI.
You must write a script that converts every latin-1 property on every object in your site to utf-8.

So first find all the objects you use. See what types they are. Find all text and string attributes on those objects. Write a function that converts from latin-1 to utf-8 and run that on every object.

The hard part will be finding all the attributes, but perhaps you can write a method that finds those properties for you using introspection.

--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science
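The approach described above (walk the objects, find the string attributes by introspection, convert each one) can be sketched in plain Python 3. This is only an illustration under assumptions: `Doc` is a hypothetical stand-in for a content object, `convert_bytes_attrs` is an invented name, and the walk uses `vars()` rather than any Zope API, so a real site would need its own way of enumerating objects and properties.

```python
def convert_bytes_attrs(obj, seen=None):
    """Recursively re-encode iso-8859-1 bytes attributes as UTF-8.

    Returns a list of (object, attribute name) pairs that were changed.
    """
    if seen is None:
        seen = set()
    # Skip objects already visited and values without instance attributes.
    if id(obj) in seen or not hasattr(obj, "__dict__"):
        return []
    seen.add(id(obj))
    changed = []
    for name, value in list(vars(obj).items()):
        if isinstance(value, bytes):
            new_value = value.decode("iso-8859-1").encode("utf-8")
            if new_value != value:  # pure-ASCII values come out unchanged
                setattr(obj, name, new_value)
                changed.append((obj, name))
        elif isinstance(value, (list, tuple)):
            for child in value:
                changed.extend(convert_bytes_attrs(child, seen))
        else:
            changed.extend(convert_bytes_attrs(value, seen))
    return changed


class Doc:
    """Hypothetical stand-in for a content object (not a Zope class)."""

    def __init__(self, title, children=()):
        self.title = title
        self.children = list(children)


root = Doc(b"Accueil", [Doc(b"R\xe9sum\xe9")])
changed = convert_bytes_attrs(root)
print([name for _, name in changed])
```

Note that this conversion is not idempotent: running it a second time would re-encode the bytes that were already converted, so a migration of this kind should run exactly once or keep a record of what it has touched.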
Pascal Peregrina wrote at 2005-9-13 14:21 +0100:
I see... And what Python function would you use for conversion?
unicode(iso_string, 'iso-8859-1').encode('utf-8')
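The one-liner above is Python 2, where `unicode` is a builtin. A minimal sketch of the same conversion in Python 3, where byte strings are `bytes` objects; the function name `latin1_to_utf8` is only illustrative:

```python
def latin1_to_utf8(iso_bytes: bytes) -> bytes:
    """Re-encode an iso-8859-1 byte string as UTF-8."""
    # Decoding iso-8859-1 cannot fail: every byte value 0-255 maps to a code point.
    text = iso_bytes.decode("iso-8859-1")
    return text.encode("utf-8")


converted = latin1_to_utf8(b"caf\xe9")  # "café" stored as iso-8859-1
print(converted)
```

Because the decode step accepts any input, the function cannot tell whether the bytes really were iso-8859-1 in the first place; that has to be known from context.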
I made some tests and was surprised by the results... I switched ZMI to UTF-8 (management_page_charset) and edited some of my documents / properties and all went fine.
Strange. I would have expected non-ASCII characters to be displayed incorrectly.
The generated documents are still sent to browsers as iso-8859-1, and are not broken.
If you switched to "utf-8", then *you* should ensure that they are sent as "utf-8".
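To illustrate why the declared charset has to match the bytes actually sent, here is a small Python 3 sketch of what a reader sees when UTF-8 bytes are interpreted as iso-8859-1. The sample strings are arbitrary.

```python
utf8_bytes = "é".encode("utf-8")            # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("iso-8859-1")   # receiver assumes iso-8859-1
print(misread)                              # prints "Ã©"

# Pure ASCII text survives the mismatch, which can hide the problem in tests.
ascii_ok = "Zope".encode("utf-8").decode("iso-8859-1")
print(ascii_ok)
```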
So my question would be: which valid UTF-8 characters (for typical Western languages like English, French, Spanish, ...) would be invalid in iso-8859-1?
This is a strange question... The problem lies not with the characters but with their codes. UTF-8 and iso-8859-1 agree on precisely the ASCII characters (Unicode code points 0-127). Unicode characters 128-255 use 2 bytes in UTF-8 but 1 byte in iso-8859-1. Unicode characters 256 and up can be encoded in UTF-8 but not in iso-8859-1.
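The three ranges can be checked directly in Python 3. In this sketch `compare` is an illustrative helper, and the euro sign (U+20AC) stands in as an example of a code point above 255:

```python
def compare(ch: str):
    """Return (utf8_bytes, latin1_bytes or None) for a single character."""
    utf8 = ch.encode("utf-8")
    try:
        latin1 = ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        latin1 = None  # code point above 255: no iso-8859-1 representation
    return utf8, latin1


for ch in ["A", "é", "€"]:
    utf8, latin1 = compare(ch)
    print(f"U+{ord(ch):04X}", utf8, latin1)
```

For Western European text, the characters that fall into the third group are therefore those outside the iso-8859-1 repertoire, such as the euro sign or the "œ" ligature (U+0153).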
... Last thing, if ZMI is switched to UTF-8, then what is the difference between ustring/string, etc. properties?
"ustring" is a unicode string: stored inside Zope as unicode, sent to the browser UTF-8 encoded and expected to come back UTF-8 encoded. "string" is a plain (non unicode) string. It should use the encoding of your page (UTF-8, once you switched to UTF-8). -- Dieter
Also, watch out for the gotcha in BaseResponse.py, which can end up applying a default latin-1 encoding in some circumstances. I really want to make that hard-coded thing configurable in zope.conf at some stage...

Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
participants (3)
- Chris Withers
- Dieter Maurer
- Pascal Peregrina