[Zope-CMF] Re: [RFC] [Patch] GenericSetup and encodings
Florent Guillaume
fg at nuxeo.com
Wed Jun 7 08:55:33 EDT 2006
yuppie wrote:
> Hi Yves!
>
>
> Yves Bastide wrote:
>> GenericSetup has problems handling non-ASCII data.
>
> 1.) GenericSetup explicitly doesn't support non-UTF-8 XML in profiles.
> UTF-8 is the default encoding for XML and I can't see a need to support
> other XML encodings.
>
> 2.) GenericSetup explicitly doesn't support non-UTF-8 site settings. If
> someone provides a good patch this feature can be added.
>
> 3.) GenericSetup is not tested with non-ASCII UTF-8 site settings. AFAIK
> import works, but not export. I consider this a bug.
>
>> It treats strings sometimes as ASCII, sometimes as UTF-8, yet it has
>> access to two variables: its own ISetupContext.getEncoding() (whose
>> use I didn't fully grok) and CMF's
>> ISetupContext.getSite().getProperty('default_charset').
>
> Sorry, but your assumptions are wrong:
>
> - The default setup tool creates export contexts without specifying the
> encoding, so ISetupContext.getEncoding() always returns None. And even
> if it were set, it would represent the encoding of the exported files,
> not the site encoding.
>
> - getSite().getProperty('default_charset') is CMF specific and should
> not be used in GenericSetup.
>
> - The adapters adapt ISetupEnviron, not ISetupContext. getEncoding() and
> getSite() are not always available.
>
>> Attached is a patch using both of them and somewhat working in my
>> setup. Can knowledgeable people comment on it before I enter a
>> collector issue? (I'm using GS alongside CPS, which also needs
>> some patching; yet basic things, such as exporting and importing an
>> iso8859-15 Title in a CMF site whose default charset is iso8859-15,
>> should work)
>
> First of all we need unit tests that make sure UTF-8 works and I think
> this should be the default used by GenericSetup. Code that needs to know
> how to find the site encoding can't be generic.
>
> There is an additional problem: If tools use the default property edit
> page from OFS the properties might have a different encoding than
> 'default_charset' of the site. Since the default
> 'management_page_charset' is UTF-8 we have less trouble if we allow only
> UTF-8.
Let's also not forget that the goal in CMF 2 (I think) is to have all
content be unicode strings, never encoded ones. In that case GenericSetup
only has to deal with the encoding of the XML file (always UTF-8 anyway)
and nothing else.
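
A minimal sketch of that idea, assuming Python 3 and the standard
library's xml.dom.minidom rather than GenericSetup's own machinery.
export_title and import_title are hypothetical names used only for
illustration: content stays a unicode string, and UTF-8 appears only at
the XML file boundary.

```python
from xml.dom.minidom import Document, parseString


def export_title(title):
    """Serialize a unicode title into a UTF-8 encoded XML document (bytes).

    Hypothetical helper, not part of GenericSetup.
    """
    doc = Document()
    node = doc.createElement("property")
    node.setAttribute("name", "title")
    node.appendChild(doc.createTextNode(title))
    doc.appendChild(node)
    # Encoding happens only here, when the document leaves the application.
    return doc.toxml(encoding="utf-8")


def import_title(xml_bytes):
    """Parse UTF-8 XML bytes and return the title as a unicode string."""
    doc = parseString(xml_bytes)
    # The parser decodes the bytes; callers never see an encoded string.
    return doc.documentElement.firstChild.data


title = "Café à 5 €"
xml_bytes = export_title(title)
restored = import_title(xml_bytes)
```

With unicode content there is no site charset to look up, so the round
trip needs no knowledge of the site at all.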
CPS 3 was a pure-latin1 application for various historical reasons, so we
modified a number of I/O adapters so that they properly encode/decode what
GenericSetup works with. CPS 3.4 is moving from hardcoded latin-1 to the
site's default_charset property, but that hasn't been applied everywhere
yet, although it should be.
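
Roughly, such an adapter has to transcode between the site charset and
UTF-8 in both directions. The following is only a sketch in Python 3,
not the actual CPS code: to_export, from_import and the site_charset
parameter are hypothetical names, and the site is assumed to store
encoded byte strings.

```python
def to_export(value, site_charset="iso-8859-15"):
    """Transcode a byte string stored in the site charset to UTF-8.

    Hypothetical helper; site_charset stands in for the site's
    default_charset property.
    """
    return value.decode(site_charset).encode("utf-8")


def from_import(value, site_charset="iso-8859-15"):
    """Transcode UTF-8 bytes read from a profile back to the site charset."""
    return value.decode("utf-8").encode(site_charset)


# A Title as a legacy site would store it: bytes in iso-8859-15.
stored = "Café 5 €".encode("iso-8859-15")
exported = to_export(stored)
reimported = from_import(exported)
```

Passing the charset in, instead of hardcoding latin-1, matters for
characters such as the euro sign, which iso-8859-15 has and latin-1
does not.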
CPS 4 will be purely unicode, and won't need any of that.
Florent
--
Florent Guillaume, Nuxeo (Paris, France) Director of R&D
+33 1 40 33 71 59 http://nuxeo.com fg at nuxeo.com