Zope 2: Unicode object ids and PropertyManager properties
Hi,

Is there a consensus on the best way in Zope 2 to handle non-ASCII object IDs? The current restrictions are based on a very old, gradually updated regex which still isn't right; see for example https://bugs.launchpad.net/zope2/+bug/143616

And there is a lot of code sprinkled around which assumes ASCII strings. For example, in OFS.ObjectManager and OFS.CopySupport there are a couple like this:

    def manage_cutObjects(self, ids=None, REQUEST=None):
        ...
        if type(ids) is type(''):
            ids=[ids]

I was about to start changing such lines to use isinstance(ids, basestring), but I thought better of it. URLs need to be hex-escaped into ASCII anyway (and there's no standard I'm aware of regarding what encoding to use before escaping; for example, Chinese Wikipedia appears to encode their URLs as iso8859-15 before escaping, for some bizarre reason. Either that, or some encoding I can't find which has a lot of overlap with it). So maybe we should store IDs hex-escaped; in that case, ASCII is a feature, not a bug.

Even if that's wrong, the ASCII assumption seems to be so widespread in Zope 2 that I think, short of a full audit and a comprehensive plan, gradually using isinstance(foo, basestring) might just be false advertising that leads people into trouble.

Thoughts?

While looking at this, I also happened to look at OFS.PropertyManager, which doesn't explicitly handle unicode for 'string' properties: it calls ZPublisher.Converters.field2string, which encodes unicode strings into default_encoding *on the way in*. That seems bad to me: if the default encoding configuration ever changes, you'd have to write a potentially huge migration to avoid being left with stored properties in a horrid mixture of encodings.

Are we stuck with that forever? Why can't we store unicode string properties as native Python unicode and encode on the way out?

--
Paul Winkler
http://www.slinkp.com
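To make the concern about encoding 'string' properties on the way in concrete, here is a minimal sketch. It is not Zope's PropertyManager or field2string code; the function names and the two encodings are assumptions chosen for illustration, and it is written for Python 3 (where text is str and encoded data is bytes) rather than the Python 2 that Zope 2 ran on.

```python
# Hypothetical illustration only; none of these names come from Zope.

def store_encoded_on_the_way_in(text, default_encoding):
    """Mimic encode-on-input: what gets stored is already bytes."""
    return text.encode(default_encoding)


def render_on_the_way_out(stored_text, output_encoding):
    """Alternative: store native text, encode only when producing output."""
    return stored_text.encode(output_encoding)


name = "M\u00fcller"  # "Mueller" spelled with a u-umlaut

# Encode-on-input: the same value saved before and after a change of the
# default encoding ends up as two different byte strings in storage.
saved_before_change = store_encoded_on_the_way_in(name, "latin-1")
saved_after_change = store_encoded_on_the_way_in(name, "utf-8")
mixed_store = [saved_before_change, saved_after_change]

# Store-as-text: the stored values stay identical whatever the setting is,
# and a single output encoding can be applied at render time.
text_store = [name, name]
rendered = [render_on_the_way_out(value, "utf-8") for value in text_store]
```

With encode-on-input, nothing in mixed_store records which encoding each entry used, which is the migration problem described above. With text storage, changing the output encoding only affects the final encode step.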
Paul Winkler wrote at 2008-9-12 09:04 -0400:
> Is there a consensus on the best way in Zope 2 to handle non-ASCII object IDs? The current restrictions are based on a very old, gradually updated regex which still isn't right, see for example https://bugs.launchpad.net/zope2/+bug/143616
There is a branch which gets rid of the restrictions: "http://svn.zope.org/Zope/branches/dm-arbitrary-ids/". It is awaiting "vetting" (a special wish of Tres Seaver).
> ... Even if that's wrong, the ASCII assumption seems to be so widespread in Zope 2 that I think, short of a full audit and a comprehensive plan, gradually using isinstance(foo, basestring) might just be false advertising that leads people into trouble.
Our local Zope got rid of the ASCII restriction several years ago, mainly to support WebDAV with the same naming conventions as typical file systems (i.e. including special characters, such as umlauts, in names).

We have met only one problem: MS-WebDAV usually does not change the encoding, but some WebDAV operations ("rename", "copy") follow the recommendation of HTML 4.01 to first UTF-8 encode and then URL encode. Thus, we had to cope with both a "native" encoding and a UTF-8 encoding.

--
Dieter
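One way to cope with names that arrive either in a "native" encoding or in UTF-8 can be sketched as follows. This is not the code from Dieter's installation: the helper name, the choice of latin-1 as the native encoding, and the try-UTF-8-first strategy are all assumptions made for illustration, and the sketch targets Python 3.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the site's "native" encoding. The thread does not name it.
NATIVE_ENCODING = "latin-1"


def decode_path_segment(escaped, native_encoding=NATIVE_ENCODING):
    """Turn a URL-escaped path segment into text (hypothetical helper).

    The escaped bytes are first tried as UTF-8, which covers clients that
    UTF-8 encode before URL encoding. If that fails, the bytes are assumed
    to be in the native encoding instead.
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(native_encoding)


from_utf8_client = decode_path_segment("M%C3%BCller")
from_native_client = decode_path_segment("M%FCller")
```

Both calls produce the same text, so the two client behaviours map to one object ID. The guess is not foolproof: a native-encoded name whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so a real deployment would need to decide whether that ambiguity is acceptable.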