We're seeing problems in one application here due to the catalog and its interactions with Unicode. Here's what happens:
- an object is indexed with a Unicode title, so in the catalog the metadata tuple has for instance (u'cafe',)
- later that title is changed to a latin-1 encoded str, so the new metadata tuple would be ('caf\xe9',)
The problem is that Catalog.py has this code in updateMetadata():
    if data.get(index, 0) != newDataRecord:
        data[index] = newDataRecord
The simple comparison in the first line provokes a UnicodeDecodeError; you can reproduce it with a simple:
    python -c "u'e' == '\xe9'"
This is understandable, but in the case of the catalog it is really not helpful. I propose to change the code above to:
    try:
        changed = data.get(index, 0) != newDataRecord
    except UnicodeDecodeError:
        changed = True
    if changed:
        data[index] = newDataRecord
Objections ?
Florent
-- 
Florent Guillaume, Nuxeo (Paris, France)   CTO, Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg@nuxeo.com
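A minimal Python 3 sketch of the proposed guard, under stated assumptions. Python 3 no longer raises when comparing str with bytes (it simply reports them unequal), so the hypothetical helper py2_ne() emulates the Python 2 behaviour of the time: decode the byte string with the ASCII codec before comparing it with unicode. update_metadata() is likewise a hypothetical stand-in for the relevant lines of Catalog.updateMetadata(), and a plain dict replaces the catalog's metadata mapping.

```python
# Python 3 sketch. py2_ne() and update_metadata() are hypothetical names;
# a plain dict stands in for the catalog's metadata mapping.


def py2_ne(old, new):
    """Roughly emulate Python 2's `old != new` for metadata tuples.

    Python 2 coerced byte strings to unicode with the ASCII codec before
    comparing them with unicode, which raises UnicodeDecodeError for
    non-ASCII bytes. Plain Python 3 would just return True instead.
    """
    if isinstance(old, tuple) and isinstance(new, tuple):
        if len(old) != len(new):
            return True
        # Element-wise, stopping at the first difference (or error).
        return any(py2_ne(a, b) for a, b in zip(old, new))
    if isinstance(old, str) and isinstance(new, bytes):
        new = new.decode("ascii")  # may raise UnicodeDecodeError
    elif isinstance(old, bytes) and isinstance(new, str):
        old = old.decode("ascii")  # may raise UnicodeDecodeError
    return old != new


def update_metadata(data, index, new_record):
    """Store new_record under index if it differs; return True if stored."""
    try:
        changed = py2_ne(data.get(index, 0), new_record)
    except UnicodeDecodeError:
        # Unicode vs. non-ASCII str cannot be compared: assume it changed.
        changed = True
    if changed:
        data[index] = new_record
    return changed


data = {1: ("cafe",)}                           # indexed with a unicode title
print(update_metadata(data, 1, (b"caf\xe9",)))  # True: latin-1 title stored
print(update_metadata(data, 1, (b"caf\xe9",)))  # False: same record, no write
```

Treating an incomparable pair as "changed" costs at most one redundant write, which is the trade-off the proposal accepts.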
Florent Guillaume wrote at 2005-8-9 17:18 +0200:
We're seeing problems in one application here due to the catalog and interactions with Unicode. Here's what happens:
- an object is indexed with a Unicode title, so in the catalog the metadata tuple has for instance (u'cafe',)
- later that title is changed to latin-1, so the new metadata tuple would be ('caf\xe9',)
The problem is that Catalog.py has in updateMetadata() the code:
    if data.get(index, 0) != newDataRecord:
        data[index] = newDataRecord
I propose to change the code above to:
    try:
        changed = data.get(index, 0) != newDataRecord
    except UnicodeDecodeError:
        changed = True
    if changed:
        data[index] = newDataRecord
Objections ?
I fear you will get similar problems in the indexes.
You should avoid mixing unicode and non-unicode strings in fields or indexes (or set the "default encoding" appropriately).
-- 
Dieter
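Dieter's first suggestion, never mixing unicode and non-unicode strings, amounts to normalizing values before they reach the catalog. A minimal Python 3 sketch, assuming byte strings are latin-1 encoded as in the example above; the helper name to_text() and the fixed default encoding are illustrative assumptions, not part of the Zope catalog API.

```python
# Hypothetical helper; latin-1 is an assumption taken from the example.
def to_text(value, encoding="latin-1"):
    """Return value with every byte string decoded to str (unicode).

    Tuples are converted element by element, so a whole metadata record
    can be normalized before it is handed to the catalog.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, tuple):
        return tuple(to_text(item, encoding) for item in value)
    return value  # numbers, None, str: left untouched


record = to_text((b"caf\xe9", "cafe", 42))  # -> ('café', 'cafe', 42)
```

The catch, as the thread notes, is that this only helps if every code path that produces catalog values goes through such a helper.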
Dieter Maurer wrote:
I fear you will get similar problems in the indexes.
You should avoid mixing unicode and non-unicode strings in fields or indexes (or set the "default encoding" appropriately).
For indexes I agree, and indeed my example of Title was not ideal. But metadata fields need not have anything to do with indexes. Suppose you're migrating your code from using utf-8 encoded str to unicode: you have no way to recatalog the objects, because it will blow up in updateMetadata().
Florent
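For the migration Florent describes, one possible workaround (an assumption, not something proposed in the thread) is to convert the already-stored records before recataloging, so that updateMetadata() only ever compares unicode with unicode. A minimal Python 3 sketch, with a plain dict standing in for the catalog's metadata mapping and a hypothetical migrate_metadata() helper; it assumes the old byte strings are utf-8, as in the migration scenario.

```python
# Hypothetical migration helper; utf-8 is the assumed old encoding.
def migrate_metadata(data, encoding="utf-8"):
    """Decode byte strings in every stored metadata tuple, in place.

    Returns the number of records rewritten. Records containing no byte
    strings are left alone, so running it twice is harmless.
    """
    rewritten = 0
    for key, record in list(data.items()):
        if any(isinstance(item, bytes) for item in record):
            data[key] = tuple(
                item.decode(encoding) if isinstance(item, bytes) else item
                for item in record
            )
            rewritten += 1
    return rewritten


catalog_data = {1: (b"caf\xc3\xa9", 5), 2: ("plain", 1)}
print(migrate_metadata(catalog_data))  # 1
print(migrate_metadata(catalog_data))  # 0 (already migrated)
```

The check uses isinstance() rather than comparing old and new records, so the migration itself never performs the mixed comparison that triggers the error.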
participants (2)
- Dieter Maurer
- Florent Guillaume