Searching Catalog for Unicode Values of FieldIndex
I am disappointed to find out that FieldIndex does not support Unicode values. When searching a catalog on a FieldIndex with a value outside the ASCII range, one encounters the following error:

    'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

The interesting point is that the object has been indexed in the catalog with the proper Unicode value, but you cannot search the index with this value.

I wonder why a very old message at:

http://mail.zope.org/pipermail/zope/2001-November/104636.html

suggests that FieldIndex supports Unicode. I also found a message on the list reporting the same problem (*with full traceback*). Andreas Jung proposes changing Python's encoding to solve it:

http://mail.zope.org/pipermail/zope/2003-February/131252.html

While changing the default encoding can suppress the exception, the search results are empty, making the catalog useless. (I changed it to iso-8859-2.) Anyway, I am not sure I like changing Python's encoding just to get the desired value for FieldIndex.

I wonder if there is any chance to use FieldIndex with Unicode values. Perhaps there are other Index products which support Unicode values. Or should one write his own Index if he wants a Unicode-aware FieldIndex?

Cheers,
Mohsen
Mohsen Moeeni wrote at 2004-8-14 14:19 +0430:
I am disappointed to find out that FieldIndex does not support Unicode values.
A FieldIndex does support Unicode values. However, *all* indexed values must then be either pure ASCII or unicode!
While searching a catalog for a FieldIndex which has a value out of ASCII range, one would encounter the following error:
'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
Look at the traceback. Always do this when you get errors!

Almost surely, your index contains a non-ASCII and non-unicode value. When the index lookup compares the keys, the non-ASCII value is converted into unicode (as there is another unicode operand) and this fails.

You can use a "Managable FieldIndex". It allows you to convert all values to unicode with your chosen encoding (specifying the encoding could be easier -- maybe in the next release).

<http://www.dieter.handshake.de/pyprojects/zope>

--
Dieter
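The failure described above can be reproduced outside Zope. The following is a minimal sketch, not code from FieldIndex: in Python 2, comparing a byte string with a unicode string makes the interpreter decode the byte string with the default (ASCII) codec, and the helper `implicit_coercion` (a hypothetical name) performs that same decode explicitly so that the example also runs on Python 3.

```python
# UTF-8 bytes for U+0645; the first byte is 0xd9, as in the reported error.
utf8_bytes = b"\xd9\x85"


def implicit_coercion(byte_value):
    """Mimic the ASCII decode Python 2 applies when bytes meet unicode."""
    return byte_value.decode("ascii")


# Pure-ASCII byte strings survive the coercion, which is why an index
# holding only ASCII or only unicode values works.
ascii_ok = implicit_coercion(b"abc")

# A non-ASCII byte string does not survive it.
try:
    implicit_coercion(utf8_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The message printed is the same one seen during the catalog search, which is consistent with the index holding an encoded byte string next to unicode keys.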
On Sun, 15 Aug 2004 22:35:00 +0200, Dieter Maurer <dieter@handshake.de> wrote:
Mohsen Moeeni wrote at 2004-8-14 14:19 +0430:
While searching a catalog for a FieldIndex which has a value out of ASCII range, one would encounter the following error:
'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
Look at the traceback. Always do this when you get errors!
I did. For others who might want to see the traceback, I gave a URL to another message on the list, because tracebacks make an email ugly :-)
Almost surely, your index contains a non ASCII and non unicode value. When the index lookup compares the keys, the non ASCII value is converted into unicode (as there is another unicode operand) and this fails.
I must admit, my use of the word ``Unicode`` in my email was not very careful. The value is an ordinary ``utf-8``-encoded string which contains non-ASCII characters. So you are right.
You can use a "Managable FieldIndex". It allows you to convert all values to unicode with your chosen encoding (specifying the encoding could be easier -- maybe in the next release).
Okay, I tried for some hours to make ManagableIndex work for me. Firstly, I read the documentation. I found out I have 3 points to do this conversion. None of them worked:

* ``python: value.decode('utf-8')`` as the ValueProvider Normalizer raised no exception and I could find the object in the catalog. However, the object had no value under the index of interest.

* ``python: value.decode('utf-8')`` as Term Prenormalizer and Normalizer raised an ``AttributeError`` with value ``'unicode' object has no attribute 'decode'`` upon creating the object instance. See the end of the email for the traceback.

I do not know why the Index term is converted to ``Unicode`` when it comes to Term Prenormalization. My code (which is an Archetypes product) does nothing to make this happen. So I wonder what I am missing.

I thought Term Prenormalizer would be the ideal place to make the conversion happen, because, if I understand it right, the conversion will also apply to the terms provided for searching the index (``_apply_index`` method if I am right).

Cheers,
Mohsen

------
2004-08-17T18:40:24 ERROR(200) SiteError http://mysite.com/plone/createObject
Traceback (most recent call last):
  File "/usr/local/zope270/lib/python/ZPublisher/Publish.py", line 100, in publish
    request, bind=1)
  File "/usr/local/zope270/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/usr/local/zope270/lib/python/ZPublisher/Publish.py", line 40, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/usr/local/zope270/lib/python/Products/CMFFormController/FSControllerPythonScript.py", line 88, in __call__
    result = FSControllerPythonScript.inheritedAttribute('__call__')(self, *args, **kwargs)
  File "/usr/local/zope270/lib/python/Products/CMFFormController/Script.py", line 141, in __call__
    return BaseFSPythonScript.__call__(self, *args, **kw)
  File "/usr/local/zope270/lib/python/Products/CMFCore/FSPythonScript.py", line 104, in __call__
    return Script.__call__(self, *args, **kw)
  File "/usr/local/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 306, in __call__
    return self._bindAndExec(args, kw, None)
  File "/usr/local/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 343, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/usr/local/zope270/lib/python/Products/CMFCore/FSPythonScript.py", line 160, in _exec
    result = apply(f, args, kw)
  File "Script (Python)", line 16, in createObject
  File "/usr/local/zope270/lib/python/Products/CMFCore/PortalFolder.py", line 363, in invokeFactory
    , kw
  File "/usr/local/zope270/lib/python/Products/CMFCore/TypesTool.py", line 709, in constructContent
    ob = apply(info.constructInstance, (container, id) + args, kw)
  File "/usr/local/zope270/lib/python/Products/CMFCore/TypesTool.py", line 401, in constructInstance
    return self._finishConstruction(ob)
  File "/usr/local/zope270/lib/python/Products/CMFCore/TypesTool.py", line 299, in _finishConstruction
    ob.reindexObject(idxs=['portal_type', 'Type'])
  File "/usr/local/zope270/lib/python/Products/Archetypes/CatalogMultiplex.py", line 60, in reindexObject
    c.catalog_object(self, self.__url(), idxs=lst)
  File "/usr/local/zope270/lib/python/Products/ZCatalog/ZCatalog.py", line 513, in catalog_object
    update_metadata=update_metadata)
  File "/usr/local/zope270/lib/python/Products/ZCatalog/Catalog.py", line 381, in catalogObject
    blah = x.index_object(index, object, threshold)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 203, in index_object
    val= self._evaluate(obj)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 441, in _evaluate
    return self._standardizeValue(v,object)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 445, in _standardizeValue
    return self._standardizeTerm(value,object,1)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 401, in _standardizeTerm
    value = self._prenormalizeTerm(value, object)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 424, in _prenormalizeTerm
    return normalizer._normalize(value, object)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/Evaluation.py", line 130, in _normalize
    return evaluator._evaluate(value,object)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/Evaluation.py", line 86, in _evaluate
    v= EvalAndCall.inheritedAttribute('_evaluate')(self,value,object)
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/Evaluation.py", line 58, in _evaluate
    return expr(context)
  File "/usr/local/zope270/lib/python/Products/PageTemplates/ZRPythonExpr.py", line 47, in __call__
    return eval(code, g, {})
  File "Python expression "value.decode('utf-8')"", line 1, in <expression>
AttributeError: 'unicode' object has no attribute 'decode'
Mohsen Moeeni wrote at 2004-8-17 19:12 +0430:
...
You can use a "Managable FieldIndex". It allows you to convert all values to unicode with your chosen encoding (specifying the encoding could be easier -- maybe in the next release).
Okay I tried for some hours to make ManagableIndex work for me. Firstly, I read the documentation. I found out I have 3 points to do this conversion. None of them worked:
* ``python: value.decode('utf-8')`` as the ValueProvider Normalizer raised no exception and I could find the object in catalog. However the object had no value under the interested index.
Apparently, you kept the "ignore exceptions" default in the Attribute lookup? This means exceptions are silently ignored. Uncheck "ignore exceptions" to find out what goes wrong.
* ``python: value.decode('utf-8')`` as Term Prenormalizer and Normalizer raised an ``AttributeError`` with value ``'unicode' object has no attribute 'decode'``
This error message speaks for itself, doesn't it? "value" is a unicode object and it does not have a "decode" attribute (what you are looking for is "encode").
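As a reminder of which direction each method goes, here is a small sketch (not taken from ManagableIndex): "encode" turns a unicode value into an encoded byte string, and "decode" turns an encoded byte string back into unicode. The variable names are illustrative only.

```python
# A unicode (text) value, like the one Archetypes hands to the index.
text = u"\u0645"

# unicode -> encoded byte string
raw = text.encode("utf-8")

# encoded byte string -> unicode
back = raw.decode("utf-8")

print(repr(raw))
print(back == text)
```

Calling "decode" on a value that is already unicode is therefore the wrong direction, which matches the ``AttributeError`` in the traceback.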
I do not know why the Index term is converted to ``Unicode`` when it comes to Term Prenormalization. My code (which is an Archetypes product) does nothing to make this happen.
Archetypes ("StringField.set" to be precise) does this.
Keep your indexed values Unicode. I recommend using as prenormalizer:

    python: not isinstance(value, unicode) and unicode(value, yourCharset) or value
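Written out as a function, the recommended prenormalizer might look like the sketch below. This is an illustration only, not part of ManagableIndex: the name `to_unicode` and the default charset are assumptions, and `text_type` is a small trick so that the same code runs on Python 2 (where it is `unicode`) and Python 3 (where it is `str`).

```python
# The interpreter's text type: unicode on Python 2, str on Python 3.
text_type = type(u"")


def to_unicode(value, charset="utf-8"):
    """Return value as unicode, decoding byte strings with charset.

    Values that are already unicode pass through untouched, so the
    function is safe for both indexed values and search terms.
    Assumes value is either a unicode or a byte string.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(charset)


print(to_unicode(b"\xd9\x85") == u"\u0645")
print(to_unicode(u"\u0645") == u"\u0645")
```

The explicit `if` does the same job as the `and ... or ...` idiom in the one-line expression; the one-liner exists because a TALES "python:" expression has to be a single expression.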
I thought Term Prenormalizer would be the ideal place to make the conversion happen, because, if I understand it right, the conversion will also apply to the terms provided for searching the index (``_apply_index`` method if I am right).
You are right. The Prenormalizer is executed before pattern expansion.

The next "ManagableIndex" release will provide automatic unicode transformation with an explicit encoding argument. However, it will happen too late for pattern expansion (as it runs after the term normalizer). The "Prenormalizer" is a better place.

--
Dieter
Thanks Dieter for your attention. I could get the desired result by adding this as the term prenormalizer::

    python: not isinstance(value, unicode) and unicode(value, 'utf-8') or value

Dieter, I still get an exception while trying to see the indexed objects in the catalog (manage_objectInformation). The problem is that ``getEntryForObject`` tries to return ``str`` of a value, but ``str`` of a Unicode object outside the ASCII range raises an exception. Full traceback below.

Cheers,
Mohsen

------
2004-08-18T19:21:37 ERROR(200) SiteError http://mysite/plone/at_catalog/manage_objectInformation
Traceback (most recent call last):
  File "/usr/local/zope270/lib/python/ZPublisher/Publish.py", line 100, in publish
    request, bind=1)
  File "/usr/local/zope270/lib/python/ZPublisher/mapply.py", line 88, in mapply
    if debug is not None: return debug(object,args,context)
  File "/usr/local/zope270/lib/python/ZPublisher/Publish.py", line 40, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "/usr/local/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 306, in __call__
    return self._bindAndExec(args, kw, None)
  File "/usr/local/zope270/lib/python/Shared/DC/Scripts/Bindings.py", line 343, in _bindAndExec
    return self._exec(bound_data, args, kw)
  File "/usr/local/zope270/lib/python/App/special_dtml.py", line 175, in _exec
    try: result = render_blocks(self._v_blocks, ns)
  File "/usr/local/zope270/lib/python/DocumentTemplate/DT_In.py", line 626, in renderwob
    sequence=expr(md)
  File "/usr/local/zope270/lib/python/DocumentTemplate/DT_Util.py", line 201, in eval
    return eval(code, d)
  File "<string>", line 0, in ?
  File "/usr/local/zope270/lib/python/Products/ZCatalog/ZCatalog.py", line 585, in getIndexDataForRID
    return self._catalog.getIndexDataForRID(rid)
  File "/usr/local/zope270/lib/python/Products/ZCatalog/Catalog.py", line 457, in getIndexDataForRID
    result[name] = self.getIndex(name).getEntryForObject(rid, "")
  File "/usr/local/zope270/lib/python/Products/ManagableIndex/ManagableIndex.py", line 191, in getEntryForObject
    return str(info)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Mohsen Moeeni wrote at 2004-8-18 19:54 +0430:
... Dieter, I still get an exception while trying to see the indexed objects in the catalog (manage_objectInformation). The problem is that ``getEntryForObject`` tries to return ``str`` of a value, but ``str`` of a Unicode object outside the ASCII range raises an exception. Full traceback below.
A bug in "ManagableIndex"... I will see that I can fix it for the next release.

A workaround would be to set Python's "default encoding" in "sitecustomize.py", e.g.

    import sys
    sys.setdefaultencoding('iso-8859-1')

--
Dieter
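For readers who would rather not change the interpreter-wide default encoding, the sketch below shows one possible way to render an indexed value for display without depending on it. This is an assumption-laden illustration, not the fix that went into ManagableIndex: `display_text` is a hypothetical helper, and it escapes non-ASCII characters instead of encoding them. Note that `sys.setdefaultencoding` is only reachable from "sitecustomize.py" in Python 2 and does not exist in Python 3.

```python
# The interpreter's text type: unicode on Python 2, str on Python 3.
text_type = type(u"")


def display_text(value):
    """Return an ASCII-only rendering of value for a management screen.

    Unicode values are escaped with backslash sequences rather than
    passed through str(), which on Python 2 would apply the default
    (ASCII) codec and fail for non-ASCII characters.
    """
    if isinstance(value, text_type):
        return value.encode("ascii", "backslashreplace").decode("ascii")
    return str(value)


print(display_text(u"\u0645\u062d"))
print(display_text(42))
```

The escaped form is not pretty, but it cannot raise ``UnicodeEncodeError``, whatever the site's default encoding is.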