[ZCM] [ZC] 227/ 8 Comment "TextIndex: Can't index unicode strings"

Collector: Zope Bugs and Patches ... zope-coders@zope.org
Mon, 18 Feb 2002 05:50:17 -0500


Issue #227 Update (Comment) "TextIndex: Can't index unicode strings"
 Status Accepted, Zope/bug medium
To followup, visit:
  http://collector.zope.org/Zope/227

==============================================================
= Comment - Entry #8 by snej on Feb 18, 2002 5:50 am

Removing the str() calls on lines 285 and 287 of TextIndex.py
makes it possible to index Unicode strings.

However, indexing anything other than plain or Unicode strings
(e.g. integers) then fails:

Error Type: TypeError
Error Value: first argument is neither string nor unicode.
  File /home/jens/work/tests/ez/Server/../Zope/lib/python/Products/PluginIndexes/TextIndex/TextIndex.py, line 312, in index_object
    (Object: PrincipiaSearchSource)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/PluginIndexes/TextIndex/Lexicon.py, line 161, in Splitter
TypeError: (see above)
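To see why dropping str() entirely breaks non-string values, here is a hypothetical stand-in for the Splitter in Lexicon.py, written in Python 3 (where `str` covers both plain and Unicode text). Its behaviour is an assumption inferred only from the error message in the traceback, not the splitter's actual code: it accepts text and rejects anything else with the same TypeError.

```python
def split_words(text):
    """Hypothetical stand-in for Lexicon.Splitter: splits text on whitespace.

    Judging by the traceback, the real splitter refuses anything that is not
    a string, such as an unconverted integer attribute value.
    """
    if not isinstance(text, str):
        raise TypeError("first argument is neither string nor unicode.")
    return text.split()


# ascii() keeps the output printable on any terminal.
print(ascii(split_words("\xfe Huhu")))  # non-ASCII text is accepted
try:
    split_words(42)                     # an integer is rejected
except TypeError as exc:
    print("TypeError:", exc)
```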

I think an explicit check for the Unicode string type is needed:

--- TextIndex_orig.py   Mon Feb 18 11:31:24 2002
+++ TextIndex.py        Mon Feb 18 11:44:26 2002
@@ -282,9 +282,9 @@
         try:
             source = getattr(obj, self.id)
             if callable(source):
-                source = str(source())
-            else:
-                source = str(source)
+                source = source()
+            if type(source) != type(u''):
+                source=str(source)
         except (AttributeError, TypeError):
             return 0

This seems to work for Unicode strings, plain strings, and
anything else that str() can convert.
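The patched logic can be sketched in Python 3 as follows. Python 3 has no separate unicode type, so `str` stands in for Python 2's `unicode` and `bytes` for the old byte string. `prepare_source` and the `Document` class are hypothetical names for illustration, and the helper returns None where index_object() returns 0. Using isinstance() rather than comparing type() also accepts subclasses of the text type.

```python
def prepare_source(obj, attr):
    """Fetch the value to index from obj, following the patched index_object().

    Callables are called; text and byte strings are passed through unchanged;
    anything else (e.g. an int) is converted with str(). Returns None if the
    attribute is missing or calling it raises TypeError.
    """
    try:
        source = getattr(obj, attr)
        if callable(source):
            source = source()
        if not isinstance(source, (str, bytes)):
            source = str(source)
    except (AttributeError, TypeError):
        return None
    return source


class Document:
    """Hypothetical object with one text attribute and one integer attribute."""
    count = 42

    def PrincipiaSearchSource(self):
        return "\xfe Huhu"


doc = Document()
print(ascii(prepare_source(doc, "PrincipiaSearchSource")))  # text, unchanged
print(prepare_source(doc, "count"))                         # int, converted to "42"
print(prepare_source(doc, "missing"))                       # None
```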

________________________________________
= Accept - Entry #7 by ajung on Feb 17, 2002 9:26 pm

 Status: Pending => Accepted

 Supporters added: ajung

I see the problem. It is caused by the str() calls.
Can you try removing the str() call on line 285 of
TextIndex.py?

-aj 
________________________________________
= Comment - Entry #6 by snej on Feb 17, 2002 8:55 pm

I created a folder containing
  a catalog with 
    a Vocabulary with 
      a unicode splitter.

It also contains a Script named PrincipiaSearchSource that
returns a Unicode string (u'\xfe \x2031 Huhu').

Indexing everything in that folder fails with:

Error Type: UnicodeError
Error Value: ASCII encoding error: ordinal not in range(128)

Traceback (innermost last):
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/ZPublisher/Publish.py, line 150, in publish_module
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/ZPublisher/Publish.py, line 114, in publish
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Zope/__init__.py, line 158, in zpublisher_exception_hook
    (Object: ztest1)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/ZPublisher/Publish.py, line 98, in publish
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/ZPublisher/mapply.py, line 88, in mapply
    (Object: manage_catalogFoundItems)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/ZPublisher/Publish.py, line 39, in call_object
    (Object: manage_catalogFoundItems)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/ZCatalog/ZCatalog.py, line 330, in manage_catalogFoundItems
    (Object: ztest1)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/ZCatalog/ZCatalog.py, line 697, in ZopeFindAndApply
    (Object: ztest1)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/ZCatalog/ZCatalog.py, line 480, in catalog_object
    (Object: ztest1)
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/ZCatalog/Catalog.py, line 367, in catalogObject
  File /home/jens/work/tests/edbzope/Zope-2.5.0-src/lib/python/Products/PluginIndexes/TextIndex/TextIndex.py, line 285, in index_object
    (Object: PrincipiaSearchSource)
UnicodeError: (see above)
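The failure can be reproduced outside Zope. In Python 2, str() on a unicode object encodes it with the default ASCII codec; the Python 3 sketch below makes that encoding explicit with .encode("ascii") (the helper name `python2_str` is hypothetical). Note also that in a Python string literal `\x2031` is the escape `\x20` (a space) followed by the digits `31`, so the only non-ASCII character in the test string is `\xfe`; if U+2031 (per ten thousand sign) was intended, the escape would be `\u2031`.

```python
def python2_str(text):
    """Emulate Python 2's str(u'...') under the default ASCII encoding."""
    return text.encode("ascii")


source = u"\xfe \x2031 Huhu"
# "\x2031" parses as "\x20" (a space) followed by the digits "31".
assert source == "\xfe  31 Huhu"

try:
    python2_str(source)
except UnicodeEncodeError as exc:
    # exc.start points at the first non-ASCII character, "\xfe".
    print("UnicodeEncodeError at position", exc.start, "-", exc.reason)
```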

I know this can be worked around by using _encoding with UTF-8,
but I would prefer to pass my Unicode strings in directly, without
encoding and decoding them.

________________________________________
= Comment - Entry #5 by ajung on Feb 17, 2002 6:57 pm

Please provide the traceback!
________________________________________
= Comment - Entry #4 by snej on Feb 17, 2002 5:01 pm

Shouldn't the UnicodeSplitter be able to index Unicode strings directly, though? The workaround you describe does work, as already noted in my initial request below.

________________________________________
= Comment - Entry #3 by ajung on Feb 17, 2002 3:14 pm

Are you using the UnicodeSplitter? If your text uses an encoding
other than ASCII, either change the default encoding in site.py or
set <index>_encoding to the encoding of the document.
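The <index>_encoding workaround amounts to decoding encoded input before it reaches the splitter. A minimal Python 3 sketch, assuming a hypothetical helper `to_index_text` whose `encoding` parameter plays the role of <index>_encoding (the real TextIndex attribute handling may differ): byte strings are decoded with the configured encoding, while text passes through. The extra encode/decode step is exactly what the reporter would rather avoid.

```python
def to_index_text(value, encoding="utf-8"):
    """Decode byte strings with the configured encoding; pass text through.

    `encoding` stands in for the <index>_encoding setting (hypothetical
    name in this sketch).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


original = "\xfe Huhu"
encoded = original.encode("utf-8")          # caller encodes before indexing
assert to_index_text(encoded) == original   # matching encoding round-trips
assert to_index_text(original) == original  # text is left alone
# A mismatched setting silently yields different characters:
assert to_index_text(encoded, "latin-1") != original
```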


________________________________________
= Comment - Entry #2 by snej on Feb 17, 2002 2:59 pm


Uploaded:  "patsch"
 - http://collector.zope.org/Zope/227/patsch/view
A test for tests/testTextIndex.py

________________________________________
= Request - Entry #1 by snej on Feb 17, 2002 2:40 pm

index_object() in TextIndex.py raises
UnicodeError: ASCII encoding error: ordinal not in range(128)
for Unicode strings that actually contain non-ASCII characters,
because it applies str() to all input.

Workaround: encode the Unicode text before passing it into the
TextIndex, and set xxx_encoding to the encoding used.


==============================================================