Hi,
I have a project that needs to search Chinese text. I think I can make ZCatalog search Chinese in UTF-8, so I changed Voodoo Kludge Splitter.py to convert the input string to Unicode (assuming it is UTF-8) and wrote my own version of split (see the attached splitter.py). I borrowed (stole) the UTF-8 encoding conversion scheme from Interscript. I separated the Chinese characters with spaces by hand, hoping ZCatalog would then work.
After these changes I have a catalog that looks good: I can see from the vocabulary that the Chinese terms are actually there (except for some that have HTML encoding like < inside the UTF-8).
I generated the search interface and tested it. However, searching for the index terms returns nothing. I searched for most of the entries found in the vocabulary, but almost none work, and the few that do also return many unwanted results.
What is causing this failure? What can I do to go further?
Rgs,
Kent Sin
---------------------------------
kentsin.weblogs.com
kentsin.imeme.net
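For readers following along, here is a minimal sketch of the kind of splitter described above: decode the input as UTF-8, then emit each Chinese character as its own term. It is written in modern Python rather than the Python that Zope shipped with in 2000, it is not the attached splitter.py, and the names `split_utf8` and `_TOKEN` are hypothetical. It also assumes only the basic CJK block (U+4E00 to U+9FFF) matters.

```python
import re

# Either one CJK ideograph (basic block only, an assumption of this sketch),
# or a run of word characters that are not ideographs.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def split_utf8(raw):
    """Split UTF-8 encoded bytes into lowercase index terms.

    Hypothetical helper: each ideograph becomes a separate term, so the
    text does not need spaces inserted between characters by hand.
    """
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return [term.lower() for term in _TOKEN.findall(text)]


terms = split_utf8("Zope 中文 search".encode("utf-8"))
print(terms)
```

Whatever splitter is used, the same function has to be applied to the search terms as to the indexed text; otherwise the terms stored in the vocabulary and the terms looked up at query time will not match.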
On Mon, 25 Sep 2000, Sin Hang Kin wrote:
I generated the search interface and tested it. However, searching for the index terms returns nothing. I searched for most of the entries found in the vocabulary, but almost none work, and the few that do also return many unwanted results.
What is causing this failure? What I can do to go further?
It is possible you are having an issue with the way the splitter is used on the search term input side. Several of us have found bugs in that area. We've fixed the ones we've found, but there may be more <wry grin>.
Run Zope in debug mode ("the debugger is your friend" howto), and watch what UnTextIndex does with the search terms. (Hint: instead of trying to set breakpoints per the howto, just uncomment the appropriate calls to the debugger in the UnTextIndex or Lexicon source file...)
--RDM
After reading the source, I realize this is not a bug but is related to ZCatalog's design. I believe ZCatalog parses the input string for expressions; however, it treats the string as a byte string without converting it from UTF-8. I think the parse step breaks the search expression, so the search fails.
What I am considering is bypassing parse and parse2 in the query code of UnTextIndex.py. The other way is to convert the input into Unicode (or at least into something parse-safe), but it seems like a lot of trouble to include the Unicode code here.
Can you comment on bypassing parse and parse2? Will this work? I am not very sure about my choice.
Rgs,
Kent Sin
----- Original Message -----
From: "Zope mailing lists" <bitz@caller.bitdance.com>
To: "Sin Hang Kin" <kentsin@poboxes.com>
Cc: <zope-dev@zope.org>
Sent: Monday, September 25, 2000 11:59 PM
Subject: Re: [Zope-dev] ZCatalog : UTF-8 Chinese
On Mon, 25 Sep 2000, Sin Hang Kin wrote:
I generated the search interface and tested it. However, searching for the index terms returns nothing. I searched for most of the entries found in the vocabulary, but almost none work, and the few that do also return many unwanted results.
What is causing this failure? What I can do to go further?
It is possible you are having an issue with the way the splitter is used on the search term input side. Several of us have found bugs in that area. We've fixed the ones we've found, but there may be more <wry grin>.
Run Zope in debug mode ("the debugger is your friend" howto), and watch what UnTextIndex does with the search terms. (Hint: instead of trying to set breakpoints per the howto, just uncomment the appropriate calls to the debugger in the UnTextIndex or Lexicon source file...)
--RDM
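The hypothesis in the follow-up, that query parsing operates on raw bytes and so mangles UTF-8 input, can be illustrated with a small sketch. This is not the actual parse or parse2 code from UnTextIndex.py; `tokens_bytewise` and `tokens_decoded` are hypothetical stand-ins that only show why a byte-oriented, ASCII-only tokenizer loses Chinese terms while decoding first preserves them.

```python
import re


def tokens_bytewise(query):
    """Hypothetical ASCII-only tokenizer applied directly to raw bytes.

    Every byte of a UTF-8 encoded ideograph is >= 0x80, so none of them
    match the pattern and the Chinese terms silently disappear.
    """
    return re.findall(rb"[A-Za-z0-9]+", query)


def tokens_decoded(query):
    """Decode the query from UTF-8 first, then tokenize the text."""
    text = query.decode("utf-8")
    return re.findall(r"\w+", text)


query = "中文 and zope".encode("utf-8")
print(len("中文".encode("utf-8")))  # two characters, six bytes
print(tokens_bytewise(query))
print(tokens_decoded(query))
```

If the real parser behaves like the first function, decoding the query before it is parsed is likely a smaller change than bypassing the parser, since bypassing it would presumably also give up support for search expressions.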