[Zope-dev] ZCatalog with UTF-8 Chinese
Sin Hang Kin
kentsin@poboxes.com
Thu, 28 Sep 2000 08:08:41 +0800
Dear Developers:
I am trying to short-cut UNTEXTINDEX so that it handles UTF-8 Chinese, and I need some help.
After reading some of the query code, I think the regular expression operations
in parse, quotes and parse2 are not safe for UTF-8 strings, so I decided to
emulate what they do. However, I do not understand what getLexicon is doing,
and I would like to learn what q should look like before it is passed to
evaluate. I also do not understand why the vocabulary seems to store words as
integers: is getLexicon the step that looks up each string and converts it to
an integer? I am getting lost.
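A small illustration of the UTF-8 concern, written in modern Python 3 rather than the Python Zope used at the time: a bytes pattern sees each Chinese character as a run of non-ASCII bytes, while decoding first lets the regex work on whole characters. Emitting one token per CJK ideograph is an assumption made only for this sketch (Chinese has no spaces between words); it is not what UNTEXTINDEX does, and the function names are hypothetical.

```python
import re

# A bytes pattern sees UTF-8 as raw bytes; its \w matches only ASCII [A-Za-z0-9_],
# so every byte of a Chinese character is treated as a separator and discarded.
BYTE_WORDS = re.compile(rb"\w+")

# A str pattern works on decoded characters. Each CJK Unified Ideograph
# (U+4E00..U+9FFF) becomes its own token; other runs of word characters
# (Latin letters, digits, underscore) stay together as one token.
TEXT_TOKENS = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def byte_level_words(data: bytes) -> list[bytes]:
    """What a naive byte-oriented regex split produces for UTF-8 input."""
    return BYTE_WORDS.findall(data)


def tokenize_utf8(data: bytes) -> list[str]:
    """Decode UTF-8 first, then split; one token per ideograph (hypothetical rule)."""
    text = data.decode("utf-8")
    return [token.lower() for token in TEXT_TOKENS.findall(text)]


if __name__ == "__main__":
    sample = "中文 Zope".encode("utf-8")
    print(byte_level_words(sample))  # [b'Zope']
    print(tokenize_utf8(sample))     # ['中', '文', 'zope']
```

The byte-level version silently drops the Chinese text, which is the kind of failure to look for in parse, quotes and parse2.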
Could an experienced developer help me with this?
Rgs,
Kent Sin
---------------------------------
kentsin.weblogs.com
kentsin.imeme.net
def query(self, s, default_operator = Or, ws = (string.whitespace,)):
    """
    This is called by TextIndexes. A 'query term', which is a string
    's', is passed in, along with an index object. s is parsed, the
    lexicon's query_hook is given a chance to transform the result
    (e.g. to expand wildcards), the result is parsed again by parse2,
    and the final expression is 'evaluated'.
    """
    # Method excerpt: string, ts_regex, parse, parse2 and Or must be
    # defined at module level in the index's source.
    # First replace any occurrences of " and not " with " andnot "
    s = ts_regex.gsub('[%s]+and[%s]*not[%s]+' % (ws * 3), ' andnot ', s)
    # do some parsing
    q = parse(s)
    ## here, we give lexicons a chance to transform the query.
    ## For example, substitute wildcards, or translate words into
    ## various languages.
    q = self.getLexicon(self._lexicon).query_hook(q)
    # do some more parsing
    q = parse2(q, default_operator)
    ## evaluate the final 'expression'
    return self.evaluate(q)
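For the questions about getLexicon and the shape of q, here is a toy sketch in modern Python 3, not Zope's code. In the method above, getLexicon appears to return a lexicon object whose query_hook may rewrite q; the sketch assumes, as one plausible reading of why the vocabulary stores integers, a lexicon that maps each word to an integer id and an inverted index keyed by those ids. The names Lexicon, TinyTextIndex, get_word_id and lookup are hypothetical, q is a flat list of words and operator strings rather than whatever structure parse2 really produces, and here the word-to-integer translation happens during evaluation.

```python
import re

# Operator names for this sketch (hypothetical; they only mirror the
# 'and' / 'or' / 'andnot' vocabulary of the query method above).
AND, OR, ANDNOT = "and", "or", "andnot"
OPERATORS = {AND, OR, ANDNOT}


class Lexicon:
    """Hypothetical lexicon: assigns each distinct word a stable integer id."""

    def __init__(self):
        self._ids = {}

    def get_word_id(self, word, create=False):
        if word not in self._ids and create:
            self._ids[word] = len(self._ids) + 1
        return self._ids.get(word)  # None for unknown words


class TinyTextIndex:
    """Inverted index keyed by integer word ids, not by the words themselves."""

    def __init__(self):
        self.lexicon = Lexicon()
        self._index = {}  # word id -> set of document ids

    def index_document(self, doc_id, words):
        for word in words:
            wid = self.lexicon.get_word_id(word, create=True)
            self._index.setdefault(wid, set()).add(doc_id)

    def lookup(self, word):
        # The string -> integer translation happens here, at lookup time.
        wid = self.lexicon.get_word_id(word)
        return set(self._index.get(wid, ())) if wid is not None else set()

    def parse(self, s):
        """Turn a query string into a flat list: word, op, word, op, ...
        A missing operator between two words defaults to OR."""
        s = re.sub(r"\s+and\s*not\s+", " andnot ", s)
        q = []
        for token in s.split():
            if token in OPERATORS:
                # Ignore leading or doubled operators.
                if q and q[-1] not in OPERATORS:
                    q.append(token)
            else:
                if q and q[-1] not in OPERATORS:
                    q.append(OR)
                q.append(token)
        if q and q[-1] in OPERATORS:
            q.pop()  # drop a trailing operator
        return q

    def evaluate(self, q):
        """Fold the list left to right, combining sets of document ids."""
        if not q:
            return set()
        result = self.lookup(q[0])
        for op, word in zip(q[1::2], q[2::2]):
            docs = self.lookup(word)
            if op == AND:
                result &= docs
            elif op == OR:
                result |= docs
            else:  # ANDNOT
                result -= docs
        return result

    def query(self, s):
        return self.evaluate(self.parse(s))
```

For example, after indexing document 1 as ["zope", "catalog"] and document 2 as ["zope", "中", "文"], parse("zope and not 中") gives ["zope", "andnot", "中"] and query returns {1}. Keeping integer ids in the index means each document stores small numbers instead of repeated strings, which may be why the vocabulary looks the way it does.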