Dear Developers,

I am trying to take a shortcut with UNTEXTINDEX so that it can handle UTF-8 Chinese text, and I need some help.

After reading some of the code for query, I think the regular expression operations in parse, quotes and parse2 are not safe for UTF-8 strings, so I have decided to emulate what they do myself.

However, I do not understand what getLexicon is doing, and I would like to know what q should look like before it is passed to evaluate. The vocabulary seems to be stored as integers. Is getLexicon the step that looks up each string and converts it to an integer?

I am getting lost. Could an experienced developer help me out with this?

Rgs,
Kent Sin
---------------------------------
kentsin.weblogs.com
kentsin.imeme.net

    def query(self, s, default_operator = Or, ws = (string.whitespace,)):
        """

        This is called by TextIndexes.  A 'query term' which is a string
        's' is passed in, along with an index object.  s is parsed, then
        the wildcards are parsed, then something is parsed again, then the
        whole thing is 'evaluated'

        """

        # First replace any occurrences of " and not " with " andnot "
        s = ts_regex.gsub('[%s]+and[%s]*not[%s]+' % (ws * 3), ' andnot ', s)

        # do some parsing
        q = parse(s)

        ## here, we give lexicons a chance to transform the query.
        ## For example, substitute wildcards, or translate words into
        ## various languages.
        q = self.getLexicon(self._lexicon).query_hook(q)

        # do some more parsing
        q = parse2(q, default_operator)

        ## evaluate the final 'expression'
        return self.evaluate(q)
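
To make the lexicon question concrete, here is a toy sketch of what I am guessing the lookup does. It is only my assumption, not the real code: ToyLexicon, add, get and words_to_ids are hypothetical names of mine, and the query_hook here simply returns the query unchanged, which may not match what the actual lexicon does. It is written for a current Python where strings are Unicode, so a Chinese word is a single dictionary key rather than a run of UTF-8 bytes.

```python
class ToyLexicon:
    """Hypothetical stand-in for a lexicon: maps each word to an integer id."""

    def __init__(self):
        self._word_to_id = {}

    def add(self, word):
        # Assign the next free integer the first time a word is seen;
        # a word that is already known keeps its existing id.
        return self._word_to_id.setdefault(word, len(self._word_to_id) + 1)

    def get(self, word):
        # Unknown words have no id; return None instead of raising.
        return self._word_to_id.get(word)

    def query_hook(self, q):
        # Assumed behaviour: a plain lexicon leaves the parsed query alone.
        return q


def words_to_ids(lexicon, words):
    # Translate query words into the integer ids an index would store.
    return [lexicon.get(w) for w in words]


lexicon = ToyLexicon()
for w in ["中文", "索引", "zope"]:
    lexicon.add(w)

ids = words_to_ids(lexicon, ["索引", "zope", "missing"])
```

If this guess is right, then the index itself would only ever see the integers, and the part I need to make UTF-8 safe is everything that splits the query into words before this lookup.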