On Tue, 22 Aug 2000, Andy McKay wrote:
I haven't been able to find TFM to read on Vocabulary and stop words in ZCatalog. I need to search by stuff such as XML::Parser and I think I need to patch 2.2 to do it. But a FM would help. Can anyone point me that way?
I don't think there is one. Basically, if you want to search on terms that include punctuation, you have to write your own Splitter.c. Have fun <wry grin>. You don't, by the way, have to write it in C, although not doing so presumably has performance implications or it wouldn't have been written in C to begin with. But if all you want the splitter to do is split at blanks and truncate long words, you probably don't need C...

You are presumably talking about text indexes if you are worried about Vocabulary and stop words. Most of the guts of this stuff is actually located in a module named SearchIndex. Reading the source code there is as close to a FM as I think you'll get right now. The current Vocabulary does some appropriate wrapping up of modules in SearchIndex for use by Catalog; if you want to do your own thing without touching Zope's default machinery you'll need to write your own Vocabulary object. It shouldn't be too hard if you model it after the existing source.

The current text index *does* try to do something sensible in the case you cite, however. Words are indexed after being broken at punctuation. When a word containing embedded punctuation is used as a search term, it is turned into a "near" search (xml near parser, for example). I have not tested whether or not this actually works, but from my reading of the code I *think* what it does is equivalent to an 'and' search on the two words, except that the nearer the two words are in the document, the earlier in the result set the document appears (assuming you don't sort the result set yourself).

Note that there was a longstanding bug in the search term parsing machinery that caused some search terms with embedded punctuation to fail to return any results. I submitted a patch for this that has been incorporated as of 2.2.1b1. (The bug should not have affected a search term like XML::Parser.)
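To give a rough idea of the "split at blanks and truncate long words" splitter mentioned above, here is a minimal pure-Python sketch. It is untested against Zope and does not implement the real Splitter.c interface; the names simple_split and MAX_WORD_LEN, the truncation length, and the stop word handling are all made up for illustration.

```python
# Hypothetical pure-Python splitter: it splits only at whitespace, so
# punctuation inside a term such as XML::Parser is preserved.
# This is NOT the real Splitter.c interface; names and limits are assumptions.

MAX_WORD_LEN = 64  # assumed truncation length, pick whatever suits you


def simple_split(text, stop_words=(), max_len=MAX_WORD_LEN):
    """Return lowercased, whitespace-delimited words, truncated to max_len,
    with stop words dropped."""
    words = []
    for raw in text.split():
        # Lowercase first, then truncate overly long words.
        word = raw.lower()[:max_len]
        if word and word not in stop_words:
            words.append(word)
    return words


if __name__ == "__main__":
    print(simple_split("I use XML::Parser and the DOM",
                       stop_words=("and", "the")))
```

Because this splits only at whitespace, trailing punctuation sticks to the word too ("Parser." at the end of a sentence would be indexed with its period), so a real replacement would probably want to strip leading and trailing punctuation while leaving embedded punctuation alone.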
In theory, I think that instead of rewriting the splitter module you could rewrite SearchIndex/ResultList's notion of what 'near' means to constrain the words to be right next to each other. You should even be able to enforce ordering. If it works, it might be easier than rewriting the splitter, since you'd only be changing one python function.

I've been digging around in the SearchIndex code for a while now, so if you want to ask me more questions, go ahead. It doesn't mean I'll know the answers, but I'm happy to share whatever I *have* learned.

--RDM

PS: this question is really more into 'zope-dev' territory than 'zope' territory, if you want to move it.