On Thu, 17 Aug 2000, Martijn Pieters wrote:
> No clues as to where you'll find the stopword code, but the Persistence thingy is caused by the magic that ZODB performs: it initializes the correct Persistence module when it itself is imported. This way Jim managed to have ZODB3 and BoboPOS2 exist in the same Zope distribution.
>
> Do an import ZODB before you do your Splitter import, and all will be dandy.
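Why the import order matters can be illustrated with a minimal, hypothetical sketch of the kind of import-time side effect described above. This is not ZODB's actual code: the `Persistence` module is a bare stand-in, and `import_zodb_standin` and `import_splitter_standin` are made-up names that simulate "import ZODB" and a module (assumed here to be somewhere in the Splitter's import chain) that needs `Persistence.Persistent`. Until the setup step has run, the dependent import fails.

```python
import sys
import types

# Hypothetical stand-in for a bare Persistence module shared by ZODB3 and
# BoboPOS2; until something configures it, it has no Persistent class.
persistence = types.ModuleType("Persistence")
sys.modules["Persistence"] = persistence


def import_zodb_standin():
    """Simulates the side effect of 'import ZODB' (not ZODB's real code):
    install the ZODB3 flavour of Persistent into the shared module."""
    class Persistent:
        """Placeholder for the ZODB3 persistent base class."""

    Persistent.__module__ = "Persistence"
    if not hasattr(persistence, "Persistent"):
        persistence.Persistent = Persistent


def import_splitter_standin():
    """Simulates a module that needs Persistence.Persistent at import time."""
    from Persistence import Persistent  # ImportError until set up
    return Persistent


# Wrong order: the dependent import happens before the ZODB stand-in.
try:
    import_splitter_standin()
    before = "ok"
except ImportError:
    before = "ImportError"

# Right order: the ZODB stand-in first, then the dependent import.
import_zodb_standin()
after = import_splitter_standin()
print(before, after.__name__)  # ImportError Persistent
```

The stand-in only mutates a module object registered in `sys.modules`, which is enough to show why whichever module imports `Persistence` first must find it already configured.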
Thanks, worked like a charm.

I think I've found the stopword code. To cement my understanding I'm going to write this up; maybe somebody will find it useful <grin>.

UnTextIndex accesses the splitter through the Splitter method of the Lexicon associated with the index. That Lexicon instance is created when the Vocabulary or Catalog is created. (Comments in the code indicate that in the future each TextIndex could have its own Lexicon, which makes sense to me.) A Lexicon instance can be passed a list of stop words (and/or synonyms) when it is initialized. Vocabulary does this for Lexicon, but not for GlobbingLexicon, which internal comments indicate does not use stopwords. The Lexicon instance stores this list in a property and passes it to the real Splitter when its Splitter method is called.

So the fix I submitted to the Collector earlier today for 'and' queries involving stopwords should work for 'listed' stopwords as well as for the punctuation and numbers I was able to test it on. (In the comments in my patch I said I wasn't sure.) I still can't test it, because I'm using a GlobbingLexicon <wry grin>.

Perusing the code, I'm also more confident that the change I made to __getitem__ in that fix is semantically correct, or at least consistent with the rest of the __getitem__ code.

GlobbingLexicon not using stopwords also explains the few hits on 'the and car' that had confused me: those entries really must have 'the' as an indexed term, unlike the rest.

Oh, by the way, the comments in TextIndex seem to agree with me about the conventional meaning of the word 'stemmed' <grin>.

--RDM
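To make the walkthrough concrete, here is a toy Python model of the chain from the index through Lexicon.Splitter to the splitter. It is a hypothetical sketch, not Zope's SearchIndex code: `split_words`, `TextIndex`, `search_and` and the sample documents are invented for illustration, and the way `search_and` skips stop words is an assumption about what the 'and' fix aims for, not the actual patch.

```python
import re


def split_words(text, stop_words=frozenset()):
    """Hypothetical stand-in for the real Splitter: lowercase alphabetic
    words, with any stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stop_words]


class Lexicon:
    def __init__(self, stop_words=()):
        # The stop-word list is kept on the instance ...
        self.stop_words = frozenset(stop_words)

    def Splitter(self, text):
        # ... and handed to the real splitter on every call.
        return split_words(text, self.stop_words)


class GlobbingLexicon(Lexicon):
    def __init__(self):
        # Modelled on the comment that GlobbingLexicon ignores stop words.
        super().__init__(stop_words=())


class TextIndex:
    """Toy index that, like UnTextIndex, splits text via its lexicon."""

    def __init__(self, lexicon):
        self.lexicon = lexicon
        self.postings = {}  # word -> set of document ids

    def index(self, doc_id, text):
        for word in self.lexicon.Splitter(text):
            self.postings.setdefault(word, set()).add(doc_id)

    def search_and(self, *terms):
        # Assumed behaviour: terms the splitter discards (stop words) are
        # skipped instead of making the whole 'and' come back empty.
        kept = [w for term in terms for w in self.lexicon.Splitter(term)]
        if not kept:
            return set()
        result = set(self.postings.get(kept[0], ()))
        for word in kept[1:]:
            result &= self.postings.get(word, set())
        return result


docs = {1: "The red car", 2: "A car", 3: "The bus"}
stop_index = TextIndex(Lexicon(stop_words=["the", "a"]))
glob_index = TextIndex(GlobbingLexicon())
for doc_id, text in docs.items():
    stop_index.index(doc_id, text)
    glob_index.index(doc_id, text)

print(stop_index.search_and("the", "car"))  # {1, 2}: 'the' is dropped
print(glob_index.search_and("the", "car"))  # {1}: 'the' was indexed
```

With a stop-word list, 'the' disappears at both indexing and query time, so 'the and car' behaves like 'car'. With the GlobbingLexicon model, 'the' is an ordinary indexed term, so only documents that actually contain it match, which mirrors the few unexpected hits on 'the and car'.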