ZCatalog text index search bugs?
I am very confused. I'm looking at the SearchIndex source under 2.1.4 (2.1.6 seems to be the same). In Lexicon.py the 'query' method defines the default_operator to be 'or'. I can't see that TextIndex overrides this when it calls 'query'. But the response to PR 1141 (against 2.1.6) in the collector says:

  The TextIndex search does an AND, not an OR, of the search words: if you ask it to find "foo bar", it returns only objects matching *both* "foo" and "bar", rather than object matching *either* "foo" or "bar" (which Jason expected).

Indeed, if you do a search that includes a word that is not on an item, the item is not returned. So how is that working?

A possible answer is: if you do a search like 'something or somethingelse', this *also* does not return the object if one of those words is not on the object. So is 'or' searching broken?

Note that if you do a search like "something or with", this returns the object, "with" being a stop word. So does "something with". On the other hand, "something and with" does *not* return the object. So I think 'or' searching is broken, and that text indexes being a default 'and' search is just an accident <grin>.

Following up on the 'something and with', though: since "with" is a stop word, it can never be on the object. Since the user entering search words into the search form doesn't know what the list of stop words is, this strikes me as broken behavior. Anyone disagree?

I also have a problem with a word such as "T-shirt". If I search on "T-shirt", my object that has that word in its text index does not show up. The splitter should be breaking that into "t" and "shirt", right? Is the problem that single letters are discarded by the Splitter, so that "T" behaves like a stop word (even though it isn't in the stopword table), and therefore the implicit 'and' search(*) fails? To corroborate this, a search for "something t" finds that record, but "something and t" does not.
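To make the hypothesis concrete, here is a toy Python sketch. It is not the SearchIndex code: the stop list, the splitter rule (drop one-character tokens), and the names split, build_index and search are all assumptions made up for illustration. It only models the idea that a term which never reaches the index yields an empty result set, which 'and' then propagates and 'or' ignores.

```python
import re

# Hypothetical stop list; the real stopword table is not reproduced here.
STOP_WORDS = {"with", "the", "and", "or"}


def split(text):
    """Toy splitter (assumption): lowercase, break on non-alphanumerics,
    and discard one-character tokens such as the "t" in "T-shirt"."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if len(w) > 1]


def build_index(docs):
    """Map each indexed word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in split(text):
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query, default_operator="or"):
    """Evaluate a flat query left to right with 'and'/'or' operators.

    Terms that were never indexed (stop words, single letters) simply
    have no entry, so they contribute an empty set of hits.
    """
    result = None
    op = default_operator
    for token in query.lower().split():
        if token in ("and", "or"):
            op = token
            continue
        hits = index.get(token, set())
        if result is None:
            result = hits
        elif op == "and":
            result = result & hits
        else:
            result = result | hits
        op = default_operator  # an explicit operator applies to one term only
    return result or set()


index = build_index({1: "Something with a T-shirt"})
print(search(index, "something with"))      # default 'or': document found
print(search(index, "something and with"))  # 'and' with a stop word: nothing
print(search(index, "something and t"))     # 'and' with a dropped letter: nothing
```

Under these assumptions "something with", "something or with" and "something t" find the document, while "something and with" and "something and t" do not, matching what I observe for those queries. A search on plain "shirt" would succeed in this model, though.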
This can't be the whole answer, though, since searching on just 'shirt' does *not* return the object.

(*) I recall reading that the 'near' operator, which is used if the splitter breaks up a word in the search string, is not really supported and that the 'and' operator is used instead.

I can't tell yet whether this bug is (these bugs are?) fixed in 2.2.0b1, since I can't see the source release yet. Looking at the a1 source, things have moved around a bit. But I see that "default_operator" is still set to 'or', so I suspect these bugs may remain... If I can reproduce this in 2.2.0b I'll file it in the collector.

--RDM
R. David Murray