[Zope-CVS] CVS: Products/ZCTextIndex - QueryParser.py:1.7
Guido van Rossum
guido@python.org
Mon, 20 May 2002 12:03:56 -0400
Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv10536
Modified Files:
QueryParser.py
Log Message:
QueryParser.py:

- Rephrased the description of the grammar, pointing out that the
  lexicon decides on globbing syntax.

- Refactored term and atom parsing (moving atom parsing into a
  separate method).  The previously checked-in version accidentally
  accepted some invalid forms like ``foo AND -bar''; this is fixed.

tests/testQueryParser.py:

- Each test is now in a separate method; this produces more output
  (alas) but makes pinpointing the errors much simpler.

- Added some tests catching ``foo AND -bar'' and similar.

- Added an explicit test class for the handling of stopwords.  The
  "and/" test no longer has to check self.__class__.

- Some refactoring of the TestQueryParser class; the utility methods
  are now in a base class TestQueryParserBase, in a different order;
  compareParseTrees() now shows the parse tree it got when raising an
  exception.  The parser is now self.parser instead of self.p (see
  below).

tests/testZCTextIndex.py:

- setUp() no longer needs to assign to self.p; the parser is
  consistently called self.parser now.
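
For readers who want the term-level rule in isolation, here is a rough
sketch of it, not the checked-in code. It assumes stand-in classes
(ParseError, NotNode) in place of the ParseTree ones, a hypothetical
helper name order_term_nodes(), and plain strings for positive nodes.
It drops stopword-only atoms, moves negated nodes to the end while
keeping relative order, and rejects a term that has no positive word.

```python
class ParseError(Exception):
    """Stand-in for ParseTree.ParseError (assumption for this sketch)."""


class NotNode:
    """Stand-in for ParseTree.NotNode: wraps one negated word."""

    def __init__(self, word):
        self.word = word


def order_term_nodes(nodes):
    """Hypothetical helper: order the nodes of one term.

    Returns None when only stopwords were seen, raises ParseError when
    every remaining node is negative, and otherwise returns the nodes
    with the positive ones first.
    """
    # Atoms that consisted only of stopwords are represented as None.
    nodes = [n for n in nodes if n is not None]
    if not nodes:
        return None
    # Decorate with (is_negative, original_index).  The index is unique,
    # so sorting never has to compare the node objects themselves, and
    # the original order is kept within each group.
    decorated = sorted((isinstance(n, NotNode), i)
                       for i, n in enumerate(nodes))
    ordered = [nodes[i] for (_, i) in decorated]
    if isinstance(ordered[0], NotNode):
        raise ParseError("a term must have at least one positive word")
    return ordered
```

With this helper, a term made up only of negated words raises
ParseError instead of being silently accepted.
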
=== Products/ZCTextIndex/QueryParser.py 1.6 => 1.7 ===
+ A sequence of characters not containing whitespace or parentheses or
- double quotes, and not equal to one of the key words 'AND', 'OR', 'NOT'; or
+ double quotes, and not equal (ignoring case) to one of the key words
+ 'AND', 'OR', 'NOT'; or
-+ A non-empty string enclosed in double quotes. The interior of the string
- can contain whitespace, parentheses and key words.
-
-In addition, an ATOM may optionally be preceded by a hyphen, meaning
-that it must not be present.
-
-An unquoted ATOM may also end in a star. This is a primitive
-"globbing" function, meaning to search for any word with a given
-prefix.
++ A non-empty string enclosed in double quotes. The interior of the
+ string can contain whitespace, parentheses and key words, but not
+ quotes.
+
++ A hyphen followed by one of the two forms above, meaning that it
+ must not be present.
+
+An unquoted ATOM may also contain globbing characters. Globbing
+syntax is defined by the lexicon; for example "foo*" could mean any
+word starting with "foo".
When multiple consecutive ATOMs are found at the leaf level, they are
connected by an implied AND operator, and an unquoted leading hyphen
@@ -202,32 +204,37 @@
             tree = self._parseOrExpr()
             self._require(_RPAREN)
         else:
-            atoms = [self._get(_ATOM)]
-            while self._peek(_ATOM):
-                atoms.append(self._get(_ATOM))
             nodes = []
-            nots = []
-            for a in atoms:
-                words = self._lexicon.parseTerms(a)
-                if not words:
-                    self._ignored.append(a)
-                    continue # Only stopwords
-                if len(words) > 1:
-                    n = ParseTree.PhraseNode(" ".join(words))
-                elif self._lexicon.isGlob(words[0]):
-                    n = ParseTree.GlobNode(words[0])
-                else:
-                    n = ParseTree.AtomNode(words[0])
-                if a[0] == "-":
-                    n = ParseTree.NotNode(n)
-                    nots.append(n)
-                else:
-                    nodes.append(n)
+            nodes = [self._parseAtom()]
+            while self._peek(_ATOM):
+                nodes.append(self._parseAtom())
+            nodes = filter(None, nodes)
             if not nodes:
-                return None # Only stowords
-            nodes.extend(nots)
+                return None # Only stopwords
+            structure = [(isinstance(nodes[i], ParseTree.NotNode), i, nodes[i])
+                         for i in range(len(nodes))]
+            structure.sort()
+            nodes = [node for (bit, index, node) in structure]
+            if isinstance(nodes[0], ParseTree.NotNode):
+                raise ParseTree.ParseError(
+                    "a term must have at least one positive word")
             if len(nodes) == 1:
-                tree = nodes[0]
-            else:
-                tree = ParseTree.AndNode(nodes)
+                return nodes[0]
+            tree = ParseTree.AndNode(nodes)
+        return tree
+
+    def _parseAtom(self):
+        term = self._get(_ATOM)
+        words = self._lexicon.parseTerms(term)
+        if not words:
+            self._ignored.append(term)
+            return None
+        if len(words) > 1:
+            tree = ParseTree.PhraseNode(words)
+        elif self._lexicon.isGlob(words[0]):
+            tree = ParseTree.GlobNode(words[0])
+        else:
+            tree = ParseTree.AtomNode(words[0])
+        if term[0] == "-":
+            tree = ParseTree.NotNode(tree)
         return tree
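
The atom classification in _parseAtom() can be tried outside the parser
with a small sketch like the one below. It is an illustration under
stated assumptions, not the ZCTextIndex implementation: ToyLexicon, its
stopword list, its "*" globbing rule, and classify_atom() are all
hypothetical, and tuples stand in for the ParseTree node classes. Only
the method names parseTerms() and isGlob() are taken from the diff.

```python
class ToyLexicon:
    """Hypothetical lexicon; the real one decides stopwords and globbing."""

    STOPWORDS = {"and", "the", "of"}  # assumed list, for illustration only

    def parseTerms(self, text):
        # Assumed behavior: ignore a leading hyphen, lower-case, split on
        # whitespace, and drop stopwords.
        words = text.lstrip("-").lower().split()
        return [w for w in words if w not in self.STOPWORDS]

    def isGlob(self, word):
        # Assumed globbing syntax: any word containing a star.
        return "*" in word


def classify_atom(term, lexicon, ignored):
    """Mirror the branch order of _parseAtom(), using tuples as nodes."""
    words = lexicon.parseTerms(term)
    if not words:
        ignored.append(term)       # only stopwords: remember and skip
        return None
    if len(words) > 1:
        node = ("PHRASE", words)
    elif lexicon.isGlob(words[0]):
        node = ("GLOB", words[0])
    else:
        node = ("ATOM", words[0])
    if term[0] == "-":
        node = ("NOT", node)       # a leading hyphen negates the atom
    return node


if __name__ == "__main__":
    ignored = []
    lexicon = ToyLexicon()
    for term in ["foo", "foo*", "-ham eggs", "the"]:
        print(term, "->", classify_atom(term, lexicon, ignored))
    print("ignored:", ignored)
```
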