Paul,

One suggestion is to rewrite your queries dynamically, so that they are more to your liking, before passing them to ZCatalog. As an example, the Python class method below is a utility method that I use to rewrite queries with the re regex library. It is called via <dtml-call> to rewrite the query in REQUEST right before <dtml-in Catalog> is called.

You could very likely do something similar to write in "default" boolean operators, or quotes as Dieter suggests, supposing you wanted to make such behavior an optional default. Of course, for some setups this wouldn't be good default behavior, so that decision is left to you as an application design choice.

Sean

import re  # needed at module level

def queryExtender(self, query):
    """
    Takes, as input, a query for a Text index of ZCatalog, and makes it
    more intelligent by parsing it and rewriting it to include wildcards
    at the end of words so that we can search sub-words; in other words,
    a search for something like "engineer" should yield results for
    "engineer*" so that terms like "engineers" and "engineering" are
    also considered matches.

    Obviously, we have to be careful not to incorrectly parse the query,
    and we don't want to mess with words that already have wildcards at
    the end, because you don't want to end up with something like
    "engineer**"
    """
    ### Define character patterns to strip out and split upon
    everythingButSearchTerms = '[^A-Za-z0-9*]+'  # regex pattern

    ### Create the word list, dropping empty string elements
    result = [w for w in re.split(everythingButSearchTerms, query) if w]

    ### Get rid of boolean operators (whole word only, any case)
    booleanops = re.compile('^(and|or|andnot|near)$', re.IGNORECASE)
    result = [w for w in result if not booleanops.search(w)]

    ### Now, result is a list of just the words that are
    ### meaningful to the search, but we need to eliminate
    ### any entries that have wildcards in them, because
    ### they are likely more specific than our rewrite here
    asteriskinterm = '^[*]|[*]$'  # asterisk at start or end of term
    result = [w for w in result if not re.search(asteriskinterm, w)]

    ### Now, the list of words in the query we need to modify is
    ### final, so we can start modifying the query, one word
    ### at a time...
    for item in result:
        # Match the whole word only, and skip occurrences that
        # already carry a wildcard
        pattern = (r'(?<![A-Za-z0-9*])' + re.escape(item)
                   + r'(?![A-Za-z0-9*?])')
        if len(item) > 3:
            query = re.sub(pattern, item + '*', query, count=1)
        elif len(item) != 1:
            query = re.sub(pattern, item + '?', query, count=1)
    return query

-----Original Message-----
From: Dieter Maurer [mailto:dieter@handshake.de]
Sent: Thursday, September 06, 2001 3:04 PM
To: paul dunbar
Cc: zope@zope.org
Subject: Re: [Zope] Catalog search problem

paul dunbar writes:
 > I have a problem when searching my catalog. It has an index called "Author",
 > which holds the name of the person who wrote a document. When I search the
 > catalog for, say, "paul dunbar", I get documents from authors like "paul"
 > or "paul test" as well as "paul dunbar". What I want to do is limit the
 > matches to "paul dunbar"....

Missing operators between search terms are "replaced" by the default operator ("or"; in Zope 2.4, you can define "and" as the default operator).

To get almost what you want, enclose "paul dunbar" in quotes. This makes it a phrase search, which comes quite close to a search for exactly "paul dunbar"...

Dieter
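
Dieter's quoting suggestion could probably be applied automatically, in the same spirit as the wildcard rewrite in queryExtender. What follows is only a minimal sketch: the name quoteAsPhrase and the rule it applies (quote only queries of two or more words that contain no quotes, wildcards, or boolean operators) are assumptions made for illustration, and are not part of ZCatalog or of either message.

```python
import re

# Hypothetical helper (not part of ZCatalog): wrap a plain multi-word
# query in double quotes so that a text index treats it as a phrase.
BOOLEAN_OPS = ('and', 'or', 'andnot', 'near')

def quoteAsPhrase(query):
    words = query.split()
    # Leave empty queries and single words alone.
    if len(words) < 2:
        return query
    # Leave the query alone if the user already used quotes or wildcards.
    if re.search(r'["*?]', query):
        return query
    # Leave the query alone if it contains an explicit boolean operator.
    for word in words:
        if word.lower() in BOOLEAN_OPS:
            return query
    # Otherwise, normalize whitespace and quote the whole query.
    return '"' + ' '.join(words) + '"'
```

Like queryExtender, this would be called on the query in REQUEST just before the catalog search; quoteAsPhrase('paul dunbar') returns '"paul dunbar"', while 'paul or dunbar' and single words pass through unchanged. Whether phrase search should be the default is, again, an application design choice.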