[Zope] Catalog search problem
sean.upton@uniontrib.com
Thu, 06 Sep 2001 16:07:30 -0700
Paul,
One suggestion: consider rewriting your queries dynamically, to be more
to your liking, before passing them to ZCatalog. As an example, the Python
class method below is a utility method that I use to rewrite queries with
the re regex library. It is called via <dtml-call> to rewrite the query in
REQUEST right before <dtml-in Catalog> is called. You could very likely do
something similar to write in "default" boolean operators or quotes, as
Dieter suggests, supposing you wanted to make such behavior an optional
default. Of course, for some setups this wouldn't be good default behavior,
so that decision is left to you as an application design choice.
Sean
import re

def queryExtender(self, query):
    """
    Takes, as input, a query for a Text index of ZCatalog, and
    makes it more intelligent by parsing it and rewriting it
    to include wildcards at the end of words so that we can
    search sub-words; in other words, a search for something
    like "engineer" should yield results for "engineer*" so
    that terms like "engineers" and "engineering" also are
    considered matches.

    Obviously, we have to be careful not to incorrectly
    parse the query, and we don't want to mess with words
    that already have wildcards at the end, because you
    don't want to end up with something like "engineer**"
    """
    ### Define Character Patterns to Strip Out and Split Upon
    everythingButSearchTerms = '[^A-Za-z0-9*]+'  # Regex pattern
    ### Create the word list
    result = re.split(everythingButSearchTerms, query)
    ### Get rid of empty string elements in the word list
    result = [item for item in result if item != '']
    ### Get rid of boolean operators; the whole word has to match,
    ### so that a term like "order" is not mistaken for "or"
    booleanops = '^(and|or|andnot|near)$'
    result = [item for item in result
              if not re.search(booleanops, item, re.I)]
    ### Now, result is a list of just the words that are
    ### meaningful to the search, but we need to eliminate
    ### any entries that have wildcards in them, because
    ### they are likely more specific than our rewrite here
    asteriskinterm = '^[*]|[*]$'  # asterisk at start or end of term
    result = [item for item in result
              if not re.search(asteriskinterm, item)]
    ### Now, the list of words in the query we need to modify is
    ### final, so we can start modifying the queries, one word
    ### at a time...
    for item in result:
        # Match the word only where it stands alone and has no
        # wildcard attached, so we never rewrite part of a longer
        # term; re.escape protects any asterisk inside the word
        pattern = ('(?<![A-Za-z0-9*?])' + re.escape(item) +
                   '(?![A-Za-z0-9*?])')
        #query = re.sub(pattern, '*'+item+'*', query, count=1)
        if len(item) > 3:
            query = re.sub(pattern, item + '*', query, count=1)
        elif len(item) != 1:
            query = re.sub(pattern, item + '?', query, count=1)
    return query
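The same approach extends to the "default" boolean operator idea mentioned above. What follows is only a minimal sketch under stated assumptions: the name addDefaultOperator and its operator parameter are hypothetical, the list of operator words is a guess at what the Text index accepts, and terms are assumed to be separated by whitespace, with double-quoted phrases treated as a single term.

```python
import re

# Hypothetical helper, not part of ZCatalog. The operator words below
# are an assumption about the query syntax of the Text index.
OPERATORS = ('and', 'or', 'not', 'andnot', 'near')

def addDefaultOperator(query, operator='and'):
    """Insert `operator` between adjacent terms that have no operator
    between them. A double-quoted phrase counts as a single term."""
    # A token is either a double-quoted phrase or a run of non-blanks
    tokens = re.findall(r'"[^"]*"|\S+', query)
    rewritten = []
    previousWasTerm = False
    for token in tokens:
        isOperator = token.lower() in OPERATORS
        # Two terms in a row: put the default operator between them
        if previousWasTerm and not isOperator:
            rewritten.append(operator)
        rewritten.append(token)
        previousWasTerm = not isOperator
    return ' '.join(rewritten)
```

Called the same way, via <dtml-call> right before the catalog query, this would turn paul dunbar into paul and dunbar while leaving a query that already contains operators unchanged.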
-----Original Message-----
From: Dieter Maurer [mailto:dieter@handshake.de]
Sent: Thursday, September 06, 2001 3:04 PM
To: paul dunbar
Cc: zope@zope.org
Subject: Re: [Zope] Catalog search problem
paul dunbar writes:
> I have a problem when searching my catalog. It has an index called
> "Author" which holds the name of the person who wrote a document.
> When I search the catalog for, say, "paul dunbar", I get documents
> from authors like "paul" or "paul test" as well as paul dunbar.
> What I want to do is limit the matches to "paul dunbar"....
Missing operators between search terms are "replaced" by
the default operator ("or"; in Zope 2.4, you can define
"and" as the default operator).
To get almost what you want, enclose "paul dunbar" in quotes.
This makes it a phrase search, which comes quite close to an
exact-match search for "paul dunbar"...
Dieter
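The quoting advice above could also be applied automatically before the query reaches the catalog. A minimal sketch, assuming that a query with no quotes, wildcards or boolean operators is meant as an exact phrase; quoteAsPhrase is a hypothetical name, and the operator list is an assumption rather than something taken from ZCatalog itself.

```python
import re

# Assumed operator words; adjust to whatever your index accepts.
OPERATOR_PATTERN = re.compile(r'\b(and|or|not|andnot|near)\b', re.I)

def quoteAsPhrase(query):
    """Wrap a plain multi-word query in double quotes so that it is
    searched as a phrase; leave anything more specific untouched."""
    stripped = query.strip()
    words = stripped.split()
    if len(words) < 2:
        return stripped          # a single word needs no quoting
    if '"' in stripped or '*' in stripped or '?' in stripped:
        return stripped          # quotes or wildcards already present
    if OPERATOR_PATTERN.search(stripped):
        return stripped          # explicit operators already present
    return '"' + ' '.join(words) + '"'
```

Whether this is a sensible default depends on the application: it suits an index such as "Author", where users tend to type a full name, but it would make a general full-text search much stricter.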