For what it's worth, here is the code I am using to parse the query and add the wildcards; it seems to be non-intrusive towards all ops, including (), and it's been moderately tested. It might be slightly inefficient, though... Feel free to borrow and/or improve on this if it is useful.

Sean

import re

def queryExtender(self, query):
    """
    Takes, as input, query for Text index of ZCatalog, and makes it
    more intelligent by parsing it and rewriting it to wrap words in
    wildcards so that we can search sub-words; in other words, a
    search for something like "engineer" should yield results for
    "*engineer*" so that terms like "engineers" and "engineering"
    also are considered matches. Obviously, we have to be careful
    not to incorrectly parse the query, and we don't want to mess
    with words that already have wildcards, because you don't want
    to end up with something like "*engineer**"
    """
    ### Define Character Patterns to Strip Out and Split Upon
    everythingButSearchTerms = '[^A-Za-z0-9*]+'  # Regex Pattern

    ### Create the word list, without empty string elements
    result = [w for w in re.split(everythingButSearchTerms, query) if w]

    ### Get rid of boolean operators (whole word, any case)
    booleanops = re.compile('^(and|or|andnot|near)$', re.IGNORECASE)
    result = [w for w in result if not booleanops.search(w)]

    ### Now, result is a list of just the words that are
    ### meaningful to the search, but we need to eliminate
    ### any entries that have wildcards in them, because
    ### they are likely more specific than our rewrite here
    result = [w for w in result if '*' not in w]

    ### Now, the list of words in the query we need to modify is
    ### final, so we can start modifying the query, one word
    ### at a time. The lookarounds keep us from matching inside a
    ### longer word or re-wrapping a term that is already wrapped.
    for item in result:
        pattern = ('(?<![A-Za-z0-9*])' + re.escape(item) +
                   '(?![A-Za-z0-9*])')
        query = re.sub(pattern, '*' + item + '*', query, count=1)
    return query

-----Original Message-----
From: Casey Duncan [mailto:cduncan@kaivo.com]
Sent: Monday, July 30, 2001 10:36 AM
To: sean.upton@uniontrib.com; zope-dev@zope.org
Subject: Re: Catalog Query Feature Request, was: RE: [Zope-dev] An idea for UniqueValuesFor

sean.upton@uniontrib.com wrote:
I could definitely see the value of a unique-values query into ZCatalog, especially for creating things using <dtml-tree> using keywords, etc...
I'm wondering the best way to implement this on the API side, since it would change the output from catalog results to just attribute values. Any thoughts?
On a slightly related (well, not really) note, CatalogQuery looks like it would solve a lot of problems I have had with a very Catalog-intensive application. One thought I had - I might suggest the possibility of adding a fuzzy matching operator to CatalogQuery that performs the function of wrapping wildcard searches on search terms for Text Indexes, supposing the Catalog is using a globbing vocabulary:
~= as an operator would mean an approximate (substring) match
So a search for 'title ~= "engineer"' would perform a search for '*engineer*' and return results containing words like engineer, engineers, engineering, etc.
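As a rough sketch of the rewrite being proposed here, a ~= clause could be turned into a globbed term along these lines. The clause syntax and the names FUZZY_CLAUSE and expand_fuzzy are assumptions for illustration only; they are not CatalogQuery's actual grammar or API.

```python
import re

# Hypothetical pattern for a clause such as: title ~= "engineer"
# (assumed syntax; not taken from CatalogQuery itself)
FUZZY_CLAUSE = re.compile(r'(\w+)\s*~=\s*"([^"]*)"')

def expand_fuzzy(expression):
    """Return (index_name, globbed_term) for a `field ~= "term"` clause."""
    match = FUZZY_CLAUSE.match(expression.strip())
    if match is None:
        raise ValueError('not a ~= clause: %r' % expression)
    index_name, term = match.groups()
    # Strip wildcards the user already typed to avoid doubling them
    return index_name, '*' + term.strip('*') + '*'
```

The resulting term would then be handed to the Text index as an ordinary glob search, which only works if the Catalog uses a globbing vocabulary.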
That sounds like a good idea. Would a simple split/join work, something like:

    ops = ('and', 'or')
    words = query_string.lower().split()
    # Wrap every word that is not a boolean operator in wildcards
    words = [word if word in ops else '*%s*' % word for word in words]
    query_string = ' '.join(words)

I can look at adding this capability.
Right now, I attempt to safely rewrite REQUEST['someFieldThatIamSearching'] with a Python class method that uses a zillion re.sub() calls to wrap search terms in * characters; I wonder if there is a way to alternately implement something like this at a lower level, perhaps in CatalogQuery; I get the feeling it would be quicker and much simpler.
If something like that were implemented as well as some equivalent to sort_on, I'd stop pulling my hair out with traditional workarounds and definitely switch all my stuff to use CatalogQuery instead...
Yeah, I definitely want to add a sort_on capability. I think I will implement it as an optional argument, like it is for ZCatalog, rather than as part of the query string, at least for now.
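To make the idea concrete, here is a minimal sketch of sort_on passed as an optional keyword argument rather than written into the query string. The function run_query and the dictionary records are hypothetical stand-ins, not the CatalogQuery or ZCatalog API.

```python
def run_query(records, predicate, sort_on=None):
    """Filter records with predicate, optionally sorting by one attribute."""
    results = [record for record in records if predicate(record)]
    if sort_on is not None:
        # Sorting is requested outside the query itself, as a plain argument
        results.sort(key=lambda record: record[sort_on])
    return results

# Made-up sample data, purely for illustration
records = [
    {'title': 'engineering', 'year': 2001},
    {'title': 'engineer', 'year': 1999},
    {'title': 'gardening', 'year': 2000},
]
matches = run_query(records, lambda r: 'engineer' in r['title'], sort_on='year')
```

Keeping the sort key out of the query string means the query grammar does not have to change to support it.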
Thoughts?
Sean
-----Original Message-----
From: Casey Duncan [mailto:cduncan@kaivo.com]
Sent: Monday, July 30, 2001 8:20 AM
To: Chris Withers
Cc: zope-dev@zope.org; Anthony Baxter
Subject: Re: [Zope-dev] An idea for UniqueValuesFor
Chris Withers wrote:
Casey Duncan wrote:
possibly, yes. I'll look to add this to my CatalogQuery product. I believe the btrees can be pressed into service here...
Hadn't heard of this CatalogQuery product... where can I find out more?
I think I may have been about to develop something similar, so maybe we can help each other out?
cheers,
Chris
http://www.zope.org/Members/Kaivo/CatalogQuery
This is my first stab at this. I foresee a much more general query mechanism in the future, but this works better than the stock stuff (for me) and it works today!
Let me know what your ideas are...
-- | Casey Duncan | Kaivo, Inc. | cduncan@kaivo.com `------------------>
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )