On Thu, 20 Nov 2003 12:38:24 -0500 "Small Business Services" <toolkit@magma.ca> wrote:
Why are wildcards '?' and '*' not supported at the beginning of search terms in ZCTextIndex? It would be very useful to search for terms using '*someterm'.
In the cvs for ZCTextIndex, Lexicon.py (http://cvs.zope.org/Products/ZCTextIndex/Lexicon.py?annotate=1.17.10.2)
the code raises an exception for wildcards at the beginning of search terms (see line 113) and a related comment says:
111 # The pattern starts with a globbing character.
112 # This is too efficient, so we raise an exception.
Why is this 'too efficient'?
I think it should say "too inefficient". The data structures in the lexicon as it is currently implemented cannot efficiently return all of the matching words for *foo. It would require iterating over all of the words in the lexicon.

As Andreas said, it would be possible to implement this efficiently if the lexicon kept a separate head globbing index, but this would greatly increase the size of the lexicon and would make updates somewhat more expensive (although probably not too much in steady-state).

I'm curious, you said you had 700,000 some-odd documents in your catalog. How many words are in the lexicon(s) you have?

-Casey
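
To illustrate the trade-off described above, here is a minimal Python sketch. It is not the ZCTextIndex code: it assumes the lexicon's words are held in a plain sorted list, and the names prefix_matches, suffix_matches_scan and HeadGlobIndex are hypothetical. A trailing wildcard (foo*) can be answered with a binary search on the sorted words, a leading wildcard (*foo) needs a full scan, and a second index of reversed words turns the leading wildcard back into a prefix search at the cost of storing every word twice.

```python
from bisect import bisect_left


def prefix_matches(sorted_words, prefix):
    """Answer 'prefix*' by binary search: jump to the first candidate,
    then walk forward only while words still share the prefix."""
    matches = []
    for i in range(bisect_left(sorted_words, prefix), len(sorted_words)):
        if not sorted_words[i].startswith(prefix):
            break
        matches.append(sorted_words[i])
    return matches


def suffix_matches_scan(words, suffix):
    """Answer '*suffix' with only a forward-sorted list: every word
    in the lexicon has to be examined."""
    return [w for w in words if w.endswith(suffix)]


class HeadGlobIndex:
    """Hypothetical separate index holding each word reversed.

    It doubles the stored words and must be updated alongside the
    main lexicon, but '*suffix' becomes a prefix search on it.
    """

    def __init__(self, words):
        self._reversed = sorted(w[::-1] for w in words)

    def suffix_matches(self, suffix):
        hits = prefix_matches(self._reversed, suffix[::-1])
        return sorted(r[::-1] for r in hits)


words = sorted(["bar", "food", "foo", "snafoo", "tofoo", "zebra"])
head_index = HeadGlobIndex(words)

print(prefix_matches(words, "foo"))       # foo*
print(suffix_matches_scan(words, "foo"))  # *foo, full scan
print(head_index.suffix_matches("foo"))   # *foo, via reversed index
```

Both *foo lookups return the same words; the difference is that the scan touches the whole lexicon while the reversed index touches only the matching range.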