Some people think, 'why not use re (the Python regex module)?' The reason is that a search like '*ing' would require iterating over all the keys, and a linear search like this could take multiple orders of magnitude more time than a non-regex search.
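To make the cost difference concrete, here is a rough sketch. The lexicon below is a made-up toy dictionary, not ZCatalog's actual data structure, and glob_search and exact_search are hypothetical helper names. It only illustrates that a wildcard has to test every key while an exact word is a single lookup.

```python
import fnmatch
import re

# Hypothetical toy lexicon: word -> word id (not ZCatalog's real structure).
lexicon = {"running": 1, "walking": 2, "walked": 3, "ring": 4, "zope": 5}


def glob_search(pattern, lexicon):
    """Return the sorted words matching a glob such as '*ing'.

    Every key has to be tested, so the cost grows linearly with the lexicon.
    """
    regex = re.compile(fnmatch.translate(pattern))
    return sorted(word for word in lexicon if regex.match(word))


def exact_search(word, lexicon):
    """Return the word id via a single hash lookup, or None if absent."""
    return lexicon.get(word)


print(glob_search("*ing", lexicon))
print(exact_search("zope", lexicon))
```

On a handful of words the difference is invisible; it only starts to matter as the number of distinct words grows.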
I understand this, having worked with various relational and not quite so relational databases in the past - however, even on a 355,000 record file UniVerse can perform a search using its strange regular-expression-like system in ~20 seconds (this being without the benefit of indexes; when searching on an indexed field the same search is done in about 2 seconds). I suspect that even if it has to iterate through every key, there won't be any performance problems. Also, this brings up another point... Can the basic search interface on the Folder object itself do regex searches on the content? Can it be hacked to do so? Speed isn't a concern at the moment since the dataset will be quite small, and the machine it's on has plenty of power to spare.
There is a pretty good compromise solution called n-grams, but they also result in a lexicon increase, and a much more complicated algorithm. I can refer you to a good book that describes them.
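For the curious, the n-gram idea can be sketched roughly as follows. This is a generic illustration, not anything that exists in ZCatalog; the function names, the trigram size and the '$' boundary marker are all assumptions made for the example, and it only handles '*' as a wildcard in plain alphabetic patterns.

```python
import fnmatch
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(words, n=3):
    """Map each n-gram of '$word$' to the set of words containing it."""
    index = defaultdict(set)
    for word in words:
        # '$' marks the word boundaries (assumed not to occur in real words).
        for gram in ngrams("$" + word + "$", n):
            index[gram].add(word)
    return index


def glob_search(pattern, index, words, n=3):
    """Find words matching a '*' glob, using the index to narrow candidates."""
    candidates = None
    # Each literal chunk between '*' wildcards contributes its n-grams.
    for chunk in ("$" + pattern + "$").split("*"):
        for gram in ngrams(chunk, n):
            found = index.get(gram, set())
            candidates = found if candidates is None else candidates & found
    if candidates is None:  # no chunk was long enough to filter on
        candidates = set(words)
    # The n-gram filter can give false positives, so verify each candidate.
    return sorted(w for w in candidates if fnmatch.fnmatchcase(w, pattern))


words = ["running", "walking", "walked", "ring", "zope"]
index = build_index(words)
print(glob_search("*ing", index, words))
print(len(words), "words,", len(index), "index entries")
```

The last line shows the lexicon increase mentioned above: the index holds one entry per distinct n-gram rather than one per word.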
For how little they're paying me, I think I'll pass on this one ;-)
Yes. The 'lexicon' has a hardwired 'synonym and stopword' dictionary in lib/python/SearchIndex/Lexicon.py. This is also projected to be improved by allowing through-the-web lexicon management (like specifying stopwords and synonyms). Someone also suggested interfacing it to some kind of synonym database; you'd have to search through the archives to find the reference.
Cool! I was thinking of doing this myself, but if I just have to edit a single file it shouldn't be all that bad!
Am I asking too much of this? Should I be buying a Python book and adding this functionality myself? Should I be using something other than ZCatalog? Should I be using something other than Zope? (Please say no, I happen to like Zope!)
Go for it, but don't give up on ZCatalog or Zope. I'd be surprised if you found fully featured regex searching in another package that would be less of a headache to use than just implementing a simple 'reversed' lexicon that lets you do globbing (like DOS wildcards, no *s in the middle of words, etc.).
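A minimal sketch of what such a 'reversed' lexicon might look like, assuming the words can simply be kept in two sorted lists. GlobLexicon and prefix_range are hypothetical names invented for this example and are not part of ZCatalog. The trick is that a trailing-wildcard search like 'walk*' is a prefix search on the sorted words, and a leading-wildcard search like '*ing' becomes a prefix search on the reversed words.

```python
from bisect import bisect_left


def prefix_range(sorted_words, prefix):
    """Return all entries of a sorted list that start with prefix."""
    matches = []
    i = bisect_left(sorted_words, prefix)  # binary search for first candidate
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    return matches


class GlobLexicon:
    """Toy lexicon supporting 'word', 'pre*' and '*suf' patterns only."""

    def __init__(self, words):
        self.forward = sorted(words)
        # Second copy with every word reversed: the 'reversed' lexicon.
        self.backward = sorted(word[::-1] for word in words)

    def search(self, pattern):
        if pattern.count("*") > 1 or "*" in pattern[1:-1]:
            raise ValueError("only one '*', at the start or the end")
        if pattern.startswith("*"):
            # '*ing' -> look up the prefix 'gni' among the reversed words.
            hits = prefix_range(self.backward, pattern[1:][::-1])
            return sorted(hit[::-1] for hit in hits)
        if pattern.endswith("*"):
            return prefix_range(self.forward, pattern[:-1])
        # No wildcard: exact match via the same binary search.
        return [w for w in prefix_range(self.forward, pattern) if w == pattern]


lex = GlobLexicon(["running", "walking", "walked", "ring", "zope"])
print(lex.search("*ing"))
print(lex.search("walk*"))
```

The price is storing every word twice, but both kinds of wildcard search avoid scanning the whole lexicon.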
I'm thinking of using a different search product since I don't know Python at all, and I'm not sure if I want to start out by adding features to such a large and well thought out product. Are there any other search products for Zope available? As disgusting as the thought is, perhaps I should be looking into using a relational database for this (just for the improved searching - I'd rather use a pure Zope solution)? Does anyone have any suggestions for storing hierarchical data in a relational database? And while on the subject of databases... Does Zope/Python have any way of interfacing with a MultiValue database such as UniVerse (preferred since we already use UV here), jBase, UniData, D3, etc.?

Thanks!

--
Dave Kimmel
Systems Analyst
Office of the Public Trustee, Alberta Justice