RE: [Zope] ZCatalog searching questions

30 Sep 1999

      ...
Some people think, 'why not use re (the Python regex module)?' Because
searching like '*ing' would require iterating over all the keys, and a
linear search like this could take multiple orders of magnitude more
time than a non-regex search.
I understand this, having worked with various relational and not quite so
relational databases in the past. However, even on a 355,000 record file
UniVerse can perform a search using its strange regular-expression-like
system in ~20 seconds (this being without the benefit of indexes; when
searching on an indexed field the same search is done in about 2 seconds).
I suspect that even if it has to iterate through every key, there won't be
any performance problems.
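
To make the trade-off above concrete, here is a rough sketch of the two kinds
of lookup. The lexicon is a made-up dictionary mapping words to record ids; it
is only an assumption for illustration and not how ZCatalog actually stores
its index.

```python
import re

# Hypothetical lexicon: maps each indexed word to the ids of the
# records that contain it. Not ZCatalog's real data structure.
lexicon = {
    "searching": [1, 4],
    "indexing": [2],
    "zope": [1, 2, 3],
    "catalog": [3],
}


def exact_lookup(word):
    """Non-regex search: a single dictionary lookup."""
    return lexicon.get(word, [])


def regex_lookup(pattern):
    """Regex search: every key is tested, so cost grows with the lexicon."""
    compiled = re.compile(pattern)
    hits = set()
    for word, record_ids in lexicon.items():
        if compiled.search(word):
            hits.update(record_ids)
    return sorted(hits)


print(exact_lookup("zope"))
print(regex_lookup(r"ing$"))  # the regex equivalent of '*ing'
```

With a small lexicon the full scan is cheap, which is why I suspect it would
be fine for my dataset; the cost only starts to matter as the number of keys
grows.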

Also, this brings up another point...  Can the basic search interface on the
Folder object itself do regex searches on the content?  Can it be hacked to
do so?  Speed isn't a concern at the moment since the dataset will be quite
small, and the machine it's on has plenty of power to spare.
...
There is a pretty good compromise solution called n-grams, but they also
result in a lexicon increase, and a much more complicated algorithm.  I
can refer you to a good book that describes them.
For how little they're paying me, I think I'll pass on this one  ;-)
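
For anyone following along, the n-gram idea mentioned above can be sketched
roughly as follows. This assumes trigrams and a '$' marker for word
boundaries; the function names are hypothetical and this is not taken from
ZCatalog or from the book referred to.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """Return the n-character substrings of a word padded with '$'."""
    padded = "$" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(words, n=3):
    """Map every n-gram to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def suffix_search(index, suffix, n=3):
    """Find words ending in `suffix` (the '*ing' case) via the n-gram index."""
    padded = suffix + "$"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    if not grams:
        raise ValueError("suffix is too short to form an n-gram")
    # Only words containing every n-gram of the suffix can match.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # N-grams can co-occur without forming the suffix, so verify each one.
    return sorted(w for w in candidates if w.endswith(suffix))


words = ["searching", "indexing", "zope", "ingot", "catalog"]
index = build_index(words)
print(suffix_search(index, "ing"))
print(len(words), "words,", len(index), "n-gram keys")
```

The last line shows the lexicon increase: the index holds many more keys than
there are words, and the search needs a verification pass on top of the
lookups.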
...
Yes.  The 'lexicon' has a hardwired 'synonym and stopword' dictionary in
lib/python/SearchIndex/Lexicon.py.  This is also projected to be
improved by allowing through-the-web lexicon management (like specifying
stopwords and synonyms).  Someone also suggested interfacing it to some
kind of synonym database; you'd have to search through the archives to
find the reference.
Cool!  I was thinking of doing this myself, but if I just have to edit a
single file it shouldn't be all that bad!
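
As I understand it, a stopword and synonym table boils down to something like
the sketch below. The tables and the function name are invented for
illustration; I have not checked them against what Lexicon.py actually
contains.

```python
# Hypothetical tables for illustration only; these are NOT the
# contents of lib/python/SearchIndex/Lexicon.py.
STOPWORDS = {"a", "an", "and", "the", "of"}
SYNONYMS = {"automobile": "car", "auto": "car"}


def normalize(words):
    """Drop stopwords and map each synonym to one canonical word."""
    result = []
    for word in words:
        word = word.lower()
        if word in STOPWORDS:
            continue  # stopwords are never indexed or searched
        result.append(SYNONYMS.get(word, word))
    return result


print(normalize(["The", "Automobile", "and", "a", "Car"]))
```

Applying the same function to both the indexed text and the query is what
would make a search for one synonym find records containing the other.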
...
...
Am I asking too much of this?  Should I be buying a Python book and adding
this functionality myself?  Should I be using something other than ZCatalog?
Should I be using something other than Zope?  (Please say no, I happen to
like Zope!)
Go for it, but don't give up on ZCatalog or Zope.  I'd be surprised if
you found fully featured regex searching in another package that would
take less of a headache to use than just implementing a simple
'reversed' lexicon that lets you do globbing (like DOS wildcards, no *s
in the middle of words, etc.).
I'm thinking of using a different search product since I don't know python
at all, and I'm not sure if I want to start out by adding features to such a
large and well thought out product.  Are there any other search products for
Zope available?
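
For reference, my reading of the 'reversed' lexicon suggestion is roughly the
sketch below: keep a second sorted list of the words spelled backwards, so
that '*ing' becomes a prefix search for 'gni'. The names are my own and the
sorted lists stand in for whatever structure ZCatalog really uses.

```python
import bisect


def prefix_matches(sorted_keys, prefix):
    """Return the keys starting with `prefix`, using binary search to find the start."""
    matches = []
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


def glob_search(forward, backward, pattern):
    """Handle 'abc*' and '*abc' only; no '*' in the middle of a word."""
    if pattern.count("*") != 1 or not (
        pattern.startswith("*") or pattern.endswith("*")
    ):
        raise ValueError("only a single leading or trailing '*' is supported")
    if pattern.endswith("*"):
        return prefix_matches(forward, pattern[:-1])
    # Leading '*': search the reversed lexicon for the reversed suffix.
    reversed_hits = prefix_matches(backward, pattern[1:][::-1])
    return sorted(key[::-1] for key in reversed_hits)


words = ["catalog", "indexing", "ingot", "searching", "zope"]
forward = sorted(words)
backward = sorted(word[::-1] for word in words)

print(glob_search(forward, backward, "*ing"))
print(glob_search(forward, backward, "in*"))
```

The cost is a second copy of the lexicon, but both kinds of wildcard avoid
scanning every key.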

As disgusting as the thought is, perhaps I should be looking into using a
relational database for this (just for the improved searching - I'd rather
use a pure Zope solution)?  Does anyone have any suggestions for storing
hierarchical data in a relational database?

And while on the subject of databases...  Does Zope/Python have any way of
interfacing with a MultiValue database such as UniVerse (preferred since we
already use UV here), jBase, UniData, D3, etc?

Thanks!
-- Dave Kimmel
Systems Analyst
Office of the Public Trustee, Alberta Justice