[Zope-dev] ZCatalog and 'fuzzy logic'
Steve Alexander
s.alexander@lancaster.ac.uk
Tue, 09 Jan 2001 16:32:47 +0000
Morten W. Petersen wrote:
> Is there anyone who could try to give an estimate of how long it would
> take to add fuzzy logic (regexp-like) searching capability to the
> ZCatalog?
>
> And reasoning as to why would be appreciated. ;)
Right now, you could use an External Method to apply a regex match to
each unique value in a field index in a Catalog, and return the
appropriate Catalog Brains for each match.
This is as easy as calling uniqueValues() on the catalog, iterating
through the unique values to filter them, and then searching the catalog
with the results of the filter as the constraint for that fieldindex.
This would take somewhere between minutes and hours to implement and
test. The filtering step would execute in O(number of unique field
values) time, so it should remain acceptably fast even where you have a
catalog with many items, provided most of them have field values drawn
from the same (small) set.
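The unique-values approach could be sketched roughly as follows. This is
only an illustration written in present-day Python, not tested against a
real Zope instance: regex_search and FakeCatalog are hypothetical names,
FakeCatalog is an in-memory stand-in so the sketch runs on its own, and
the sketch assumes the catalog exposes uniqueValuesFor() and
searchResults() and that a field index queried with a list of values
matches any of them.

```python
import re


def regex_search(catalog, index_name, pattern):
    """Return catalog results whose index_name value matches pattern."""
    matcher = re.compile(pattern)
    # One regex test per unique value, not one per cataloged item.
    wanted = [value for value in catalog.uniqueValuesFor(index_name)
              if isinstance(value, str) and matcher.search(value)]
    if not wanted:
        return []
    # Assumption: a list of values on a field index means "any of these".
    return catalog.searchResults(**{index_name: wanted})


class FakeCatalog:
    """Hypothetical in-memory stand-in for a catalog (illustration only)."""

    def __init__(self, records):
        self._records = records  # list of dicts, one per cataloged item

    def uniqueValuesFor(self, name):
        return sorted({record[name] for record in self._records})

    def searchResults(self, **query):
        return [record for record in self._records
                if all(record[key] in values
                       for key, values in query.items())]


catalog = FakeCatalog([
    {"id": "a", "colour": "red"},
    {"id": "b", "colour": "dark red"},
    {"id": "c", "colour": "green"},
])
hits = regex_search(catalog, "colour", r"red$")
```

In an External Method, the catalog argument would be the real catalog
object rather than the stand-in, and the results would be Catalog Brains
rather than dicts.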
If you want to search a TextIndex using a regex, or you want to search
for a pattern among a number of fields of the same item, then you're
into an algorithm that would execute in O(number of cataloged items)
time. That could get very slow for any sizable catalog.
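For contrast, a pattern search across several fields of each item has to
visit every item. A minimal sketch, assuming the items (for example,
Catalog Brains) expose the fields as attributes; scan_items and the
sample data are hypothetical:

```python
import re
from types import SimpleNamespace


def scan_items(items, field_names, pattern):
    """Return the items where any of the named fields matches pattern."""
    matcher = re.compile(pattern)
    hits = []
    for item in items:  # one pass over every cataloged item
        for name in field_names:
            value = getattr(item, name, None)
            if isinstance(value, str) and matcher.search(value):
                hits.append(item)
                break  # one matching field is enough for this item
    return hits


# Stand-in items; a real catalog would supply its own result objects.
items = [
    SimpleNamespace(title="Zope notes", summary="catalog"),
    SimpleNamespace(title="Other", summary="fuzzy zcatalog"),
    SimpleNamespace(title="x", summary="y"),
]
matches = scan_items(items, ["title", "summary"], r"(?i)catalog")
```

The cost grows with the number of items times the number of fields
checked, which is why this gets slow on a large catalog.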
The other option for searching a TextIndex is to use extensions to the
NEAR and AND and OR operators that are currently supported. I guess it
all depends what you mean by "fuzzy matching".
--
Steve Alexander
Software Engineer
Cat-Box limited
http://www.cat-box.net