ZCatalog and 'fuzzy logic'
Is there anyone who could try to give an estimate of how long it would take to add fuzzy logic (regexp-like) searching capability to the ZCatalog? And reasoning as to why would be appreciated. ;)

-Morten
Morten W. Petersen wrote:
Is there anyone who could try to give an estimate of how long it would take to add fuzzy logic (regexp-like) searching capability to the ZCatalog?
And reasoning as to why would be appreciated. ;)
Right now, you could use an External Method to apply a regex match to each unique value in a field index in a Catalog, and return the appropriate Catalog Brains for each match. This is as easy as calling uniqueValues() on the catalog, iterating through the unique values to filter them, and then searching the catalog with the results of the filter as the constraint for that fieldindex.

This would take somewhere between minutes and hours to implement and test, and would execute in O(number of unique field values) time. That should remain acceptably fast even where you have a catalog with many items, as long as most of them have fields drawn from the same (small) set of values.

If you want to search a TextIndex using a regex, or you want to search for a pattern among a number of fields of the same item, then you're into an algorithm that would execute in O(number of cataloged items) time. That could get very slow for any sizable catalog.

The other option for searching a TextIndex is to use extensions to the NEAR and AND and OR operators that are currently supported.

I guess it all depends what you mean by "fuzzy matching".

--
Steve Alexander
Software Engineer
Cat-Box limited
http://www.cat-box.net
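The field-index approach described above could be sketched roughly as follows. This is only an illustration, not tested against a real Zope instance: `FakeCatalog` is a hypothetical in-memory stand-in, with method names modelled on ZCatalog's `uniqueValuesFor` and `searchResults`, and `regex_search` is an invented helper name. In a real External Method the catalog object would be passed in instead.

```python
import re


class FakeCatalog:
    """Hypothetical in-memory stand-in for a catalog with field indexes."""

    def __init__(self, records):
        # records: list of dicts, e.g. {"id": 1, "firstname": "Peter"}
        self._records = records

    def uniqueValuesFor(self, index_name):
        # Distinct values stored under the given field.
        return sorted({r[index_name] for r in self._records})

    def searchResults(self, **query):
        # Each keyword maps a field name to a list of accepted values.
        return [
            r for r in self._records
            if all(r[name] in accepted for name, accepted in query.items())
        ]


def regex_search(catalog, index_name, pattern):
    """Return records whose field value matches the regex `pattern`."""
    regex = re.compile(pattern)
    # This filtering step costs O(number of unique field values).
    matching = [v for v in catalog.uniqueValuesFor(index_name) if regex.search(v)]
    if not matching:
        return []
    # Use the surviving values as the constraint for that field index.
    return catalog.searchResults(**{index_name: matching})


catalog = FakeCatalog([
    {"id": 1, "firstname": "Peter"},
    {"id": 2, "firstname": "Hans-Peter"},
    {"id": 3, "firstname": "Petra"},
    {"id": 4, "firstname": "Peter"},
])
hits = regex_search(catalog, "firstname", r"Peter$")
print([r["id"] for r in hits])
```

The regex only ever runs over the three distinct first names here, not over the four records, which is the point of going through the unique values first.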
On Tue, 9 Jan 2001, Steve Alexander wrote:
The other option for searching a TextIndex is to use extensions to the NEAR and AND and OR operators that are currently supported. I guess it all depends what you mean by "fuzzy matching".
Well, to try to explain the problem:

If I have 1,000,000 objects catalogued, all of the meta type Person, and I want to find all the instances matching the regexp-pattern '?????Peter' on those objects' attribute firstname, and the regexp-pattern '???ne' on those objects' attribute lastname, how could I efficiently implement that?

I.e., is there a need to get down on the C level to get a reasonably fast search?

Am I making any sense?

Cheers,
Morten
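For illustration, the two-field case above might be approached by filtering the unique values of each field separately and then combining the results. The sketch below makes several assumptions: the `?` in the patterns is read as "exactly one character" (glob style, since a leading `?` is not valid regex syntax), the sample `people` list is invented data standing in for catalogued Person objects, and the final list scan stands in for what would be an indexed query on both fields in a real catalog. Whether this is fast enough for a million objects depends on how many distinct names there are, which has not been measured here.

```python
import fnmatch
import re


def matching_values(unique_values, glob_pattern):
    """Return the subset of values matching a glob pattern ('?' = one char)."""
    # fnmatch.translate turns a glob pattern into an anchored regex string.
    regex = re.compile(fnmatch.translate(glob_pattern))
    return {v for v in unique_values if regex.match(v)}


# Hypothetical sample data; a real catalog would supply the unique values.
people = [
    {"id": 1, "firstname": "Hans-Peter", "lastname": "Stene"},
    {"id": 2, "firstname": "Jean-Peter", "lastname": "Olsen"},
    {"id": 3, "firstname": "Peter", "lastname": "Stene"},
    {"id": 4, "firstname": "Hans-Peter", "lastname": "Skane"},
]

# Filter each field's distinct values once, independently of record count.
first_ok = matching_values({p["firstname"] for p in people}, "?????Peter")
last_ok = matching_values({p["lastname"] for p in people}, "???ne")

# Stand-in for a catalog query constrained on both field indexes.
hits = [
    p["id"] for p in people
    if p["firstname"] in first_ok and p["lastname"] in last_ok
]
print(hits)
```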
participants (2): Morten W. Petersen, Steve Alexander