[ZODB-Dev] Fast * wildcard search and word-part indexing + B-tree
interval search.
Vladimir Voznesensky
vovic at smtp.ru
Mon Sep 15 12:03:44 EDT 2003
Hello.
As you (kiko) has mentioned above, there is a need for fast wildcard search.
Shure, every specific task has it's own optimal searching strategy.
But some tricks are possible that can decrease the number of records to
be traversed.
My suggestion is the following:
Let's put every word and tail of word into our b-tree.
For example, word "mother" generates the following keys:
mother
other
ther
her
er
r
(There would be <number of words>*<number of letters in word> keys in
our b-tree.)
Every ? and * wildcard template consists of ? wildcard templates
interleaved with *s.
For example, *ab?cd*e?f?g consists of ?-templates:
ab?cd
e?f?g
At the first step, we choose ?-template for interval estimation search.
For example, ab?cd.
Better to choose ?-template that have minimum number of matches in our
b-tree.
(There can be different techniques to make this estimations.)
At the second step, we iterate over records that match that ?-template
(using interval estimation functions) and get the appropriates.
Estimation function for [A,B) interval and ab?cd template is the following:
1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in the
interval does not match ab?cd.
2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of the
interval do match.
3. Otherwise, possibly some of the keys do match, some-do not, and we
must consider sub-division of the interval.
Farewell.
VV
More information about the ZODB-Dev
mailing list