[ZODB-Dev] B-tree enhancement : interval search.
Christian Reis
kiko at async.com.br
Thu Sep 4 18:55:38 EDT 2003
On Thu, Sep 04, 2003 at 03:48:58PM +0400, Vladimir Voznesensky wrote:
> Imagine, that we have a B-tree of N-letters words sorted in alphabetical
> order.
> Let's we need to find all the strings satisfying a specific template
> consisting of letters and "?" wildcards.
> For example, a?b or ?c?.
> The traditional approach for a?b is to find the first a?? word and
> iterate sequentially while first letter is a.
> The traditional approach for ?c? is to iterate over all of words (or
> call [a-z]c searching for every letter in the alphabet).
> Is it optimal? No.
> In many cases we can eliminate some pages and some parts of pages, for
> example, [abc,aca] interval needs not to be checked for a?b. We can use
> real distribution of words in specific B-tree to accelerate searching.
This is an awesome feature; we could *really* use this in IndexedCatalog
to speed up string queries, which is one of the top requests we get from
our users.
> I'm a new fellow in Zope. I'm ready to realize such an extention by
> myself, but I do not understand, how to organize my interaction with
> Zope folks: how to make feature requests, how to propagate my code, etc.
> As for Berkeley DB, SleepyCat Software disagreed to add my feature to
> their source tree, because their customers are not interested in such
> extention.
The main problem you'll have with the ZODB team is that everybody's
always so damned busy <wink>. I found the best way to work with them is
to persist -- if you don't get an answer, wait a bit, and repost. While
coding, follow the standards of the code around you -- you'll see there
are about 4000 lines of C code (and about 2000 of python) in the BTrees
package that should provide a good guide for whatever new implementation
you start.
You might want to post a short proposal indicating some basic choices
-- class names, file names, module organization -- to get a feel for
what sort of conventions people prefer, and the level of acceptance your
feature garners.
Having said all that, if you *do* have a hard time getting code
integrated, I don't imagine it would be hard to write an add-on for the
BTrees package in the ZODB distribution and ship it separately.
> My algorythm accelerates searching in many cases from times to
> magnitudes (depending on template and data variations) comparing to
> traditional approach of standard Berkeley DB API. I'd like to have and
> use the same thing in ZODB.
Question: does the template necessarily have a fixed number of
characters, or is something like the * wildcard supported too?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
More information about the ZODB-Dev
mailing list