[ZODB-Dev] Fast * wildcard search and word-part indexing + B-tree
interval search.
Casey Duncan
casey at zope.com
Wed Sep 17 13:36:53 EDT 2003
A possible roadblock to optimizing this is that there is currently no BTree
variant expressly for string keys. There's a pair of BTree types for integer
keys and a pair for arbitrary (comparible) Python object keys. The comparison
operations you outlined are doable with object keys, but what optimizations
you can get may be limited without a specific string-key btree variant. Were
I to suggest the latter, I would expect a well-deserved beating from Tim and
anyone else currently in line for maintenance of the btree code ;^)
It might be worthwhile to try an implementation that uses the existing btree
range selection capabilities and see how it performs. I strongly suspect that
performance for large corpuses will be limited mostly by disk IO speed. So an
avenue to think about might be an algorithm which loads the fewest values
into memory in the common case.
-Casey
On Tuesday 16 September 2003 05:49 am, Vladimir Voznesensky wrote:
> One addition to estimation function.
>
> > Estimation function for [A,B) interval and ab?cd template is the
> > following:
> > 1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in
> > the interval does not match ab?cd.
> > 2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of
> > the interval do match.
> > 3. Otherwise, possibly some of the keys do match, some-do not, and we
> > must consider sub-division of the interval.
>
> Shure, we can exclude intervals of the form:
> "ab<X>cd"<A[0:5] and B[0:5]<"ab<X+1>cd" where <X> is some character and
> <X+1> -- it's successor.
>
> So, estimation function for [A,B) interval and ab?cd template becomes
> the following:
> 1. If first 5 letters of B<"abacd" or of A>"abzcd", all the keys in the
> interval does not match ab?cd.
> 2. If first 5 letters of A=of B="ab<some letter>cd", all the keys of the
> interval do match.
> 3. If first 5 letters of A>"ab<X>cd" and of B<"ab<X+1>cd", all the keys
> in the inteval do not match.
> 4. Otherwise, possibly some of the keys do match, some-do not, and we
> must consider sub-division of the interval.
>
> Dear Christian, who is the maintainer of btree code?
> I'm interested in writing btree interval searching subroutine and
> estimation functions hook in C by myself.
> I'm not interested in writing estimation functions specifically for
> character strings (I follow another aims), but it seems that many folks are.
> I can advise and help to write string estimation functions (I've wrote
> them for Berkeley DB) and appropriate indices for Zope IndexedCatalog.
> I have no large experience in Zope and (Python+C) development.
> I'm trying to find co-llaboratives and possible consumers of my work.
>
> Farewell.
>
> VV
>
>
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list - ZODB-Dev at zope.org
> http://mail.zope.org/mailman/listinfo/zodb-dev
>
More information about the ZODB-Dev
mailing list