[Zodb-checkins] CVS: Zope3/lib/python/Persistence/BTrees - Maintainer.txt:1.1.2.1

Tim Peters tim.one@comcast.net
Tue, 4 Jun 2002 15:48:32 -0400


Update of /cvs-repository/Zope3/lib/python/Persistence/BTrees
In directory cvs.zope.org:/tmp/cvs-serv21247

Added Files:
      Tag: Zope-3x-branch
	Maintainer.txt 
Log Message:
Copying in a useful doc file from the trunk.  Some of the info here
doesn't apply to the code in the branch, but it eventually will as
trunk code improvements get duplicated in the branch too.


=== Added File Zope3/lib/python/Persistence/BTrees/Maintainer.txt ===
This document provides information for developers who maintain or
extend BTrees.

Macros
======
BTrees are defined using a "template", roughly akin to a a C++
template.  To create a new family of BTrees, create a source file that
defines macros used to handle differences in key and value types:


Configuration Macros

MASTER_ID
A string to hold an RCS/CVS Id key to be included in compiled binaries.

MOD_NAME_PREFIX
A string (like "IO" or "OO") that provides the prefix used for the
module.  This gets used to generate type names and the internal module
name string.

DEFAULT_MAX_BUCKET_SIZE
An int giving the maximum bucket size (number of key/value pairs).
When a bucket gets larger than this due to an insertion *into a BTREE*,
it splits.  Inserting into a bucket directly doesn't split, and
functions that produce a bucket output (e.g., union()) also have no
bound on how large a bucket may get.  Someday this will be tunable
on BTree instances.

DEFAULT_MAX_BTREE_SIZE
An int giving the maximum size (number of children) of an internal
btree node.  Someday this will be tunable on BTree instances.

Macros for Keys

KEY_TYPE
The C type declaration for keys (e.g., int or PyObject*).

KEY_CHECK(K)
Tests whether the PyObject* K can be converted to the (C) key type
(KEY_TYPE).  The macro should return a boolean (zero for false,
non-zero for true).  When it returns false, its caller should probably
set a TypeError exception.

TEST_KEY_SET_OR(V, K, T)
Like Python's cmp().  Compares K(ey) to T(arget), where K & T are C
values of type KEY_TYPE.  V is assigned an int value depending on
the outcome:
   < 0 if K < T
  == 0 if K == T
   > 0 if K > T
This macro acts like an 'if', where the following statement is
executed only if a Python exception has been raised because the
values could not be compared.

DECREF_KEY(K)
K is a value of KEY_TYPE.  If KEY_TYPE is a flavor of PyObject*, write
this to do Py_DECREF(K).  Else (e.g., KEY_TYPE is int) make it a nop.

INCREF_KEY(K)
K is a value of KEY_TYPE.  If KEY_TYPE is a flavor of PyObject*, write
this to do Py_INCREF(K).  Else (e.g., KEY_TYPE is int) make it a nop.

COPY_KEY(K, E)
Like K=E.  Copy a key from E to K, both of KEY_TYPE.  Note that this
doesn't decref K or incref E when KEY_TYPE is a PyObject*; the caller
is responsible for keeping refcounts straight.

COPY_KEY_TO_OBJECT(O, K)
Roughly like O=K.  O is a PyObject*, and the macro must build a Python
object form of K, assign it to O, and ensure that O owns the reference
to its new value.  It may do this by creating a new Python object based
on K (e.g., PyInt_FromLong(K) when KEY_TYPE is int), or simply by doing
Py_INCREF(K) if KEY_TYPE is a PyObject*.

COPY_KEY_FROM_ARG(TARGET, ARG, STATUS)
Copy an argument to the target without creating a new reference to ARG.
ARG is a PyObject*, and TARGET is of type KEY_TYPE.  If this can't be
done (for example, KEY_CHECK(ARG) returns false), set a Python error
and set status to 0.  If there is no error, leave status alone.


Macros for Values

VALUE_TYPE
The C type declaration for values (e.g., int or PyObject*).

TEST_VALUE(X, Y)
Like Python's cmp().  Compares X to Y, where X & Y are C values of
type VALUE_TYPE.  The macro returns an int, with value
   < 0 if X < Y
  == 0 if X == Y
   > 0 if X > Y
XXX There is no provision for determining whether the comparison
attempt failed (set a Python exception).

DECREF_VALUE(K)
Like DECREF_KEY, except applied to values of VALUE_TYPE.

INCREF_VALUE(K)
Like INCREF_KEY, except applied to values of VALUE_TYPE.

COPY_VALUE(K, E)
Like COPY_KEY, except applied to values of VALUE_TYPE.

COPY_VALUE_TO_OBJECT(O, K)
Like COPY_KEY_TO_OBJECT, except applied to values of VALUE_TYPE.

COPY_VALUE_FROM_ARG(TARGET, ARG, STATUS)
Like COPY_KEY_FROM_ARG, except applied to values of VALUE_TYPE.

NORMALIZE_VALUE(V, MIN)
Normalize the value, V, using the parameter MIN.  This is almost
certainly a YAGNI.  It is a no op for most types. For integers, V is
replaced by V/MIN only if MIN > 0.


Macros for Set Operations

MERGE_DEFAULT
A value of VALUE_TYPE specifying the value to associate with set
elements when sets are merged with mappings via weighed union or
weighted intersection.

MERGE(O1, w1, O2, w2)
Performs a weighted merge of two values, O1 and O2, using weights w1
and w2.  The result must be of VALUE_TYPE.  Note that weighted unions
and weighted intersections are not enabled if this macro is left
undefined.

MERGE_WEIGHT(O, w)
Computes a weighted value for O.  The result must be of VALUE_TYPE.
This is used for "filling out" weighted unions, i.e. to compute a
weighted value for keys that appear in only one of the input
mappings.  If left undefined, MERGE_WEIGHT defaults to

    #define MERGE_WEIGHT(O, w) (O)

MULTI_INT_UNION
The value doesn't matter.  If defined, SetOpTemplate.c compiles
code for a multiunion() function (compute a union of many input sets
at high speed).  This currently makes sense only for II sets, so
only _IIBTree.c defines it.


BTree Clues
===========
More or less random bits of helpful info.

+ In papers and textbooks, this flavor of BTree is usually called
  a B+-Tree, where "+" is a superscript.

+ All keys and all values live in the bucket leaf nodes.  Keys in
  interior (BTree) nodes merely serve to guide a search efficiently
  toward the correct leaf.

+ When a key is deleted, it's physically removed from the bucket
  it's in, but this doesn't propagate back up the tree:  since keys
  in interior nodes only serve to guide searches, it's OK-- and
  saves time --to leave "stale" keys in interior nodes.

+ No attempt is made to rebalance the tree after a deletion, unless
  a bucket thereby becomes entirely empty.  "Classic BTrees" do
  rebalance, keeping all buckets at least half full (provided there
  are enough keys in the entire tree to fill half a bucket).  The
  tradeoffs are murky.  Pathological cases in the presence of
  deletion do exist.  Pathologies include trees tending toward only
  one key per bucket, and buckets at differing depths (all buckets
  are at the same depth in a classic BTree).

+ DEFAULT_MAX_BUCKET_SIZE and DEFAULT_MAX_BTREE_SIZE are chosen
  mostly to "even out" pickle sizes in storage.  That's why, e.g.,
  an IIBTree has larger values than an OOBTree:  pickles store ints
  more efficiently than they can store arbitrary Python objects.


The BTREE_SEARCH Macro
======================
For notational ease, consider a fixed BTree node x, and let

    K(i) mean x->data.key[i]
    C(i) mean all the keys reachable from x->data.child[i]

For each i in 0 to x->len-1 inclusive,

    K(i) <= C(i) < K(i+1)

is a BTree node invariant, where we pretend that K(0) holds a key
smaller than any possible key, and K(x->len) holds a key larger
than any possible key.  (Note that K(x->len) doesn't actually exist,
and K(0) is never used although space for it exists in non-empty
BTree nodes.)

When searching for a key k, then, the child pointer we want to follow
is the one at index i such that K(i) <= k < K(i+1).  There can be
at most one such i, since the K(i) are strictly increasing.  And there
is at least one such i provided the tree isn't empty (so that 0 < len).
For the moment, assume the tree isn't empty (we'll get back to that
later).

The macro's chief loop invariant is

    K(lo) < k < K(hi)

This holds trivially at the start, since lo is set to 0, and hi to
x->len, and we pretend K(0) is minus infinity and K(len) is plus
infinity.  Inside the loop, if K(i) < k we set lo to i, and if
K(i) > k we set hi to i.  These obviously preserve the invariant.
If K(i) == k, the loop breaks and sets the result to i, and since
K(i) == k in that case i is obviously the correct result.

Other cases depend on how i = floor((lo + hi)/2) works, exactly.
Suppose lo + d = hi for some d >= 0.  Then i = floor((lo + lo + d)/2) =
floor(lo + d/2) = lo + floor(d/2).  So:

a. [d == 0] (lo == i == hi) if and only if (lo   == hi).
b. [d == 1] (lo == i  < hi) if and only if (lo+1 == hi).
c. [d  > 1] (lo  < i  < hi) if and only if (lo+1  < hi).

If the node is empty (x->len == 0), then lo==i==hi==0 at the start,
and the loop exits immediately (the first "i > lo" test fails),
without entering the body.

Else lo < hi at the start, and the invariant K(lo) < k < K(hi) holds.

If lo+1 < hi, we're in case #c:  i is strictly between lo and hi,
so the loop body is entered, and regardless of whether the body sets
the new lo or the new hi to i, the new lo is strictly less than the
new hi, and the difference between the new lo and new hi is strictly
less than the difference between the old lo and old hi.  So long as
the new lo + 1 remains < the new hi, we stay in this case.  We can't
stay in this case forever, though:  because hi-lo decreases on each
trip but remains > 0, lo+1 == hi must eventually become true.  (In
fact, it becomes true quickly, in about log2(x->len) trips; the
point is more that lo doesn't equal hi when the loop ends, it has to
end with lo+1==hi and i==lo).

Then we're in case #b:  i==lo==hi-1 then, and the loop exits.  The
invariant still holds, with lo==i and hi==lo+1==i+1:

    K(i) < k < K(i+1)

so i is again the correct answer.

Optimization points:

+ Division by 2 is done via shift rather via "/2".  These are
  signed ints, and almost all C compilers treat signed int division
  as truncating, and shifting is not the same as truncation for
  signed int division.  The compiler has no way to know these values
  aren't negative, so has to generate longer-winded code for "/2".
  But we know these values aren't negative, and exploit it.

+ The order of _cmp comparisons matters.  We're in an interior
  BTree node, and are looking at only a tiny fraction of all the
  keys that exist.  So finding the key exactly in this node is
  unlikely, and checking _cmp == 0 is a waste of time to the same
  extent.  It doesn't matter whether we check for _cmp < 0 or
  _cmp > 0 first, so long as we do both before worrying about
  equality.

+ At the start of a routine, it's better to run this macro even
  if x->len is 0 (check for that afterwards).  We just called a
  function and so probably drained the pipeline.  If the first thing
  we do then is read up self->len and check it against 0, we just
  sit there waiting for the data to get read up, and then another
  immediate test-and-branch, and for a very unlikely case (BTree
  nodes are rarely empty).  It's better to get into the loop right
  away so the normal case makes progress ASAP.