[Zodb-checkins] CVS: ZODB3/Doc/guide - modules.tex:1.2.4.1

Jeremy Hylton jeremy@zope.com
Fri, 4 Oct 2002 13:36:12 -0400


Update of /cvs-repository/ZODB3/Doc/guide
In directory cvs.zope.org:/tmp/cvs-serv12488/Doc/guide

Modified Files:
      Tag: ZODB3-3_1-branch
	modules.tex 
Log Message:
Flesh out section on BTrees.
Remove reference to Catalog.





=== ZODB3/Doc/guide/modules.tex 1.2 => 1.2.4.1 ===
--- ZODB3/Doc/guide/modules.tex:1.2	Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/modules.tex	Fri Oct  4 13:36:11 2002
@@ -2,13 +2,12 @@
 % Related Modules
 %    PersistentMapping
 %    PersistentList
-%    BTree
-%    Catalog
+%    BTrees
 
 \section{Related Modules}
 
 The ZODB package includes a number of related modules that provide
-useful data types such as BTrees or full-text indexes.
+useful data types such as BTrees.
 
 \subsection{\module{ZODB.PersistentMapping}}
 
@@ -40,51 +39,92 @@
 Python lists do.
 
 
-\subsection{B-tree Modules}
-
-%here's one: how does one implement searching? i would have expected the
-%btree objects to have ``find key nearest to this'' and ``next'' methods,
-%(like bsddb's set_location)...
-%
-%  -- erno
+\subsection{BTrees Package}
 
 When programming with the ZODB, Python dictionaries aren't always what
 you need.  The most important case is where you want to store a very
 large mapping.  When a Python dictionary is accessed in a ZODB, the
 whole dictionary has to be unpickled and brought into memory.  If
 you're storing something very large, such as a 100,000-entry user
-database, unpickling such a large object will be slow.  B-trees are a
+database, unpickling such a large object will be slow.  BTrees are a
 balanced tree data structure that behave like a mapping but distribute
-keys throughout a number of tree nodes.  Nodes are then only unpickled
-and brought into memory as they're accessed, so the entire tree
-doesn't have to occupy memory (unless you really are touching every
-single key).
-
-There are four different BTree modules provided.  One of them, the
-\module{BTree} module, provides the most general data type; the keys
-and values in the B-tree can be any Python object.  Some specialized B-tree
-modules require that the keys, and perhaps even the values, to be of a
-certain type, and provide faster performance because of this limitation.
-
-\begin{itemize}
-\item[ \module{IOBTree} ] requires the keys to be integers.
-The module name reminds you of this; the \module{IOBTree} module
-maps Integers to Objects.
-
-\item[ \module{OIBTree} ] requires the values to be integers,
-mapping Objects to Integers.
-
-\item[ \module{IIBTree} ] is strictest, requiring that both keys and values must be integers.
-
-\end{itemize}
-
-To use a B-tree, simply import the desired module and call the
-constructor, always named \function{BTree()}, to get a B-tree
-instance, and then use it like any other mapping:
+keys throughout a number of tree nodes.  The nodes are stored in
+sorted order.  Nodes are then only unpickled and brought into memory
+as they're accessed, so the entire tree doesn't have to occupy memory
+(unless you really are touching every single key).
+
+The BTrees package provides a large collection of related data
+structures.  There are variants of the data structures specialized to
+handle integer values, which are faster and use less memory.  There
+are four modules that handle the different variants.  The first two
+letters of the module name specify the types of the keys and values in
+mappings -- O for any object and I for integer.  The
+\module{BTrees.IOBTree} module provides a mapping that accepts integer
+keys and arbitrary objects as values.
+
+The four data structures provide by each module are a btree, a bucket,
+a tree set, and a set.  The btree and bucket types are mappings and
+support all the usual mapping methods, e.g. \function{update()} and
+\function{keys()}.  The tree set and set types are similar to mappings
+but they have no values; they support the methods that make sense for
+a mapping with no keys, e.g. \function{keys()} but not
+\function{items()}.  The bucket and set types are the individual
+building blocks for btrees and tree sets, respectively.  A bucket or
+set can be used when you are sure that it will have few elements.  If
+the data structure will grow large, you should use a btree or tree
+set.
+
+The four modules are named \module{OOBTree}, \module{IOBTree},
+\module{OIBTree}, and \module{IIBTree}.  The two letter prefixes are
+repeated in the data types names.  The \module{BTrees.OOBTree} module
+defines the following types: \class{OOBTree}, \class{OOBucket},
+\class{OOSet}, and \class{OOTreeSet}.
+
+The \function{keys()}, \function{values()}, and \function{items()}
+methods do not materialize a list with all of the data.  Instead, they
+return lazy sequences that fetch data from the BTree as needed.  They
+also support optional arguments to specify the minium and maximum
+values to return.
+
+A BTree object supports all the methods you would expect of a mapping
+with a few extensions that exploit the fact that the keys are sorted.
+The example below demonstrates how some of the methods work.  The
+extra methods re \function{minKey()} and \function{maxKey()}, which
+find the minimum and maximum key value subject to an optional bound
+argument, and \function{byValue()}, which returns value, key pairs
+in reversed sorted order subject to an optional minimum bound argument.
 
 \begin{verbatim}
-import IIBTree
-iimap = IIBTree.BTree()
-iimap[1972] = 27
+>>> from BTrees.OOBTree import OOBTree
+>>> t = OOBTree()
+>>> t.update({ 1: "red", 2: "green", 3: "blue", 4: "spades" })
+>>> len(t)
+4
+>>> t[2]
+'green'
+>>> t.keys()
+<OOBTreeItems object at 0x40269098>
+>>> [k for k in t.keys()] # use a listcomp to get a printable sequence
+[1, 2, 3, 4]
+>>> [k for k in t.values()]
+['red', 'green', 'blue', 'spades']
+>>> [k for k in t.values(1, 2)]
+['red', 'green']
+>>> [k for k in t.values(2)]
+['green', 'blue', 'spades']
+>>> t.byValue("glue") # all values > "glue"
+[('spades', 4), ('red', 1), ('green', 2)]
+>>> t.minKey(1.5)
+2
 \end{verbatim}
+
+Each of the modules also defines some functions that operate on
+BTrees -- \function{difference()}, \function{union()}, and
+\function{difference()}.  The \function{difference()} function returns
+a bucket, while the other two methods return a set.
+If the keys are integers, then the module also defines
+\function{multiunion()}.  If the values are integers, then the module
+also defines \function{weightedIntersection()} and
+\function{weighterUnion()}.  The function doc strings describe each
+function briefly.