[Zodb-checkins] CVS: ZODB3/Doc/guide - modules.tex:1.2.4.1
Jeremy Hylton
jeremy@zope.com
Fri, 4 Oct 2002 13:36:12 -0400
Update of /cvs-repository/ZODB3/Doc/guide
In directory cvs.zope.org:/tmp/cvs-serv12488/Doc/guide
Modified Files:
Tag: ZODB3-3_1-branch
modules.tex
Log Message:
Flesh out section on BTrees.
Remove reference to Catalog.
=== ZODB3/Doc/guide/modules.tex 1.2 => 1.2.4.1 ===
--- ZODB3/Doc/guide/modules.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/modules.tex Fri Oct 4 13:36:11 2002
@@ -2,13 +2,12 @@
% Related Modules
% PersistentMapping
% PersistentList
-% BTree
-% Catalog
+% BTrees
\section{Related Modules}
The ZODB package includes a number of related modules that provide
-useful data types such as BTrees or full-text indexes.
+useful data types such as BTrees.
\subsection{\module{ZODB.PersistentMapping}}
@@ -40,51 +39,92 @@
Python lists do.
-\subsection{B-tree Modules}
-
-%here's one: how does one implement searching? i would have expected the
-%btree objects to have ``find key nearest to this'' and ``next'' methods,
-%(like bsddb's set_location)...
-%
-% -- erno
+\subsection{BTrees Package}
When programming with the ZODB, Python dictionaries aren't always what
you need. The most important case is where you want to store a very
large mapping. When a Python dictionary is accessed in a ZODB, the
whole dictionary has to be unpickled and brought into memory. If
you're storing something very large, such as a 100,000-entry user
-database, unpickling such a large object will be slow. B-trees are a
+database, unpickling such a large object will be slow. BTrees are a
balanced tree data structure that behave like a mapping but distribute
-keys throughout a number of tree nodes. Nodes are then only unpickled
-and brought into memory as they're accessed, so the entire tree
-doesn't have to occupy memory (unless you really are touching every
-single key).
-
-There are four different BTree modules provided. One of them, the
-\module{BTree} module, provides the most general data type; the keys
-and values in the B-tree can be any Python object. Some specialized B-tree
-modules require that the keys, and perhaps even the values, to be of a
-certain type, and provide faster performance because of this limitation.
-
-\begin{itemize}
-\item[ \module{IOBTree} ] requires the keys to be integers.
-The module name reminds you of this; the \module{IOBTree} module
-maps Integers to Objects.
-
-\item[ \module{OIBTree} ] requires the values to be integers,
-mapping Objects to Integers.
-
-\item[ \module{IIBTree} ] is strictest, requiring that both keys and values must be integers.
-
-\end{itemize}
-
-To use a B-tree, simply import the desired module and call the
-constructor, always named \function{BTree()}, to get a B-tree
-instance, and then use it like any other mapping:
+keys throughout a number of tree nodes. The nodes are stored in
+sorted order. Nodes are then only unpickled and brought into memory
+as they're accessed, so the entire tree doesn't have to occupy memory
+(unless you really are touching every single key).
+
+The BTrees package provides a large collection of related data
+structures. There are variants of the data structures specialized to
+handle integer values, which are faster and use less memory. There
+are four modules that handle the different variants. The first two
+letters of the module name specify the types of the keys and values in
+mappings -- O for any object and I for integer. The
+\module{BTrees.IOBTree} module provides a mapping that accepts integer
+keys and arbitrary objects as values.
+
+The four data structures provide by each module are a btree, a bucket,
+a tree set, and a set. The btree and bucket types are mappings and
+support all the usual mapping methods, e.g. \function{update()} and
+\function{keys()}. The tree set and set types are similar to mappings
+but they have no values; they support the methods that make sense for
+a mapping with no keys, e.g. \function{keys()} but not
+\function{items()}. The bucket and set types are the individual
+building blocks for btrees and tree sets, respectively. A bucket or
+set can be used when you are sure that it will have few elements. If
+the data structure will grow large, you should use a btree or tree
+set.
+
+The four modules are named \module{OOBTree}, \module{IOBTree},
+\module{OIBTree}, and \module{IIBTree}. The two letter prefixes are
+repeated in the data types names. The \module{BTrees.OOBTree} module
+defines the following types: \class{OOBTree}, \class{OOBucket},
+\class{OOSet}, and \class{OOTreeSet}.
+
+The \function{keys()}, \function{values()}, and \function{items()}
+methods do not materialize a list with all of the data. Instead, they
+return lazy sequences that fetch data from the BTree as needed. They
+also support optional arguments to specify the minium and maximum
+values to return.
+
+A BTree object supports all the methods you would expect of a mapping
+with a few extensions that exploit the fact that the keys are sorted.
+The example below demonstrates how some of the methods work. The
+extra methods re \function{minKey()} and \function{maxKey()}, which
+find the minimum and maximum key value subject to an optional bound
+argument, and \function{byValue()}, which returns value, key pairs
+in reversed sorted order subject to an optional minimum bound argument.
\begin{verbatim}
-import IIBTree
-iimap = IIBTree.BTree()
-iimap[1972] = 27
+>>> from BTrees.OOBTree import OOBTree
+>>> t = OOBTree()
+>>> t.update({ 1: "red", 2: "green", 3: "blue", 4: "spades" })
+>>> len(t)
+4
+>>> t[2]
+'green'
+>>> t.keys()
+<OOBTreeItems object at 0x40269098>
+>>> [k for k in t.keys()] # use a listcomp to get a printable sequence
+[1, 2, 3, 4]
+>>> [k for k in t.values()]
+['red', 'green', 'blue', 'spades']
+>>> [k for k in t.values(1, 2)]
+['red', 'green']
+>>> [k for k in t.values(2)]
+['green', 'blue', 'spades']
+>>> t.byValue("glue") # all values > "glue"
+[('spades', 4), ('red', 1), ('green', 2)]
+>>> t.minKey(1.5)
+2
\end{verbatim}
+
+Each of the modules also defines some functions that operate on
+BTrees -- \function{difference()}, \function{union()}, and
+\function{difference()}. The \function{difference()} function returns
+a bucket, while the other two methods return a set.
+If the keys are integers, then the module also defines
+\function{multiunion()}. If the values are integers, then the module
+also defines \function{weightedIntersection()} and
+\function{weighterUnion()}. The function doc strings describe each
+function briefly.