[Zodb-checkins] CVS: ZODB3/Doc/guide - TODO:1.3 links.tex:1.3 modules.tex:1.3 prog-zodb.tex:1.3 transactions.tex:1.3 zeo.tex:1.3 zodb.dvi:1.3 zodb.tex:1.3
Guido van Rossum
guido@python.org
Fri, 4 Oct 2002 20:37:42 -0400
Update of /cvs-repository/ZODB3/Doc/guide
In directory cvs.zope.org:/tmp/cvs-serv28054/guide
Modified Files:
TODO links.tex modules.tex prog-zodb.tex transactions.tex
zeo.tex zodb.dvi zodb.tex
Log Message:
Merge changes from release branch into trunk.
=== ZODB3/Doc/guide/TODO 1.2 => 1.3 ===
--- ZODB3/Doc/guide/TODO:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/TODO Fri Oct 4 20:37:12 2002
@@ -1,6 +1,4 @@
-Update text to use BTrees, not BTree
Write section on __setstate__
-Connection.sync seems to work now; note this
Continue working on it
Suppress the full GFDL text in the PDF/PS versions
=== ZODB3/Doc/guide/links.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/links.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/links.tex Fri Oct 4 20:37:12 2002
@@ -17,6 +17,14 @@
\\
\url{http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html}
+Persistent Programming with ZODB, by Jeremy Hylton and Barry Warsaw:
+\\
+Slides for a tutorial presented at the 10th Python conference. Covers
+much of the same ground as this guide, with more details in some areas
+and less in others.
+\\
+\url{http://www.zope.org/Members/bwarsaw/ipc10-slides}
+
Download link for ZEO: \\
\url{http://www.zope.org/Products/ZEO/}
=== ZODB3/Doc/guide/modules.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/modules.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/modules.tex Fri Oct 4 20:37:12 2002
@@ -2,13 +2,12 @@
% Related Modules
% PersistentMapping
% PersistentList
-% BTree
-% Catalog
+% BTrees
\section{Related Modules}
The ZODB package includes a number of related modules that provide
-useful data types such as BTrees or full-text indexes.
+useful data types such as BTrees.
\subsection{\module{ZODB.PersistentMapping}}
@@ -40,51 +39,92 @@
Python lists do.
-\subsection{B-tree Modules}
-
-%here's one: how does one implement searching? i would have expected the
-%btree objects to have ``find key nearest to this'' and ``next'' methods,
-%(like bsddb's set_location)...
-%
-% -- erno
+\subsection{BTrees Package}
When programming with the ZODB, Python dictionaries aren't always what
you need. The most important case is where you want to store a very
large mapping. When a Python dictionary is accessed in a ZODB, the
whole dictionary has to be unpickled and brought into memory. If
you're storing something very large, such as a 100,000-entry user
-database, unpickling such a large object will be slow. B-trees are a
+database, unpickling such a large object will be slow. BTrees are a
balanced tree data structure that behave like a mapping but distribute
-keys throughout a number of tree nodes. Nodes are then only unpickled
-and brought into memory as they're accessed, so the entire tree
-doesn't have to occupy memory (unless you really are touching every
-single key).
-
-There are four different BTree modules provided. One of them, the
-\module{BTree} module, provides the most general data type; the keys
-and values in the B-tree can be any Python object. Some specialized B-tree
-modules require that the keys, and perhaps even the values, to be of a
-certain type, and provide faster performance because of this limitation.
-
-\begin{itemize}
-\item[ \module{IOBTree} ] requires the keys to be integers.
-The module name reminds you of this; the \module{IOBTree} module
-maps Integers to Objects.
-
-\item[ \module{OIBTree} ] requires the values to be integers,
-mapping Objects to Integers.
-
-\item[ \module{IIBTree} ] is strictest, requiring that both keys and values must be integers.
-
-\end{itemize}
-
-To use a B-tree, simply import the desired module and call the
-constructor, always named \function{BTree()}, to get a B-tree
-instance, and then use it like any other mapping:
+keys throughout a number of tree nodes. The nodes are stored in
+sorted order. Nodes are then only unpickled and brought into memory
+as they're accessed, so the entire tree doesn't have to occupy memory
+(unless you really are touching every single key).
+
+The BTrees package provides a large collection of related data
+structures. There are variants of the data structures specialized to
+handle integer values, which are faster and use less memory. There
+are four modules that handle the different variants. The first two
+letters of the module name specify the types of the keys and values in
+mappings -- O for any object and I for integer. The
+\module{BTrees.IOBTree} module provides a mapping that accepts integer
+keys and arbitrary objects as values.
+
+The four data structures provided by each module are a btree, a bucket,
+a tree set, and a set. The btree and bucket types are mappings and
+support all the usual mapping methods, e.g. \function{update()} and
+\function{keys()}. The tree set and set types are similar to mappings
+but they have no values; they support the methods that make sense for
+a mapping with no values, e.g. \function{keys()} but not
+\function{items()}. The bucket and set types are the individual
+building blocks for btrees and tree sets, respectively. A bucket or
+set can be used when you are sure that it will have few elements. If
+the data structure will grow large, you should use a btree or tree
+set.
+
+The four modules are named \module{OOBTree}, \module{IOBTree},
+\module{OIBTree}, and \module{IIBTree}. The two-letter prefixes are
+repeated in the data type names. The \module{BTrees.OOBTree} module
+defines the following types: \class{OOBTree}, \class{OOBucket},
+\class{OOSet}, and \class{OOTreeSet}.
+
+The \function{keys()}, \function{values()}, and \function{items()}
+methods do not materialize a list with all of the data. Instead, they
+return lazy sequences that fetch data from the BTree as needed. They
+also support optional arguments to specify the minimum and maximum
+values to return.
+
+A BTree object supports all the methods you would expect of a mapping
+with a few extensions that exploit the fact that the keys are sorted.
+The example below demonstrates how some of the methods work. The
+extra methods are \function{minKey()} and \function{maxKey()}, which
+find the minimum and maximum key value subject to an optional bound
+argument, and \function{byValue()}, which returns (value, key) pairs
+in reverse sorted order subject to an optional minimum bound argument.
\begin{verbatim}
-import IIBTree
-iimap = IIBTree.BTree()
-iimap[1972] = 27
+>>> from BTrees.OOBTree import OOBTree
+>>> t = OOBTree()
+>>> t.update({ 1: "red", 2: "green", 3: "blue", 4: "spades" })
+>>> len(t)
+4
+>>> t[2]
+'green'
+>>> t.keys()
+<OOBTreeItems object at 0x40269098>
+>>> [k for k in t.keys()] # use a listcomp to get a printable sequence
+[1, 2, 3, 4]
+>>> [k for k in t.values()]
+['red', 'green', 'blue', 'spades']
+>>> [k for k in t.values(1, 2)]
+['red', 'green']
+>>> [k for k in t.values(2)]
+['green', 'blue', 'spades']
+>>> t.byValue("glue") # all values > "glue"
+[('spades', 4), ('red', 1), ('green', 2)]
+>>> t.minKey(1.5)
+2
\end{verbatim}
+
+Each of the modules also defines some functions that operate on
+BTrees -- \function{difference()}, \function{union()}, and
+\function{intersection()}. The \function{difference()} function returns
+a bucket, while the other two functions return a set.
+If the keys are integers, then the module also defines
+\function{multiunion()}. If the values are integers, then the module
+also defines \function{weightedIntersection()} and
+\function{weightedUnion()}. The function doc strings describe each
+function briefly.
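
The sorted-key behaviour described in the BTrees section can be illustrated without the BTrees package itself. The following is a minimal sketch, assuming a plain sorted list and the standard bisect module stand in for the tree. The class name SortedMapping is hypothetical, and this is not how BTrees is implemented: nothing here is persistent or split into buckets. It only mirrors the semantics of minKey(), maxKey(), and the optional minimum and maximum arguments to values() shown in the interactive example.

```python
import bisect


class SortedMapping:
    """Toy stand-in for a BTree: keys are kept in sorted order (hypothetical)."""

    def __init__(self, items=()):
        self._keys = []   # sorted list of keys
        self._data = {}   # key -> value
        for key, value in dict(items).items():
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __len__(self):
        return len(self._keys)

    def minKey(self, bound=None):
        """Smallest key, or the smallest key >= bound if a bound is given."""
        index = 0 if bound is None else bisect.bisect_left(self._keys, bound)
        if index >= len(self._keys):
            raise ValueError("no key satisfies the bound")
        return self._keys[index]

    def maxKey(self, bound=None):
        """Largest key, or the largest key <= bound if a bound is given."""
        if bound is None:
            index = len(self._keys)
        else:
            index = bisect.bisect_right(self._keys, bound)
        if index == 0:
            raise ValueError("no key satisfies the bound")
        return self._keys[index - 1]

    def values(self, minimum=None, maximum=None):
        """Lazily yield values whose keys lie in [minimum, maximum]."""
        lo = 0 if minimum is None else bisect.bisect_left(self._keys, minimum)
        if maximum is None:
            hi = len(self._keys)
        else:
            hi = bisect.bisect_right(self._keys, maximum)
        for key in self._keys[lo:hi]:
            yield self._data[key]


t = SortedMapping({1: "red", 2: "green", 3: "blue", 4: "spades"})
```

Because the keys are sorted, a bounded lookup is a binary search rather than a scan, which is the property the extra BTree methods exploit.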
=== ZODB3/Doc/guide/prog-zodb.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/prog-zodb.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/prog-zodb.tex Fri Oct 4 20:37:12 2002
@@ -23,59 +23,21 @@
\subsubsection{Requirements}
-You'll need Python, of course; version 1.5.2 works with some fixes,
-and it also works with Python 2.0, which is what I primarily use.
-
-The code is packaged using Distutils, the new distribution tools for
-Python introduced in Python 2.0. If you're using 1.5.2, first you'll
-have to get the latest Distutils release from the Distutils SIG page
-at \url{http://www.python.org/sigs/distutils-sig/download.html} and
-install it. This is simply a matter of untarring or unzipping the
+You will need Python 2.1 or higher. The code is packaged using
+Distutils, so installation is simply a matter of untarring or unzipping the
release package, and then running \code{python setup.py install}.
-If you're using 1.5.2 and have installed previous versions of the
-Distutils, be sure to get the very latest version, since developing
-the ZODB distribution turned up some bugs along the way. If you
-encounter problems compiling \file{ZODB/TimeStamp.c} or your compiler reports
-an error like ``Can't create build/temp.linux2/ExtensionClass.o: No
-such file or directory'', you need an updated version. Old versions of
-Distutils have two bugs which affect the setup scripts. First, for a
-long time the \code{define_macros} keyword in setup.py files didn't work due
-to a Distutils bug, so I hacked TimeStamp.c in earlier releases. The
-Distutils have since been fixed, and the hack became unnecessary, so I
-removed it. Second, the code that creates directories tries to be
-smart and caches them to save time by not trying to create a directory
-twice, but this code was broken in old versions.
-
You'll need a C compiler to build the packages, because there are
various C extension modules. At the moment no one is making Windows
-binaries available, so you'll need a Windows development environment to use the
+binaries available, so you'll need a Windows development environment
+to build ZODB.
\subsubsection{Installing the Packages}
Download the ZODB tarball containing all the packages for both ZODB
-and ZEO from \url{http://www.amk.ca/files/zodb/}.
-
-To build the packages, you must go into the individual directories and
-build and install them one by one. They should be built and installed
-in this order:
-
-\begin{enumerate}
- \item \code{zodb-basic}
- \item ExtensionClass
- \item ZODB
- \item \code{BTree} and \code{BTrees}
- \item ZEO
-\end{enumerate}
-
-In particular, you must install ExtensionClass before building the
-ZODB package; otherwise, the compilation in the ZODB package will die
-complaining that it can't find ExtensionClass.h. You can manually
-hack the \#include path to make it work without installing
-ExtensionClass first, but that's a bit hackish.
-
-If you encounter any problems, please let me know at
-\email{akuchlin@mems-exchange.org}.
+and ZEO from \url{http://www.zope.org/Products/StandaloneZODB}. See
+the \file{README.txt} file in the top level of the release directory
+for details on building, testing, and installing.
\subsection{How ZODB Works}
@@ -113,10 +75,16 @@
\item[Consistency] means that the data cannot be placed into a
logically invalid state; sanity checks can be written and enforced.
-Usually this is done by defining a database schema, and requiring the
-data always matches the schema. For example, this might enforce that
-the \code{order_number} attribute is always an integer, and not a
-string, tuple, or other object.
+Usually this is done by defining a database schema, and requiring
+that the data always match the schema. There are two typical
+approaches to consistency. One is to enforce rules about the types
+of objects and attributes; for example, enforce that the
+\code{order_number} attribute is always an integer, and not a
+string, tuple, or other object. Another is to guarantee consistency
+across data structures; for example, that any object with an
+\code{order_number} attribute must also appear in the
+\code{orders_table} object. In general, atomicity and isolation make
+it possible for applications to provide consistency.
\item[Isolation] means that two programs or threads running in two
different transactions cannot see each other's changes until they
@@ -144,7 +112,7 @@
storing and retrieving objects from some form of long-term storage.
A few different types of Storage have been written, such as
\class{FileStorage}, which uses regular disk files, and
- \class{BerkeleyStorage}, which uses Sleepycat Software's BerkeleyDB
+ \class{bsddb3Storage}, which uses Sleepycat Software's BerkeleyDB
database. You could write a new Storage that stored objects in a
relational database or Metakit file, for example, if that would
better suit your application. Two example storages,
@@ -156,7 +124,7 @@
created per process.
\item Finally, the \class{Connection} class caches objects, and moves
- them into and out of object storage. A multi-threaded program can
+ them into and out of object storage. A multi-threaded program should
open a separate \class{Connection} instance for each thread.
Different threads can then modify objects and commit their
modifications independently.
@@ -199,6 +167,10 @@
correctly, since the ZODB code does some magical tricks with
importing.
+The \class{Persistent} base class is an \module{ExtensionClass}
+class. As a result, it is not compatible with new-style classes or types
+in Python 2.2 and up.
+
For simplicity, in the examples the \class{User} class will
simply be used as a holder for a bunch of attributes. Normally the
class would define various methods that add functionality, but that
@@ -304,11 +276,6 @@
arithmetic operations: \method{__radd__}, \method{__rsub__}, and so
forth.
-\item Python's built-in \function{isinstance()} and \function{issubclass()}
-functions don't work properly on ExtensionClasses. Solution: use
-custom \function{isinstance()} and \function{issubclass()} functions
-that handle ExtensionClasses correctly.
-
\item Recent versions of the ZODB allow writing a class with
\method{__setattr__} , \method{__getattr__}, or \method{__delattr__} methods. (Older versions didn't support this at all.)
If you write such a \method{__setattr__} or \method{__delattr__} method,
@@ -414,100 +381,6 @@
no attempt is ever made to call \method{__cmp__}. Perhaps Python 2.2
will repair this.
-\subsubsection{Fixing \function{isinstance} and \function{issubclass}}
-
-Python's built-in functions
-\function{isinstance()} and \function{issubclass} don't
-work on ExtensionClass instances, for much the same reason that
-\method{__cmp__} is never called; in some bits of the Python core code,
-branches are taken only if an object is of the \class{InstanceType}
-type, and this can never be true for an ExtensionClass instance.
-Python 2.1 tried to fix this, and changed these functions slightly in
-an effort to make them work for ExtensionClasses; unfortunately, the
-changes didn't work.
-
-The solution is to use customized versions of these functions that
-handle ExtensionClasses specially and fall back to the built-in
-version otherwise. Here are the versions we've written at the MEMS Exchange:
-
-\begin{verbatim}
-# The built-in 'isinstance()' and 'issubclass()' won't work on
-# ExtensionClasses, so you have to use the versions supplied here.
-# (But those versions work fine on regular instances and classes too,
-# so you should *always* use them.)
-
-def issubclass (class1, class2):
- """A version of 'issubclass' that works with extension classes
- as well as regular Python classes.
- """
-
- # Both class objects are regular Python classes, so use the
- # built-in 'issubclass()'.
- if type(class1) is ClassType and type(class2) is ClassType:
- return __builtin__.issubclass(class1, class2)
-
- # Both so-called class objects have a '__bases__' attribute: ie.,
- # they aren't regular Python classes, but they sure look like them.
- # Assume they are extension classes and reimplement what the builtin
- # 'issubclass()' does behind the scenes.
- elif hasattr(class1, '__bases__') and hasattr(class2, '__bases__'):
- # XXX it appears that "ec.__class__ is type(ec)" for an
- # extension class 'ec': could we/should we use this as an
- # additional check for extension classes?
-
- # Breadth-first traversal of class1's superclass tree. Order
- # doesn't matter because we're just looking for a "yes/no"
- # answer from the tree; if we were trying to resolve a name,
- # order would be important!
- stack = [class1]
- while stack:
- if stack[0] is class2:
- return 1
- stack.extend(list(stack[0].__bases__))
- del stack[0]
- else:
- return 0
-
- # Not a regular class, not an extension class: blow up for consistency
- # with builtin 'issubclass()"
- else:
- raise TypeError, "arguments must be class or ExtensionClass objects"
-
-# issubclass ()
-
-def isinstance (object, klass):
- """A version of 'isinstance' that works with extension classes
- as well as regular Python classes."""
-
- if type(klass) is TypeType:
- return __builtin__.isinstance(object, klass)
- elif hasattr(object, '__class__'):
- return issubclass(object.__class__, klass)
- else:
- return 0
-\end{verbatim}
-
-I'd recommend putting these functions in a module that always gets
-imported. The convention on my work project is to put them in
-\file{mems/lib/base.py}, which contains various fundamental classes
-and functions for our system, and access them like this:
-
-\begin{verbatim}
-from mems.lib import base
-...
-if base.isinstance(object, Class): ...
-\end{verbatim}
-
-Don't insert the modified functions into Python's
-\module{__builtin__} module, or import just the
-\function{isinstance()} and \function{issubclass} functions.
-If you consistently use \function{base.isinstance()}, then forgetting
-to import the \module{base} module will result in a
-\exception{NameError} exception. In the
-case of a forgotten import, calling the functions directly would use
-Python's built-in versions, leading to subtle bugs that might not be
-noticed for some time.
-
\subsubsection{\method{__getattr__}, \method{__delattr__}, and \method{__setattr__}}
Recent versions of ZODB allow writing persistent classes that have
@@ -545,7 +418,7 @@
of object references is quite structured, making it easy to find all
the instances of the class being modified. For example, if all
\class{User} objects can be found inside a single dictionary or
-B-tree, then it would be a simple matter to loop over every
+BTree, then it would be a simple matter to loop over every
\class{User} instance with a \keyword{for} statement.
This is more difficult if your object graph is less structured; if
\class{User} objects can be found as attributes of any number of
=== ZODB3/Doc/guide/transactions.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/transactions.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/transactions.tex Fri Oct 4 20:37:12 2002
@@ -35,7 +35,7 @@
\begin{verbatim}
# Commit a subtransaction
-get_transaction().commit(1)
+get_transaction().commit(1)
# Abort a subtransaction
get_transaction().abort(1)
@@ -103,6 +103,16 @@
modified the objects affected by the transaction you're trying to
undo.
+After you call \method{undo()} you must commit the transaction for the
+undo to actually be applied.
+\footnote{There are actually two different ways a storage can
+implement the undo feature. Most of the storages that ship with ZODB
+use the transactional form of undo described in the main text. Some
+storages may use a non-transactional undo that makes changes visible
+immediately.} There is one glitch in the undo process. The thread
+that calls undo may not see the changes to the object until it calls
+\method{Connection.sync()} or commits another transaction.
+
\subsection{Versions}
While many subtransactions can be contained within a single regular
@@ -160,7 +170,4 @@
The \class{Storage} and \class{DB} instances can be shared among
several threads, as long as individual \class{Connection} instances
are created for each thread.
-
-XXX I can't think of anything else to say about multithreaded ZODB
-programs. Suggestions? An example program?
=== ZODB3/Doc/guide/zeo.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/zeo.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/zeo.tex Fri Oct 4 20:37:12 2002
@@ -18,22 +18,21 @@
The combination of ZEO and ZODB is essentially a Python-specific
object database.
-ZEO consists of about 1400 lines of Python code. The code is
-relatively small because it contains only code for a TCP/IP server,
-and for a new type of Storage, \class{ClientStorage}.
-\class{ClientStorage} doesn't use disk files at all; it simply
-makes remote procedure calls to the server, which then passes them on
-a regular \class{Storage} class such as \class{FileStorage}. The
-following diagram lays out the system:
+ZEO consists of about 6000 lines of Python code, excluding tests. The
+code is relatively small because it contains only code for a TCP/IP
+server, and for a new type of Storage, \class{ClientStorage}.
+\class{ClientStorage} simply makes remote procedure calls to the
+server, which then passes them on to a regular \class{Storage} class such
+as \class{FileStorage}. The following diagram lays out the system:
XXX insert diagram here later
Any number of processes can create a \class{ClientStorage}
instance, and any number of threads in each process can be using that
instance. \class{ClientStorage} aggressively caches objects
-locally, so in order to avoid using stale data, the ZEO server sends
-an invalidate message to all the connected \class{ClientStorage}
-instances on every write operation. The invalidate message contains
+locally, so in order to avoid using stale data, the ZEO server sends
+an invalidation message to all the connected \class{ClientStorage}
+instances on every write operation. The invalidation message contains
the object ID for each object that's been modified, letting the
\class{ClientStorage} instances delete the old data for the
given object from their caches.
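
The invalidation scheme described in this paragraph can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not ZEO code: ToyServer and ToyClient are hypothetical names, the "network" is a direct method call, and object IDs are ordinary strings. It shows only the idea that every write causes the server to tell each connected client which object IDs to drop from its cache.

```python
class ToyServer:
    """Hypothetical stand-in for a ZEO server; not the real ZEO API."""

    def __init__(self):
        self.store = {}      # object ID -> current data
        self.clients = []    # every connected client

    def connect(self, client):
        self.clients.append(client)

    def load(self, oid):
        return self.store[oid]

    def write(self, changes):
        """Store the changes, then send the modified IDs to all clients."""
        self.store.update(changes)
        for client in self.clients:
            client.invalidate(list(changes))


class ToyClient:
    """Hypothetical caching client; keeps data until told it is stale."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        server.connect(self)

    def load(self, oid):
        if oid not in self.cache:
            self.cache[oid] = self.server.load(oid)
        return self.cache[oid]

    def store(self, oid, data):
        self.server.write({oid: data})

    def invalidate(self, oids):
        """Drop stale entries; the next load() refetches from the server."""
        for oid in oids:
            self.cache.pop(oid, None)


server = ToyServer()
a, b = ToyClient(server), ToyClient(server)
a.store("user-1", "alice")
first = b.load("user-1")        # b caches the value
a.store("user-1", "alice v2")   # the server invalidates b's cached copy
second = b.load("user-1")       # b refetches the new value
```

Note that each write touches every connected client, which is why a write-heavy workload produces many invalidation messages.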
@@ -46,7 +45,9 @@
applications. If every \class{ClientStorage} is writing to the
database all the time, this will result in a storm of invalidate
messages being sent, and this might take up more processing time than
-the actual database operations themselves.
+the actual database operations themselves.\footnote{These messages are
+small and sent in batches, so there would need to be a lot of writes
+before it became a problem.}
On the other hand, for applications that have few writes in comparison
to the number of read accesses, this aggressive caching can be a major
@@ -69,36 +70,17 @@
\subsubsection{Requirements}
-To run a ZEO server, you'll need Python 1.5.2 or 2.0, and the ZODB
-packages from \url{http://www.amk.ca/files/zodb/}
-have to be installed.
-
-\emph{Note for Python 1.5.2 users}: ZEO requires updated versions
-of the \module{asyncore.py} and \module{asynchat.py} modules that are
-included in 1.5.2's standard library. Current versions of the ZODB
-distribution install private versions of these modules, so you
-shouldn't need to grab updated versions yourself. (The symptom of
-this problem is a traceback on attempting to run a ZEO client program:
-the traceback is ``TypeError: too many arguments; expected 2, got 3''
-around line 100 of \file{smac.py}.
-
-\subsubsection{Installation}
-
-Installing the ZEO package is easy. Just run \code{python setup.py
-install}. This will install the ZEO/ package into your Python
-installation, and copy various files into their proper locations:
-\file{zeo.conf} will be put into \file{/usr/local/etc/}, a \file{zeo} startup
-script will be put in \file{/etc/rc.d/init.d/}, and the \file{zeod}
-daemon program will be placed in \file{/usr/local/bin}.
-
-\subsection{Configuring and Running a ZEO Server}
-
-Edit \code{/usr/local/etc/zeo.conf} appropriately for your desired
-setup. This configuration file controls the port on which ZEO will
-listen for connections, the user and group IDs under which the server
-will be executed, and the location of the concrete \class{Storage}
-object that will be made network-accessible.
-
+The ZEO server software is included in ZODB3. As with the rest of
+ZODB3, you'll need Python 2.1 or higher.
+
+\subsubsection{Running a server}
+
+The start.py script in the ZEO directory can be used to start a
+server. Run it with the -h option to see the various values. If
+you're just experimenting, a good choice is to use
+\code{python ZEO/start.py -D -U /tmp/zeosocket} to run ZEO in
+debug mode and with a Unix domain socket.
+
\subsection{Testing the ZEO Installation}
Once a ZEO server is up and running, using it is just like using ZODB
@@ -137,15 +119,15 @@
\subsection{ZEO Programming Notes}
-XXX The Connection.sync() method and its necessity (if it works at all!)
-
-% That doesn't work. I tested it. sync() doesn't seem to get into
-% the asyncore loop. One of us should probably look into adding an
-% API for this when we have some free time. It would be a nice
-% small project that would get into ZODB's guts.
-
-
-
+ZEO is written using \module{asyncore}, from the Python standard
+library. It assumes that some part of the user application is running
+an \module{asyncore} mainloop. For example, Zope runs the loop in a
+separate thread and ZEO uses that. If your application does not have
+a mainloop, ZEO will not process incoming invalidation messages until
+you make some call into ZEO. The \method{Connection.sync} method can
+be used to process pending invalidation messages. You can call it
+when you want to make sure the \class{Connection} has the most recent
+version of every object, but you don't have any other work for ZEO to do.
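
The effect of having no mainloop can be modelled as a queue of invalidations that is only drained on request. The sketch below is an illustration under assumptions, not the real Connection class: ToyConnection, receive_invalidation(), and the shared dictionary standing in for the server are all hypothetical. It shows why a read can return stale data until sync() is called.

```python
from collections import deque


class ToyConnection:
    """Hypothetical model of a connection with no mainloop running."""

    def __init__(self, backing_store):
        self._backing = backing_store   # shared dict standing in for the server
        self._cache = {}
        self._pending = deque()         # invalidations received, not yet applied

    def get(self, oid):
        if oid not in self._cache:
            self._cache[oid] = self._backing[oid]
        return self._cache[oid]

    def receive_invalidation(self, oid):
        """A message has arrived, but nothing processes it yet."""
        self._pending.append(oid)

    def sync(self):
        """Apply every pending invalidation so later reads see fresh data."""
        while self._pending:
            self._cache.pop(self._pending.popleft(), None)


backing = {"counter": 1}
conn = ToyConnection(backing)
before = conn.get("counter")           # caches 1
backing["counter"] = 2                 # another client commits a change
conn.receive_invalidation("counter")   # the message is only queued
stale = conn.get("counter")            # still the cached value
conn.sync()
fresh = conn.get("counter")            # refetched after sync()
```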
\subsection{Sample Application: chatter.py}
@@ -174,7 +156,7 @@
def __init__(self, name):
self.name = name
# Internal attribute: _messages holds all the chat messages.
- self._messages = BTree.BTree()
+ self._messages = BTrees.OOBTree.OOBTree()
\end{verbatim}
\method{add_message()} has to add a message to the
=== ZODB3/Doc/guide/zodb.dvi 1.2 => 1.3 ===
=== ZODB3/Doc/guide/zodb.tex 1.2 => 1.3 ===
--- ZODB3/Doc/guide/zodb.tex:1.2 Mon Feb 11 18:33:40 2002
+++ ZODB3/Doc/guide/zodb.tex Fri Oct 4 20:37:12 2002
@@ -1,7 +1,7 @@
\documentclass{howto}
\title{ZODB/ZEO Programming Guide}
-\release{0.03}
+\release{0.04}
\date{\today}
\author{A.M.\ Kuchling}