[Zodb-checkins] CVS: Packages/bsddb3Storage - README:1.1
barry@digicool.com
Fri, 27 Apr 2001 18:59:58 -0400 (EDT)
Update of /cvs-repository/Packages/bsddb3Storage
In directory korak:/tmp/cvs-serv29258
Added Files:
README
Log Message:
Initial README file for the Berkeley storages. Some of the
information is taken from PacklessReadme.txt, but genericized for use
with any of the storages.
--- Added File README in package Packages/bsddb3Storage ---
Berkeley (bsddb3) Storage 1.0 beta 2 for ZODB/Zope
Introduction
This package contains implementations for Zope/ZODB storages based
on Sleepycat Software's BerkeleyDB 3.x and the PyBSDDB3 extension.
These storages save ZODB data to some number of BerkeleyDB tables,
relying on Berkeley's transaction machinery to provide reliability
and recoverability.
Contents
Inside the bsddb3Storage package, there are three storage
implementations:
- Packless.py is an implementation of an undo-less, version-less
storage that obviates the need for packing, except in the face
of cyclic garbage. It uses a reference counting garbage
collection strategy to clean up garbage objects in the normal
case. This storage was released in 1.0 beta 1.
- Minimal.py is a new implementation of an undo-less, version-less
storage. It currently does reference counting garbage collection
only on the call to pack(), so it is not (yet) packless. However,
because Minimal shares code with the Full storage and provides a
more robust temporary commit log, it is the wave of the future.
For the 1.0 final release, Minimal will be augmented to provide
packless reclamation of reference counted garbage, and will be
integrated with automatic cyclic garbage detection.
- Full.py is a complete storage implementation, supporting undo,
versions, and automatic reference counting garbage collection.
Packing this storage is only required in order to get rid of old
object revisions. For the 1.0 final release, Full will be
integrated with automatic cyclic garbage detection.
Note that Full.py supports a new style of undo, called
"transactional undo", which is really a non-destructive,
redo-able undo. For more information on transactional undo, see
http://www.zope.org/Wikis/ZODB/TransactionalUndo
Old style undo is /not/ supported in the Full.py storage, so
unless you have a newer version of ZODB, you will not be able to
perform undos. It is unlikely that old style undo support will
be added to Full.py storage.
Compatibility
The Full and Minimal storages have been tested with Python 2.0 and
2.1, but not with Python 1.5.2. They may work with Python 1.5.2,
but it is unlikely that Python 1.5.2 will be explicitly supported.
Full storage has been tested with Zope 2.3.1 (running Python 2.1)
and should work, except for undo as described above.
Transactional undo support will work with Zope 2.4. These storages
have only been tested on Linux, but should work on any Unix-like
operating system. Windows has been partially tested and should
work.
Requirements
You must install Sleepycat BerkeleyDB 3.x and Robin Dunn's PyBSDDB
package separately.
To obtain the latest source release of BerkeleyDB 3.x, see
http://www.sleepycat.com. As of this writing the latest
BerkeleyDB release is 3.2.9, and this is the version these
storages have been tested with. Before using BerkeleyDB, be sure
that you comply with its licensing requirements:
http://www.sleepycat.com/licensing.html
To obtain the latest source release of Robin Dunn's PyBSDDB
package, see http://pybsddb.sourceforge.net
Install both BerkeleyDB and PyBSDDB following the instructions that
come with each. For BerkeleyDB, it's generally wise to accept the
default "configure" options and do a "make install" as root.
PyBSDDB comes with a distutils-based setup script which should
allow you to place the package in a globally accessible directory
which is in your PYTHONPATH (e.g. "site-packages/bsddb3").
When you can run the tests which ship with PyBSDDB, you'll know
you've been successful at both BerkeleyDB and PyBSDDB
installation.
Installing bsddb3Storage
The bsddb3Storage is distributed as a Python distutils package, so
the simplest thing to do is to use distutils to install it:
% python setup.py install
Then you should be able to do this at the prompt:
Python 2.1 (#1, Apr 17 2001, 23:30:09)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import bsddb3Storage
>>> bsddb3Storage.__version__
'1.0 beta 2'
See also docs/PacklessReadme.txt for alternative installation
directions.
Using bsddb3Storage
By default, Zope uses a FileStorage to hold ZODB data. To tell
Zope to use an alternate storage such as Packless, you need to set
up a custom_zodb.py file.
There is a sample custom_zodb.py file in the docs/ subdirectory
shipped with this release. The easiest way to get started with
one of the Berkeley storages is to copy the custom_zodb.py file to
your SOFTWARE_HOME directory (your main Zope dir) and edit its
contents to specify which storage you want to use. If you use an
INSTANCE_HOME setup, copy the file to the INSTANCE_HOME directory
instead and edit it there.
If you choose to edit the contents of the custom_zodb.py file, you
can change the "env" string to point to a different "environment"
directory for BerkeleyDB. BerkeleyDB needs its own working
directory (which it calls an environment) into which it will store
its support tables and log files. The contents of this directory
can become quite large, even if your data needs are relatively
modest (see "BerkeleyDB Log Files" below).
By default, the environment path is set in custom_zodb.py to a
subdirectory of your Zope's var subdirectory named bsddb3Storage.
You may change this to any absolute or relative path to which the
user which runs the Zope executable has write privileges. If the
environment directory doesn't exist, it will be created when you
first run Zope with one of the storages. It is recommended that
you choose an environment directory which does not contain any
other files. Additionally, you should not use BerkeleyDB on
remotely mounted filesystems such as NFS.
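The recommendations above can be checked before Zope is first
started with a Berkeley storage. The following is only a sketch:
check_env_dir is a hypothetical helper, not part of bsddb3Storage,
and it cannot detect whether a path is on a remotely mounted
filesystem.

```python
import os


def check_env_dir(path):
    """Return a list of warnings about a candidate environment directory.

    Hypothetical helper (not part of bsddb3Storage), meant to be run
    before the storage has created any files in the directory.
    """
    warnings = []
    path = os.path.abspath(path)
    if os.path.isdir(path):
        # An existing directory must be writable by the Zope user.
        if not os.access(path, os.W_OK):
            warnings.append('not writable: %s' % path)
        # The README recommends a directory with no other files in it.
        if os.listdir(path):
            warnings.append('directory is not empty: %s' % path)
    else:
        # A missing directory is created on first run, so the parent
        # must exist and be writable.
        parent = os.path.dirname(path)
        if not (os.path.isdir(parent) and os.access(parent, os.W_OK)):
            warnings.append('cannot be created under: %s' % parent)
    return warnings
```

An empty list means no problem was found. Run the check as the same
user that runs the Zope executable, since write privileges are
per-user.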
Use with ZEO
The Berkeley storages are compatible with ZEO. For general
information on how to use alternate storage implementations with
ZEO, see the "start.txt" file in the ZEO release documentation.
BerkeleyDB Files
After Zope is started with one of the Berkeley storages, you will
see a number of different types of files in your BerkeleyDB
environment directory. There will be a number of "__db*" files, a
number of "log.*" files, and several files which have the prefix
"zodb_". The files which have the zodb_ prefix are the actual
BerkeleyDB databases which hold the storage data. The "log.*"
files are write-ahead logs for BerkeleyDB transactions, and they
are very important. The "__db*" files are working files for
BerkeleyDB, and they are less important. It's wise to back up all
the files in this directory regularly. BerkeleyDB supports
"hot-backup". Log files need to be archived and cleared on a
regular basis (a following section covers this).
You may also occasionally see some files with names that are long
strings of hexadecimal digits. These are "commit log" temporary
files, created by the Full or Minimal storages, which are used to
buffer database modifications during the two-phase commit process
(don't confuse these with BerkeleyDB log files).
Because of the semantics of BerkeleyDB's transactions, it is
necessary to store the changes in this temporary file until a
BerkeleyDB transaction can be committed. Under normal operation,
the long hex-digit files should only exist during a ZODB
transaction. However, if some fatal error occurs during a ZODB
transaction which is neither committed nor aborted, the
uncommitted changes will reside in this file. Your storage will
then refuse to allow new changes to the database until a recovery
process is run (this is not the BerkeleyDB recovery process, and
unfortunately the recovery script has not yet been written; it will
be for the 1.0 final release).
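The file types described in this section can be told apart by name.
The sketch below is illustrative only: classify_env_file is a
hypothetical helper, and the minimum length of 8 hex digits for a
commit log name is an assumption, since the text above says only
that the names are "long".

```python
import re

# Assumption: "long strings of hexadecimal digits" means at least 8
# hex digits and nothing else in the file name.
_HEX_NAME = re.compile(r'^[0-9a-fA-F]{8,}$')


def classify_env_file(filename):
    """Classify a file name found in the BerkeleyDB environment directory."""
    if filename.startswith('zodb_'):
        return 'database'        # the actual storage data
    if filename.startswith('log.'):
        return 'berkeley-log'    # BerkeleyDB write-ahead log
    if filename.startswith('__db'):
        return 'working'         # BerkeleyDB working file
    if _HEX_NAME.match(filename):
        return 'commit-log'      # Full/Minimal temporary commit log
    return 'unknown'
```

A 'commit-log' file that is still present when no ZODB transaction
is in progress suggests an interrupted transaction, as described
above.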
BerkeleyDB Log Files
BerkeleyDB is a transactional database system. In order to
maintain transactional integrity, BerkeleyDB writes data to "log
files" before the data is committed. These log files live in the
BerkeleyDB "environment" directory unless you take steps to
configure your BerkeleyDB environment differently. BerkeleyDB log
files can become quite large, as well, so it may be necessary to
place them on a separate partition with lots of free disk space.
The log file directory can be changed by creating a file named
'DB_CONFIG' in the BerkeleyStorage "environment" directory you've
chosen within 'custom_zodb.py', customizing the following
content:
set_lg_dir /the/path/to/the/log/file/directory
After using this configuration file to redirect logfile placement,
your actual database files will still be kept in the directory
specified by the "env" setting of your custom_zodb.py; only the
BerkeleyDB log files will be written to the directory you specify
in 'DB_CONFIG'.
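If you prefer to script this rather than edit DB_CONFIG by hand,
something like the following could be used. It is a sketch only:
set_db_config_directive is a hypothetical helper, and it assumes
DB_CONFIG holds one "name value" directive per line.

```python
import os


def set_db_config_directive(env_dir, name, value):
    """Set one 'name value' line in env_dir/DB_CONFIG.

    Hypothetical helper. Other lines already in the file are kept;
    any earlier setting of the same directive is replaced.
    """
    config_path = os.path.join(env_dir, 'DB_CONFIG')
    lines = []
    if os.path.exists(config_path):
        with open(config_path) as f:
            lines = [line.rstrip('\n') for line in f]
    # Drop previous settings of this directive, keep everything else.
    kept = [line for line in lines if line.split()[:1] != [name]]
    kept.append('%s %s' % (name, value))
    with open(config_path, 'w') as f:
        f.write('\n'.join(kept) + '\n')
    return config_path

# Example call (paths are placeholders):
#   set_db_config_directive('var/bsddb3Storage', 'set_lg_dir',
#                           '/the/path/to/the/log/file/directory')
```

Make the change while no process has the environment open, so the
storage picks up the new setting the next time it starts.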
Redirecting log files to a directory other than your environment
directory may improve recoverability in the case of BerkeleyDB
failure. It may also improve performance if the log file
directory is on a separate hard disk/controller combination. For
more information about BerkeleyDB log files, recoverability and
why it is advantageous to put your log files and your database
files on separate devices, see
http://www.sleepycat.com/docs/ref/transapp/reclimit.html.
Setting BerkeleyDB Maximum Locks
ZODB transactions can be of almost arbitrary sizes (actually, they
"top out" at a total size of 2GB). BerkeleyDB is configured to
use 500 locks by default. Larger transactions in BerkeleyDB
require more locks. Thus, Packless ships with the default number
of Berkeley locks set to 10,000 (BAW: is this still the case, and
what about Full and Minimal?). This should allow almost any Zope
transaction to commit at the expense of increased RAM consumption.
Utilizing 10,000 locks requires (at least on Linux systems)
approximately 3MB of RAM overhead, little of which may actually be
used in environments which do not commit large transactions. You
can reduce RAM consumption by manually sizing BerkeleyDB locking.
To manually size locking, create (or edit) the file DB_CONFIG
within the BerkeleyDB "environment" directory you've chosen in
custom_zodb.py, adding the following directives to the DB_CONFIG
file:
set_lk_max_locks 500
set_lk_max_objects 500
set_lk_max_lockers 3
Change the integers as necessary. When one of the Berkeley
storages starts up, the Berkeley directives supplied in DB_CONFIG
will override the defaults.
Precision-sizing BerkeleyDB locking is a site-dependent task.
Sleepycat recommends that you run the "db_stat -c" command against
the database environment to see what the "max number of locks,
lock objects and lockers so far" numbers are during highly
stressful operations, multiply each of those numbers by 2, and
provide the multiplied-by-2 numbers as arguments to
set_lk_max_locks, set_lk_max_objects, and set_lk_max_lockers
respectively in DB_CONFIG. For detailed BerkeleyDB locking sizing
strategy, see http://www.sleepycat.com/docs/ref/lock/max.html.
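The multiply-by-2 rule can be written down as a small calculation.
In this sketch, lock_directives is a hypothetical helper; the three
arguments are the maxima you read off the "db_stat -c" output
yourself, since the exact layout of that output is not assumed here.

```python
def lock_directives(max_locks, max_objects, max_lockers, factor=2):
    """Return DB_CONFIG lines sized at `factor` times the observed maxima.

    Hypothetical helper. The arguments are the "max so far" numbers
    for locks, lock objects and lockers, read manually from
    "db_stat -c" during highly stressful operations.
    """
    return [
        'set_lk_max_locks %d' % (max_locks * factor),
        'set_lk_max_objects %d' % (max_objects * factor),
        'set_lk_max_lockers %d' % (max_lockers * factor),
    ]
```

For example, observed maxima of 250 locks, 250 lock objects and 2
lockers give set_lk_max_locks 500, set_lk_max_objects 500 and
set_lk_max_lockers 4.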
Archival and Maintenance
Log file rotation for Berkeley DB is closely related to database
archival.
BerkeleyDB never deletes "old" log files. Eventually, if you do
not maintain your Berkeley database by deleting "old" log files,
you will run out of disk space. It's necessary to maintain and
archive your BerkeleyDB files as per the procedures outlined in
http://pybsddb.sourceforge.net/ref/transapp/archival.html.
It is advantageous to automate this process, perhaps by creating a
script run by "cron" that makes use of the "db_archive" executable
as per the referenced document. One strategy might be to perform
the following sequence of operations:
- shut down the process which is using BerkeleyDB (Zope or the ZEO
storage server).
- back up the database files (the files prefixed with "zodb").
- back up all existing BerkeleyDB log files (the files prefixed
"log").
- run "db_archive -h /the/environment/directory" against your
environment directory to find out which log files are no longer
participating in transactions (they will be printed to stdout
one file per line).
- delete the log files that were reported by "db_archive" as no
longer participating in any transactions.
"Hot" backup and rotation of log files is slightly different. See
the above-referenced link regarding archival for more information.
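The last two steps of the shut-down sequence listed above (ask
"db_archive" which log files are unused, then delete them) might be
scripted roughly as follows. This is a sketch, not a tested
maintenance tool: the function names are hypothetical, and it
assumes "db_archive" is on the PATH and prints file names relative
to the environment directory. Shutting down the process and taking
the backups are not covered and must happen first.

```python
import os
import subprocess


def parse_db_archive_output(text, env_dir):
    """Turn db_archive output (one file per line) into full paths."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: names are relative to the environment directory.
    # os.path.join leaves a name unchanged if it is already absolute.
    return [os.path.join(env_dir, name) for name in names]


def unused_log_files(env_dir):
    """Run "db_archive -h env_dir" and return the log files it reports."""
    output = subprocess.check_output(['db_archive', '-h', env_dir])
    return parse_db_archive_output(output.decode(), env_dir)


def remove_files(paths):
    """Delete the given files; return the paths actually removed."""
    removed = []
    for path in paths:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed

# Typical use, after shutting down Zope/ZEO and backing up:
#   remove_files(unused_log_files('/the/environment/directory'))
```

Keeping the parsing separate from the deletion makes it easy to
print the list and review it before anything is removed.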
Disaster Recovery
To recover from an out-of-disk-space error on the log file
partition, or another recoverable failure which causes the storage
to raise a fatal exception, you may need to use the BerkeleyDB
"db_recover" executable. For more information, see the BerkeleyDB
documentation at
http://www.sleepycat.com/docs/ref/transapp/recovery.html.
BerkeleyDB Temp Files
BerkeleyDB creates temporary files in the directory referenced by
the $TMPDIR environment variable. If you do not have a $TMPDIR
set, your temp files will be created somewhere else (see
http://www.sleepycat.com/docs/api_c/env_set_tmp_dir.html for the
tempfile decision algorithm used by BerkeleyDB). These temporary
files are different from BerkeleyDB "log" files, but they can also
become quite large. Make sure you have plenty of temp space
available.
Linux 2GB Limit
BerkeleyDB is affected by the 2GB single-file-size limit on 32-bit
Linux ext2-based systems. The Berkeley storage "pickle" database
(by default named "zodb_pickle"), which holds the bulk of the data
for the Berkeley storages, is particularly susceptible to large
growth. If you notice that this file's size (or any other
Berkeley storage-related file) is nearing 2GB, you'll need to move
your BerkeleyDB environment to a filesystem which supports > 2GB
files.
IMPORTANT NOTE: If any of your BerkeleyDB files reaches the 2GB
limit before you notice the failure situation, you will most
likely need to restore the database environment from a backup,
putting the restored files on a filesystem which can handle large
files. This is due to the fact that the database file which "hit
the limit" on a 2GB-limited filesystem will be left in an
inconsistent state, and will probably be rendered unusable. Be
very cautious if you're dealing with large databases.
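One way to notice a file approaching the limit before it fails is to
check sizes in the environment directory periodically, for example
from cron. The sketch below is illustrative: files_near_limit is a
hypothetical helper, and the 80% warning threshold is an arbitrary
assumption.

```python
import os

TWO_GB = 2 * 1024 ** 3


def files_near_limit(env_dir, limit=TWO_GB, warn_fraction=0.8):
    """Return (name, size) pairs for files nearing the size limit.

    Hypothetical helper. A file is reported when its size is at
    least warn_fraction of limit (80% of 2GB by default).
    """
    threshold = limit * warn_fraction
    near = []
    for name in sorted(os.listdir(env_dir)):
        path = os.path.join(env_dir, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= threshold:
                near.append((name, size))
    return near
```

If the log files have been redirected to another directory, run the
same check against that directory as well.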
For More Information
Information about ZODB in general is kept on the ZODB Wiki at
http://www.zope.org/Wikis/ZODB/FrontPage
Information about the Berkeley storages in particular is at
http://www.zope.org/Wikis/ZODB/BerkeleyStorage
The email list zodb-dev@lists.zope.org is where all discussion
about the Berkeley storages should take place.
Subscribe or view the archives at
http://lists.zope.org/mailman/listinfo/zodb-dev