[Zodb-checkins] CVS: Packages/bsddb3Storage - README:1.1

barry@digicool.com
Fri, 27 Apr 2001 18:59:58 -0400 (EDT)


Update of /cvs-repository/Packages/bsddb3Storage
In directory korak:/tmp/cvs-serv29258

Added Files:
	README 
Log Message:
Initial README file for the Berkeley storages.  Some of the
information is taken from PacklessReadme.txt, but generalized for use
with any of the storages.



--- Added File README in package Packages/bsddb3Storage ---
Berkeley (bsddb3) Storage 1.0 beta 2 for ZODB/Zope


Introduction

    This package contains implementations of Zope/ZODB storages based
    on Sleepycat Software's BerkeleyDB 3.x and the PyBSDDB3 extension.
    These storages save ZODB data to some number of BerkeleyDB tables,
    relying on Berkeley's transaction machinery to provide reliability
    and recoverability.


Contents

    Inside the bsddb3Storage package, there are three storage
    implementations:

    - Packless.py is an implementation of an undo-less, version-less
      storage that obviates the need for packing, except in the face
      of cyclic garbage.  It uses a reference counting garbage
      collection strategy to clean up garbage objects in the normal
      case.  This storage was released in 1.0 beta 1.

    - Minimal.py is a new implementation of an undo-less, version-less
      storage; however, it currently performs reference counting
      garbage collection only on the call to pack(), so it is not
      (yet) packless.  Because Minimal shares code with the Full
      storage and provides a more robust temporary commit log, it is
      the wave of the future.  For the 1.0 final release, Minimal will
      be augmented to reclaim reference counted garbage without
      packing, and will be integrated with automatic cyclic garbage
      detection.

    - Full.py is a complete storage implementation, supporting undo,
      versions, and automatic reference counting garbage collection.
      Packing this storage is required only to get rid of old object
      revisions.  For the 1.0 final release, Full will be integrated
      with automatic cyclic garbage detection.

      Note that Full.py supports a new style of undo, called
      "transactional undo", which is really a non-destructive,
      redo-able undo.  For more information on transactional undo, see

      http://www.zope.org/Wikis/ZODB/TransactionalUndo

      Old style undo is /not/ supported in the Full.py storage, so
      unless you have a newer version of ZODB, you will not be able to
      perform undo operations.  It is unlikely that old style undo
      support will be added to the Full.py storage.


Compatibility

    The Full and Minimal storages have been tested with Python 2.0 and
    2.1, but not with Python 1.5.2.  It's possible they work with
    Python 1.5.2, but it is not likely that Python 1.5.2 will be
    explicitly supported.  The Full storage has been tested with Zope
    2.3.1 (running Python 2.1) and should work, except for undo as
    described above.  Transactional undo support will work with Zope
    2.4.  These storages have only been tested on Linux, but should
    work on any Unix-like operating system.  Windows has been
    partially tested and should also work.


Requirements

    You must install Sleepycat BerkeleyDB 3.x and Robin Dunn's PyBSDDB
    package separately.

    To obtain the latest source release of BerkeleyDB 3.x, see
    http://www.sleepycat.com.  As of this writing the latest
    BerkeleyDB release is 3.2.9, and this is the version these
    storages have been tested with.  Before using BerkeleyDB, be sure
    that you comply with its licensing requirements:

    http://www.sleepycat.com/licensing.html

    To obtain the latest source release of Robin Dunn's PyBSDDB
    package, see http://pybsddb.sourceforge.net

    Install both BerkeleyDB and PyBSDDB as per the instructions they
    come with.  For BerkeleyDB, it's generally wise to accept the
    default "configure" options and do a "make install" as root.
    PyBSDDB comes with a distutils-based setup script which should
    allow you to place the package in a globally accessible directory
    on your PYTHONPATH (e.g. "site-packages/bsddb3").

    When you can run the tests which ship with PyBSDDB, you'll know
    that both BerkeleyDB and PyBSDDB have been installed successfully.
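
    As a quick sanity check, you can also verify from the Python prompt
    that the bsddb3 extension loads and reports the BerkeleyDB library
    it was built against.  This is only an illustrative snippet; it
    assumes the bsddb3 package exposes its usual version helpers:

        # Illustrative sanity check -- assumes bsddb3 exposes db.version()
        # and db.DB_VERSION_STRING (its usual version helpers).
        from bsddb3 import db
        print db.version()           # e.g. (3, 2, 9)
        print db.DB_VERSION_STRING   # the BerkeleyDB library in use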


Installing bsddb3Storage

    The bsddb3Storage package is distributed as a Python distutils
    package, so the simplest thing to do is to use distutils to
    install it:

    % python setup.py install

    Then you should be able to do this at the prompt:

    Python 2.1 (#1, Apr 17 2001, 23:30:09) 
    [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
    Type "copyright", "credits" or "license" for more information.
    >>> import bsddb3Storage
    >>> bsddb3Storage.__version__
    '1.0 beta 2'

    See also docs/PacklessReadme.txt for alternative installation
    directions.


Using bsddb3Storage

    By default, Zope uses a FileStorage to hold ZODB data.  To tell
    Zope to use an alternate storage such as Packless, you need to set
    up a custom_zodb.py file.

    There is a sample custom_zodb.py file in the docs/ subdirectory
    shipped with this release.  The easiest way to get started with
    one of the Berkeley storages is to copy the custom_zodb.py file to
    your SOFTWARE_HOME directory (your main Zope dir) and edit its
    contents to specify which storage you want to use.  If you use an
    INSTANCE_HOME setup, you'll want to copy the file to the
    INSTANCE_HOME directory instead and do the same.

    If you choose to edit the contents of the custom_zodb.py file, you
    can change the "env" string to point to a different "environment"
    directory for BerkeleyDB.  BerkeleyDB needs its own working
    directory (which it calls an environment) into which it will store
    its support tables and log files.  The contents of this directory
    can become quite large, even if your data needs are relatively
    modest (see "BerkeleyDB Log Files" below).

    By default, the environment path is set in custom_zodb.py to a
    subdirectory of your Zope's var subdirectory named bsddb3Storage.
    You may change this to any absolute or relative path to which the
    user who runs the Zope executable has write privileges.  If the
    environment directory doesn't exist, it will be created when you
    first run Zope with one of the storages.  It is recommended that
    you choose an environment directory which does not contain any
    other files.  Additionally, you should not use BerkeleyDB on
    remotely mounted filesystems such as NFS.
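
    Putting this together, a custom_zodb.py for the Full storage might
    look roughly like the sketch below.  This is only a hypothetical
    illustration -- the class name and constructor arguments shown are
    assumptions, so treat the docs/custom_zodb.py shipped with this
    release as the authoritative template.

        # Hypothetical sketch of a custom_zodb.py; the class name and
        # constructor arguments here are assumptions -- see
        # docs/custom_zodb.py for the authoritative version.
        import os
        from bsddb3Storage import Full

        # The BerkeleyDB "environment" directory; it is created on the
        # first run if it does not already exist.
        env = os.path.join('var', 'bsddb3Storage')

        # Zope expects custom_zodb.py to define a module-level name
        # "Storage".
        Storage = Full.Full(env)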


Use with ZEO

    The Berkeley storages are compatible with ZEO.  For general
    information on how to use alternate storage implementations with
    ZEO, see the "start.txt" file in the ZEO release documentation.


BerkeleyDB Files

    After Zope is started with one of the Berkeley storages, you will
    see a number of different types of files in your BerkeleyDB
    environment directory.  There will be a number of "__db*" files, a
    number of "log.*" files, and several files which have the prefix
    "zodb_".  The files which have the zodb_ prefix are the actual
    BerkeleyDB databases which hold the storage data.  The "log.*"
    files are write-ahead logs for BerkeleyDB transactions, and they
    are very important.  The "__db*" files are working files for
    BerkeleyDB, and they are less important.  It's wise to back up all
    the files in this directory regularly; BerkeleyDB supports "hot"
    backups.  Log files need to be archived and cleared on a regular
    basis (see "Archival and Maintenance" below).

    You may also occasionally see some files with names that are long
    strings of hexadecimal digits.  These are "commit log" temporary
    files, created by the Full or Minimal storages, which are used to
    buffer database modifications during the two-phase commit process
    (don't confuse these with BerkeleyDB log files).

    Because of the semantics of BerkeleyDB's transactions, it is
    necessary to store the changes in this temporary file until a
    BerkeleyDB transaction can be committed.  Under normal operation,
    the long hex-digit files should only exist during a ZODB
    transaction.  However, if some fatal error occurs during a ZODB
    transaction which is neither committed nor aborted, the
    uncommitted changes will reside in this file.  Your storage will
    then refuse to allow new changes to the database until a recovery
    process is run (this is not the BerkeleyDB recovery process, and
    unfortunately the recovery script has not yet been written; it
    will be for the 1.0 final release).


BerkeleyDB Log Files

    BerkeleyDB is a transactional database system.  In order to
    maintain transactional integrity, BerkeleyDB writes data to "log
    files" before the data is committed.  These log files live in the
    BerkeleyDB "environment" directory unless you take steps to
    configure your BerkeleyDB environment differently.  BerkeleyDB log
    files can become quite large, as well, so it may be necessary to
    place them on a separate partition with lots of free disk space.
    The log file directory can be changed by creating a file named
    'DB_CONFIG' in the BerkeleyDB "environment" directory you've
    chosen in 'custom_zodb.py', containing a line such as the
    following:

	set_lg_dir /the/path/to/the/log/file/directory

    After using this configuration file to redirect logfile placement,
    your actual database files will still be kept in the directory
    specified by the "env" setting of your custom_zodb.py; only the
    BerkeleyDB log files will be written to the directory you specify
    in 'DB_CONFIG'.

    Redirecting log files to a directory other than your environment
    directory may improve recoverability in the case of BerkeleyDB
    failure.  It may also improve performance if the log file
    directory is on a separate hard disk/controller combination.  For
    more information about BerkeleyDB log files, recoverability and
    why it is advantageous to put your log files and your database
    files on separate devices, see

    http://www.sleepycat.com/docs/ref/transapp/reclimit.html.


Setting BerkeleyDB Maximum Locks

    ZODB transactions can be of almost arbitrary sizes (actually, they
    "top out" at a total size of 2GB).  BerkeleyDB is configured to
    use 500 locks by default.  Larger transactions in BerkeleyDB
    require more locks.  Thus, Packless ships with the default number
    of Berkeley locks set to 10,000 (BAW: is this still the case, and
    what about Full and Minimal?).  This should allow almost any Zope
    transaction to commit at the expense of increased RAM consumption.
    Utilizing 10,000 locks requires (at least on Linux systems)
    approximately 3MB of RAM overhead, much of which may go unused in
    environments that do not commit large transactions.  You can
    reduce RAM consumption by manually sizing BerkeleyDB locking.

    To manually size locking, create (or edit) the file DB_CONFIG
    within the BerkeleyDB "environment" directory you've chosen in
    custom_zodb.py, adding the following directives to the DB_CONFIG
    file:

        set_lk_max_locks 500
        set_lk_max_objects 500
        set_lk_max_lockers 3

    Change the integers as necessary.  When one of the Berkeley
    storages starts up, the Berkeley directives supplied in DB_CONFIG
    will override the defaults.

    Precision-sizing BerkeleyDB locking is a site-dependent task.
    Sleepycat recommends that you run the "db_stat -c" command against
    the database environment to see what the "max number of locks,
    lock objects and lockers so far" numbers are during highly
    stressful operations, multiply each of those numbers by 2, and
    provide the multiplied-by-2 numbers as arguments to
    set_lk_max_locks, set_lk_max_objects, and set_lk_max_lockers
    respectively in DB_CONFIG.  For detailed BerkeleyDB locking sizing
    strategy, see http://www.sleepycat.com/docs/ref/lock/max.html.
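
    For example (a purely illustrative snippet -- the observed numbers
    below are made up), the doubling arithmetic works out like this:

        # Illustrative only: given the "max ... so far" numbers reported
        # by "db_stat -c" (the values below are invented), print doubled
        # values in DB_CONFIG syntax.
        def suggest_lock_config(max_locks, max_objects, max_lockers):
            print 'set_lk_max_locks %d' % (2 * max_locks)
            print 'set_lk_max_objects %d' % (2 * max_objects)
            print 'set_lk_max_lockers %d' % (2 * max_lockers)

        suggest_lock_config(412, 387, 2)   # prints 824, 774 and 4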


Archival and Maintenance

    Log file rotation for BerkeleyDB is closely related to database
    archival.

    BerkeleyDB never deletes "old" log files.  Eventually, if you do
    not maintain your Berkeley database by deleting "old" log files,
    you will run out of disk space.  It's necessary to maintain and
    archive your BerkeleyDB files as per the procedures outlined in
    http://pybsddb.sourceforge.net/ref/transapp/archival.html.

    It is advantageous to automate this process, perhaps by creating a
    script run by "cron" that makes use of the "db_archive" executable
    as per the referenced document.  One strategy might be to perform
    the following sequence of operations (a rough sketch of the last
    two steps appears after this list):

    - shut down the process which is using BerkeleyDB (Zope or the ZEO
      storage server).

    - back up the database files (the files prefixed with "zodb").

    - back up all existing BerkeleyDB log files (the files prefixed
      "log").

    - run "db_archive -h /the/environment/directory" against your
      environment directory to find out which log files are no longer
      participating in transactions (they will be printed to stdout
      one file per line).

    - delete the log files that were reported by "db_archive" as no
      longer participating in any transactions.
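
    The following is a rough sketch of those last two steps.  It
    assumes "db_archive" is on your $PATH, that Zope or the ZEO server
    has already been shut down, and that the environment has been
    backed up; the path shown is just a placeholder.  Test it against
    a copy of your environment first.

        # Hypothetical sketch: remove log files that db_archive reports
        # as no longer participating in any transaction.  Assumes the
        # environment is backed up and not in use.
        import os

        ENV_DIR = '/the/environment/directory'   # placeholder path

        # db_archive prints the stale log files, one per line.
        pipe = os.popen('db_archive -h %s' % ENV_DIR)
        stale_logs = pipe.read().split()
        pipe.close()

        for name in stale_logs:
            # os.path.join copes with either bare names or full paths.
            path = os.path.join(ENV_DIR, name)
            print 'removing', path
            os.remove(path)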

    "Hot" backup and rotation of log files is slightly different.  See
    the above-referenced link regarding archival for more information.


Disaster Recovery

    To recover from an out-of-disk-space error on the log file
    partition, or another recoverable failure which causes the storage
    to raise a fatal exception, you may need to use the BerkeleyDB
    "db_recover" executable.  For more information, see the BerkeleyDB
    documentation at
    http://www.sleepycat.com/docs/ref/transapp/recovery.html.
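
    A minimal invocation, wrapped in Python for illustration, might
    look like the following.  It assumes "db_recover" is on your $PATH
    and that you have backed up the environment first; the path is a
    placeholder.

        # Hypothetical sketch: run normal BerkeleyDB recovery against the
        # environment directory before restarting Zope or the ZEO server.
        import os
        os.system('db_recover -v -h /the/environment/directory')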


BerkeleyDB Temp Files

    BerkeleyDB creates temporary files in the directory referenced by
    the $TMPDIR environment variable.  If you do not have a $TMPDIR
    set, your temp files will be created somewhere else (see
    http://www.sleepycat.com/docs/api_c/env_set_tmp_dir.html for the
    tempfile decision algorithm used by BerkeleyDB).  These temporary
    files are different from BerkeleyDB "log" files, but they can also
    become quite large.  Make sure you have plenty of temp space
    available.
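
    If you need to place these temp files somewhere specific without
    relying on $TMPDIR, the DB_CONFIG file described above also
    accepts a directive for this (the path below is just a
    placeholder):

        set_tmp_dir /a/partition/with/plenty/of/space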


Linux 2GB Limit

    BerkeleyDB is affected by the 2GB single-file-size limit on 32-bit
    Linux ext2-based systems.  The Berkeley storage "pickle" database
    (by default named "zodb_pickle"), which holds the bulk of the data
    for the Berkeley storages, is particularly susceptible to large
    growth.  If you notice that this file's size (or any other
    Berkeley storage-related file) is nearing 2GB, you'll need to move
    your BerkeleyDB environment to a filesystem which supports > 2GB
    files.

    IMPORTANT NOTE: If any of your BerkeleyDB files reaches the 2GB
    limit before you notice the failure situation, you will most
    likely need to restore the database environment from a backup,
    putting the restored files on a filesystem which can handle large
    files.  This is because the database file which "hit the limit"
    on a 2GB-limited filesystem will be left in an inconsistent state,
    and will probably be rendered unusable.  Be very cautious if
    you're dealing with large databases.


For More Information

    Information about ZODB in general is kept on the ZODB Wiki at

	http://www.zope.org/Wikis/ZODB/FrontPage

    Information about the Berkeley storages in particular is at

	http://www.zope.org/Wikis/ZODB/BerkeleyStorage

    The zodb-dev@lists.zope.org mailing list is where all the
    discussion about the Berkeley storages should take place.
    Subscribe or view the archives at

	http://lists.zope.org/mailman/listinfo/zodb-dev