[Zodb-checkins] SVN: ZODB/trunk/doc/ Documentation cleanup.

Jim Fulton jim at zope.com
Sat Jan 19 13:53:09 EST 2008


Log message for revision 82954:
  Documentation cleanup.
  

Changed:
  D   ZODB/trunk/doc/Makefile
  D   ZODB/trunk/doc/ZEO/
  D   ZODB/trunk/doc/zdctl.txt
  A   ZODB/trunk/doc/zeo-client-cache-tracing.txt
  A   ZODB/trunk/doc/zeo-client-cache.txt
  A   ZODB/trunk/doc/zeo.txt

-=-
Deleted: ZODB/trunk/doc/Makefile
===================================================================
--- ZODB/trunk/doc/Makefile	2008-01-19 18:53:02 UTC (rev 82953)
+++ ZODB/trunk/doc/Makefile	2008-01-19 18:53:09 UTC (rev 82954)
@@ -1,36 +0,0 @@
-MKHOWTO=mkhowto
-
-MKHTML=$(MKHOWTO) --html --iconserver=. --split=4 --dvips-safe
-
-ZODBTEX = guide/gfdl.tex guide/introduction.tex guide/modules.tex \
-	  guide/prog-zodb.tex guide/storages.tex guide/transactions.tex \
-	  guide/zeo.tex guide/zodb.tex 
-
-default: pdf
-all:	 pdf ps html
-
-pdf:	storage.pdf zodb.pdf
-ps:	storage.ps zodb.ps
-
-html:	storage/storage.html zodb/zodb.html
-
-storage.pdf: storage.tex
-	$(MKHOWTO) --pdf $<
-
-storage.ps: storage.tex
-	$(MKHOWTO) --ps $<
-
-storage/storage.html: storage.tex
-	$(MKHTML) storage.tex
-
-zodb.pdf: $(ZODBTEX)
-	$(MKHOWTO) --pdf guide/zodb.tex
-
-zodb.ps: $(ZODBTEX)
-	$(MKHOWTO) --ps guide/zodb.tex
-
-zodb/zodb.html: $(ZODBTEX)
-	$(MKHTML) guide/zodb.tex
-
-clobber:
-	rm -rf storage.pdf storage.ps storage/ zodb.pdf zodb.ps zodb/

Deleted: ZODB/trunk/doc/zdctl.txt
===================================================================
--- ZODB/trunk/doc/zdctl.txt	2008-01-19 18:53:02 UTC (rev 82953)
+++ ZODB/trunk/doc/zdctl.txt	2008-01-19 18:53:09 UTC (rev 82954)
@@ -1,335 +0,0 @@
-Using zdctl and zdrun to manage server processes
-================================================
-
-
-Summary
--------
-
-Starting with Zope 2.7 and ZODB 3.2, Zope has a new way to configure
-and control server processes.  This file documents the new approach to
-server process management; the new approach to configuration is
-documented elsewhere, although some examples will be given here.  We
-use the ZEO server as a running example, although this isn't a
-complete manual for configuring or running ZEO.
-
-This documentation applies to Unix/Linux systems; zdctl and zdrun do
-not work on Windows.
-
-
-Prerequisites
--------------
-
-This document assumes that you have installed the ZODB3 software
-(version 3.2 or higher) using a variation on the following command,
-given from the root directory of the ZODB3 distribution::
-
-  $ python setup.py install
-
-This installs the packages ZConfig, ZEO, zdaemon, zLOG, ZODB and
-various other needed packages and extension modules in the Python
-interpreter's site-packages directory, and installs scripts including
-zdctl.py, zdrun.py, runzeo.py and mkzeoinst.py in /usr/local/bin
-(actually the bin directory from which the python interpreter was
-loaded).
-
-When you receive ZODB as a part of Zope (version 2.7 or higher), the
-installation instructions will explain how to reach a similar state.
-
-
-Introduction
-------------
-
-The most basic way to run a ZEO server is using the following
-command::
-
-  $ runzeo.py -a 9999 -f Data.fs
-
-Here 9999 is the ZEO port (you can pick your own unused TCP port
-number in the range 1024 through 65535, inclusive); Data.fs is the
-storage file.  Again, you can pick any filename you want; the
-ZODB.FileStorage module code creates this file and various other files
-with additional extensions, like Data.fs.index, Data.fs.lock, and
-Data.fs.tmp.
-
-If something's wrong, for example if you picked a bad port number or
-filename, you'll get an error message or an exception right away and
-runzeo.py will exit with a non-zero exit status.  The exit status is 2
-for command line syntax errors, 1 for other errors.
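
A supervising script can act on these exit statuses. The following is a minimal sketch using Python 3's `subprocess` module. The helpers `classify_exit_status` and `run_and_classify` are hypothetical and are not part of ZEO. A child Python process that exits with status 2 stands in for a misconfigured runzeo.py.

```python
import subprocess
import sys

def classify_exit_status(returncode):
    """Map a runzeo.py-style exit status to a description (hypothetical helper)."""
    if returncode == 0:
        return "exited normally"
    if returncode == 2:
        return "command line syntax error"
    if returncode == 1:
        return "other error"
    return "unexpected status %d" % returncode

def run_and_classify(argv):
    """Run argv to completion and classify its exit status."""
    completed = subprocess.run(argv)
    return classify_exit_status(completed.returncode)

# Stand-in for e.g. ["runzeo.py", "-a", "9999", "-f", "Data.fs"]:
status = run_and_classify([sys.executable, "-c", "import sys; sys.exit(2)"])
print(status)  # command line syntax error
```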
-
-If all's well, runzeo.py will emit a few logging messages to stderr
-and start serving, until you hit ^C.  For example::
-
-  $ runzeo.py -a 9999 -f Data.fs
-  ------
-  2003-01-24T11:49:27 INFO(0) RUNSVR opening storage '1' using FileStorage
-  ------
-  2003-01-24T11:49:27 INFO(0) ZSS:23531 StorageServer created RW with
-  storages: 1:RW:Data.fs
-  ------
-  2003-01-24T11:49:27 INFO(0) zrpc:23531 listening on ('', 9999)
-
-At this point you can hit ^C to stop it; runzeo.py will catch the
-interrupt signal, emit a few more log messages and exit::
-
-  ^C
-  ------
-  2003-01-24T12:11:15 INFO(0) RUNSVR terminated by SIGINT
-  ------
-  2003-01-24T12:11:15 INFO(0) RUNSVR closing storage '1'
-  $ 
-
-This may be fine for testing, but a bad idea for running a ZEO server
-in a production environment.  In production, you want the ZEO server
-to be run as a daemon process, you want the log output to go to a
-file, you want the ZEO server to be started when the system is
-rebooted, and (usually) you want the ZEO server to be automatically
-restarted when it crashes.  You should also have a log rotation policy
-in place so that your disk doesn't fill up with log messages.
-
-The zdctl/zdrun combo can take care of running a server as a daemon
-process and restarting it when it crashes.  It can also be used to
-start it when the system is rebooted.  Sending log output to a file is
-done by adjusting the ZEO server configuration.  There are many fine
-existing tools to rotate log files, so we don't provide this
-functionality; zdctl has a command to send the server process a
-SIGUSR2 signal to tell it to reopen its log file after log rotation
-has taken place (the ZEO server has a signal handler that catches
-SIGUSR2 for this purpose).
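
The reopen-on-signal convention fits external rotation tools. The tool renames the log file, and the server then opens a fresh file under the old name. The sketch below shows that general pattern with Python's `signal` module (Unix only, Python 3.8+ for `signal.raise_signal`). It illustrates the technique only and is not the ZEO server's actual handler. The `ReopenableLog` class and `install_reopen_handler` function are hypothetical.

```python
import os
import signal
import tempfile

class ReopenableLog:
    """A log file that can be reopened by name (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a")

    def write(self, message):
        self.file.write(message + "\n")
        self.file.flush()

    def reopen(self):
        # If a rotation tool has renamed self.path, this creates a fresh
        # file under the original name; later writes go there.
        self.file.close()
        self.file = open(self.path, "a")

    def close(self):
        self.file.close()

def install_reopen_handler(log):
    """Reopen the log whenever the process receives SIGUSR2."""
    def handler(signum, frame):
        log.reopen()
    signal.signal(signal.SIGUSR2, handler)

# Usage: simulate one rotation cycle inside a single process.
path = os.path.join(tempfile.mkdtemp(), "zeo.log")
log = ReopenableLog(path)
install_reopen_handler(log)
log.write("before rotation")
os.rename(path, path + ".1")         # what an external rotation tool does
signal.raise_signal(signal.SIGUSR2)  # zdctl would send this from another process
log.write("after rotation")
log.close()
```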
-
-In addition, zdctl lets a system administrator or developer control
-the server process.  This is useful to deal with typical problems like
-restarting a hanging server or adjusting a server's configuration.
-
-The zdctl program can be used in two ways: in one-shot mode it
-executes a single command (such as "start", "stop" or "restart"); in
-interactive mode it acts much like a typical Unix shell or the Python
-interpreter, printing a prompt to standard output and reading commands
-from standard input.  It currently cannot be used to read commands
-from a file; if you need to script it, you can use a shell script
-containing repeated one-shot invocations.
-
-zdctl can be configured using command line options or a configuration
-file.  In practice, you'll want to use a configuration file; but first
-we'll show some examples using command line options only.  Here's a
-one-shot zdctl command to start the ZEO server::
-
-  $ zdctl.py -p "runzeo.py -a 9999 -f Data.fs" start
-
-The -p option specifies the server program; it is the runzeo
-invocation that we showed before.  The start argument tells it to
-start the process.  What actually happens is that zdctl starts zdrun,
-and zdrun now manages the ZEO server process.  The zdctl process exits
-once zdrun has started the ZEO server process; the zdrun process stays
-around, and when the ZEO server process crashes it will restart it.
-
-To check that the ZEO server is now running, use the zdctl status
-command::
-
-  $ zdctl.py -p "runzeo.py -a 9999 -f Data.fs" status
-
-This prints a one-line message telling you that the program is
-running.  To stop the ZEO server, use the zdctl stop command::
-
-  $ zdctl.py -p "runzeo.py -a 9999 -f Data.fs" stop
-
-To check that it is no longer running, use the zdctl status command
-again.
-
-
-Daemon mode
------------
-
-If you are playing along on your computer, you cannot have missed that
-some log output has been spewing to your terminal window.  While this
-may give you a warm and fuzzy feeling that something is actually
-happening, after a while it can get quite annoying (especially if
-clients are actually connecting to the server).  This can be avoided
-by using the -d flag, which enables "daemon mode"::
-
-  $ zdctl.py -d -p "runzeo.py -a 9999 -f Data.fs" start
-
-Daemon mode does several subtle things; see for example section 13.3
-of "Advanced Programming in the UNIX Environment" by Richard Stevens
-for a good explanation of daemon mode.  For now, the most important
-effect is that the standard input, output and error streams are
-redirected to /dev/null, and that the process is "detached" from your
-controlling tty, which implies that it won't receive a SIGHUP signal
-when you log out.
-
-
-Using a configuration file
---------------------------
-
-I hope you are using a Unix shell with command line history; otherwise
-entering the examples above would have been quite a pain.  But a
-better way to control zdctl and zdrun's many options without having to
-type them over and over again is to use a configuration file.  Here's
-a small configuration file; place this in the file "zeoctl.conf" (the
-name is just a convention; you can call it "foo" if you prefer)::
-
-  # Sample zdctl/zdrun configuration
-  <runner>
-    program       runzeo.py -a 9999 -f Data.fs
-    daemon	  true
-    directory     /tmp/zeohome
-    socket-name   /tmp/zeohome/zdsock
-  </runner>
-
-The "program" and "daemon" lines correspond to the -p and -d command
-line options discussed above.  The "directory" line is new.  It
-specifies a directory into which zdrun (but not zdctl!) chdirs.  This
-directory should exist; zdctl won't create it for you.  The Data.fs
-filename passed to runzeo.py is interpreted relative to this
-directory.  Finally, the "socket-name" line names the Unix domain
-socket that is used for communication between zdctl and zdrun.  It
-defaults to zdsock in the current directory, a default you definitely
-want to override for production usage.
-
-To invoke zdctl with a configuration file, use its -C option to name
-the configuration file, for example::
-
-  $ zdctl.py -C zeoctl.conf start
-
-  $ zdctl.py -C zeoctl.conf status
-
-  $ zdctl.py -C zeoctl.conf stop
-
-
-Interactive mode
-----------------
-
-Using a configuration file makes it a little easier to repeatedly
-start, stop and request status of a particular server, but it still
-requires typing the configuration file name on each command.
-Fortunately, zdctl.py can be used as an interactive "shell" which lets
-you execute repeated commands for the same server.  Simply invoke
-zdctl.py without the final argument ("start", "status" or "stop" in
-the above examples)::
-
-  $ zdctl.py -C zeoctl.conf
-  program: runzeo.py -a 9999 -f Data.fs
-  daemon manager not running
-  zdctl> 
-
-The first two lines of output are status messages (and could be
-different in your case); the final line is the interactive command
-prompt.  At this prompt, you can type commands::
-
-  zdctl> help
-
-  Documented commands (type help <topic>):
-  ========================================
-  EOF             fg              foreground      help            kill
-  logreopen       logtail         quit            reload          restart
-  shell           show            start           status          stop
-  wait            
-
-  zdctl> help start
-  start -- Start the daemon process.
-	   If it is already running, do nothing.
-  zdctl> start
-  daemon process started, pid=31580
-  zdctl> status
-  program running; pid=31580
-  zdctl> stop
-  daemon process stopped
-  zdctl> quit
-  daemon manager not running
-  $ 
-
-In short, the commands you can type at the interactive prompt are the
-same commands (with optional arguments) that you can use as positional
-arguments on the zdctl.py command line.
-
-The interactive shell has some additional features:
-
-- Line editing and command line history using the standard GNU
-  readline module.
-
-- A blank line repeats the last command (especially useful for status).
-
-- Command and argument completion using the TAB key.
-
-One final note: some people don't like it that an invocation without
-arguments enters interactive mode.  If this describes you, there's an
-easy way to disable this feature: add a line saying
-
-  default-to-interactive false
-
-to the zeoctl.conf file.  You can still enter interactive mode by
-using the -i option.
-
-
-Using mkzeoinst.py
-------------------
-
-If you still think that all of the above is a lot of typing, you're
-right.  Fortunately, there's a simple utility that helps you create
-and configure a ZEO server instance.  mkzeoinst.py requires one
-argument, the ZEO server's "home directory".  After that, you can
-optionally specify a service port number; the port defaults to 9999.
-
-mkzeoinst.py creates the server home directory (and its ancestor
-directories if necessary), and then creates the following directory
-substructure:
-
-  bin/ - directory for scripts (zeoctl)
-  etc/ - directory for configuration files (zeo.conf, zeoctl.conf)
-  log/ - directory for log files (zeo.log, zeoctl.log)
-  var/ - directory for data files (Data.fs and friends)
-
-If the server home directory or any of its subdirectories already
-exist, mkzeoinst.py will note this and assume you are rebuilding an
-existing instance.  (In fact, it prints a message for each directory
-it creates but is silent about existing directories.)
-
-It then creates the following files:
-
-  bin/zeoctl      - executable shell script to run zdctl.py
-  etc/zeo.conf    - configuration file for ZEO
-  etc/zeoctl.conf - configuration file for zdrun.py and zdctl.py
-
-If any of the files it wants to create already exists and is
-non-empty, it does not write the file.  (An empty file will be
-overwritten though.)  If the existing contents differ from what it
-would have written if the file didn't exist, it prints a warning
-message; otherwise the skipping is silent.
-
-Other errors (e.g. permission errors creating or reading files or
-directories) cause mkzeoinst.py to bail with an error message; it does
-not clean up the work already done.
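
The directory and file rules above amount to an idempotent "create if missing" procedure. The sketch below restates them in Python; it is not mkzeoinst.py's source. The function names `ensure_dirs` and `write_if_empty`, and their return values, are hypothetical. Printing messages is replaced by returning values.

```python
import os
import tempfile

def ensure_dirs(home, subdirs=("bin", "etc", "log", "var")):
    """Create home and its subdirectories, returning only the ones created.

    Existing directories are skipped silently; errors such as permission
    problems propagate and nothing already created is cleaned up.
    """
    created = []
    for path in [home] + [os.path.join(home, name) for name in subdirs]:
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates missing ancestor directories
            created.append(path)
    return created

def write_if_empty(path, contents):
    """Write contents unless path already holds non-empty data.

    Returns "written", "skipped" (existing contents are identical) or
    "warning" (existing contents differ and were left untouched).
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            existing = f.read()
        return "skipped" if existing == contents else "warning"
    with open(path, "w") as f:  # a missing or empty file is (over)written
        f.write(contents)
    return "written"

# Usage: the home directory's parent does not exist yet.
home = os.path.join(tempfile.mkdtemp(), "parent", "zeohome")
first_run = ensure_dirs(home)   # home plus bin/, etc/, log/, var/
second_run = ensure_dirs(home)  # [] because everything already exists
conf = os.path.join(home, "etc", "zeo.conf")
results = [write_if_empty(conf, "x\n"), write_if_empty(conf, "x\n"),
           write_if_empty(conf, "y\n")]
```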
-
-The created files contain absolute path references to all of the
-programs, files, directories used.  They also contain default values
-for most configuration settings that one might normally want to
-configure.  Most configured settings are the same as the defaults;
-however, daemon mode is on while the regular default is off.  Log
-files are configured to go into the log directory.  It configures
-separate log files for zdrun.py/zdctl.py (log/zeoctl.log) and for the
-ZEO server itself (log/zeo.log).  Once created, the files are yours;
-feel free to edit them to suit your taste.
-
-The bin/zeoctl script should be invoked with the positional arguments
-(e.g. "start", "stop" or "status") that you would pass to zdctl.py;
-the script hardcodes the configuration file so you don't have to pass
-that.  It can also be invoked without arguments to enter interactive
-mode.
-
-One final detail: if you want the ZEO server to be started
-automatically when the machine is rebooted, and you're lucky enough to
-be using a recent Red Hat (or similar) system, you can copy the
-bin/zeoctl script into the /etc/rc.d/init.d/ directory and use
-chkconfig(8) to create the correct symlinks to it; the bin/zeoctl
-script already has the appropriate magical comments for chkconfig.
-
-
-zdctl reference
----------------
-
-TBD
-
-
-zdrun reference
----------------
-
-TBD

Copied: ZODB/trunk/doc/zeo-client-cache-tracing.txt (from rev 82950, ZODB/trunk/doc/ZEO/trace.txt)
===================================================================
--- ZODB/trunk/doc/zeo-client-cache-tracing.txt	                        (rev 0)
+++ ZODB/trunk/doc/zeo-client-cache-tracing.txt	2008-01-19 18:53:09 UTC (rev 82954)
@@ -0,0 +1,144 @@
+ZEO Client Cache Tracing
+========================
+
+An important question for ZEO users is: how large should the ZEO
+client cache be?  ZEO 2 (as of ZEO 2.0b2) has a new feature that lets
+you collect a trace of cache activity and tools to analyze this trace,
+enabling you to make an informed decision about the cache size.
+
+Don't confuse the ZEO client cache with the Zope object cache.  The
+ZEO client cache is only used when an object is not in the Zope object
+cache; the ZEO client cache avoids roundtrips to the ZEO server.
+
+Enabling Cache Tracing
+----------------------
+
+To enable cache tracing, you must use a persistent cache (specify a ``client``
+name), and set the environment variable ZEO_CACHE_TRACE to a non-empty
+value.  The path to the trace file is derived from the path to the persistent
+cache file by appending ".trace".  If the file doesn't exist, ZEO will try to
+create it.  If the file does exist, it's opened for appending (previous trace
+information is not overwritten).  If there are problems with the file, a
+warning message is logged.  To start or stop tracing, the ZEO client process
+(typically a Zope application server) must be restarted.
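
The enabling rules can be restated as a small helper. This is a sketch of the described behavior, not ZEO's code. The names `trace_path` and `open_trace_file` are hypothetical, and so is the example cache file name.

```python
import logging
import os

logger = logging.getLogger("cache-trace-sketch")

def trace_path(cache_path, environ=os.environ):
    """Return the trace file path, or None if tracing is off (hypothetical)."""
    if cache_path is None:                  # transient cache: no tracing
        return None
    if not environ.get("ZEO_CACHE_TRACE"):  # unset or empty string
        return None
    return cache_path + ".trace"

def open_trace_file(path):
    """Open the trace for appending; log a warning instead of failing."""
    try:
        return open(path, "ab")  # append: earlier trace records are kept
    except OSError as exc:
        logger.warning("cannot open cache trace file %r: %s", path, exc)
        return None
```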
+
+The trace file can grow pretty quickly; on a moderately loaded server, we
+observed it growing by 7 MB per hour.  The file consists of binary records,
+each 34 bytes long if 8-byte oids are in use; a detailed description of the
+record layout is given in stats.py.  No sensitive data is logged:  data
+record sizes (but not data records), and binary object and transaction ids
+are logged, but no object pickles, object types or names, user names,
+transaction comments, access paths, or machine information (such as machine
+name or IP address) are logged.
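
As a rough consistency check on these figures, the sketch below packs one fixed-size record with Python's `struct` module. The header layout is an assumption chosen because it totals 34 bytes with an 8-byte oid: a 4-byte timestamp, a 4-byte event code, a 2-byte oid length and two 8-byte transaction ids, followed by the oid. Consult stats.py for the real layout before relying on it. At 34 bytes per record, 7 MB per hour works out to roughly 200,000 records per hour, or about 60 per second.

```python
import struct
import time

# Assumed (not authoritative) big-endian header: timestamp, event code,
# oid length, tid, end tid.  The oid bytes follow the header.
HEADER = struct.Struct(">iiH8s8s")

def pack_record(code, oid, tid, end_tid, now=None):
    """Pack one hypothetical trace record."""
    if now is None:
        now = int(time.time())
    return HEADER.pack(now, code, len(oid), tid, end_tid) + oid

def unpack_record(data):
    """Inverse of pack_record: (timestamp, code, oid, tid, end_tid)."""
    ts, code, oidlen, tid, end_tid = HEADER.unpack_from(data)
    oid = data[HEADER.size:HEADER.size + oidlen]
    return ts, code, oid, tid, end_tid

oid = (1).to_bytes(8, "big")                    # an 8-byte oid
record = pack_record(0x20, oid, bytes(8), bytes(8), now=0)  # arbitrary code
print(len(record))  # 34: a 26-byte header plus the 8-byte oid
```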
+
+Analyzing a Cache Trace
+-----------------------
+
+The stats.py command-line tool is the first-line tool to analyze a cache
+trace.  Its default output consists of two parts:  a one-line summary of
+essential statistics for each segment of 15 minutes, interspersed with lines
+indicating client restarts, followed by a more detailed summary of overall
+statistics.
+
+The most important statistic is the "hit rate", a percentage indicating how
+many requests to load an object could be satisfied from the cache.  Hit rates
+around 70% are good.  90% is excellent.  If you see a hit rate under 60% you
+can probably improve the cache performance (and hence your Zope application
+server's performance) by increasing the ZEO cache size.  This is normally
+configured using key ``cache_size`` in the ``zeoclient`` section of your
+configuration file.  The default cache size is 20 MB, which is small.
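
The hit rate is simply hits divided by loads. The rules of thumb above can be encoded directly. This is a minimal sketch; the label "fair" for the 60-70% band is our own, since the text gives no verdict there.

```python
def hit_rate(hits, loads):
    """Percentage of load requests satisfied from the cache."""
    return 100.0 * hits / loads if loads else 0.0

def assess(rate):
    """Rules of thumb from the text; the wording of the labels is ours."""
    if rate >= 90:
        return "excellent"
    if rate >= 70:
        return "good"
    if rate < 60:
        return "consider a larger cache_size"
    return "fair"

# Numbers taken from the 4 MB simulation shown later in this document.
rate = hit_rate(1429329, 3218856)
print("%.1f%% %s" % (rate, assess(rate)))  # 44.4% consider a larger cache_size
```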
+
+The stats.py tool shows its command line syntax when invoked without
+arguments.  The tracefile argument can be a gzipped file if it has a .gz
+extension.  It will be read from stdin (assuming uncompressed data) if the
+tracefile argument is '-'.
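
The input convention described above (plain file, `.gz` file, or `-` for stdin) can be sketched as follows. This mirrors the described behavior and is not stats.py's source. The helper name `open_trace` is hypothetical.

```python
import gzip
import os
import sys
import tempfile

def open_trace(name):
    """Open a cache trace for binary reading (hypothetical helper)."""
    if name == "-":
        return sys.stdin.buffer        # uncompressed data on stdin
    if name.endswith(".gz"):
        return gzip.open(name, "rb")   # decompressed transparently
    return open(name, "rb")

# Usage: write a gzipped trace holding one 34-byte record and read it back.
path = os.path.join(tempfile.mkdtemp(), "cachetrace.log.gz")
with gzip.open(path, "wb") as f:
    f.write(b"\x00" * 34)
with open_trace(path) as f:
    data = f.read()
```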
+
+Simulating Different Cache Sizes
+--------------------------------
+
+Based on a cache trace file, you can make a prediction of how well the cache
+might do with a different cache size.  The simul.py tool runs a simulation of
+the ZEO client cache implementation based upon the events read from a trace
+file.  A new simulation is started each time the trace file records a client
+restart event; if a trace file contains more than one restart event, a
+separate line is printed for each simulation, and a line with overall
+statistics is added at the end.
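
The per-restart structure can be illustrated with a toy model. The event stream is split at each restart, every segment is simulated with a cache that starts empty, and the totals are summed at the end. The event tuples are hypothetical, and the unbounded cache is a deliberate simplification of what simul.py actually simulates.

```python
def split_at_restarts(events):
    """Yield lists of events, starting a new list at each restart event."""
    segment = []
    for event in events:
        if event[0] == "restart":
            if segment:
                yield segment
            segment = []
        else:
            segment.append(event)
    if segment:
        yield segment

def simulate(segment):
    """Toy simulation: an unbounded cache that starts empty, so only the
    first load of each oid (or a load after an invalidation) misses."""
    cached, loads, hits = set(), 0, 0
    for kind, oid in segment:
        if kind == "load":
            loads += 1
            if oid in cached:
                hits += 1
            cached.add(oid)
        elif kind == "invalidate":
            cached.discard(oid)
    return loads, hits

events = [("load", 1), ("load", 1), ("restart", None),
          ("load", 1), ("load", 2), ("load", 2)]
results = [simulate(seg) for seg in split_at_restarts(events)]
total_loads = sum(loads for loads, _ in results)
total_hits = sum(hits for _, hits in results)
print(results, total_loads, total_hits)  # [(2, 1), (3, 1)] 5 2
```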
+
+Example, assuming the trace file is in /tmp/cachetrace.log::
+
+    $ python simul.py -s 4 /tmp/cachetrace.log
+    CircularCacheSimulation, cache size 4,194,304 bytes
+      START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE  EVICTS   INUSE
+    Jul 22 22:22     39:09  3218856  1429329  24046  41517   44.4%   40776    99.8
+
+This shows that with a 4 MB cache size, the cache hit rate is 44.4%: the
+1429329 cache hits are 44.4% of the 3218856 load requests.  The cache
+simulated 40776 evictions to make room for new object states.  At the end,
+99.8% of the bytes reserved for the cache file were in
+use to hold object state (the remaining 0.2% consists of "holes", bytes freed
+by object eviction and not yet reused to hold another object's state).
+
+Let's try this again with an 8 MB cache::
+
+    $ python simul.py -s 8 /tmp/cachetrace.log
+    CircularCacheSimulation, cache size 8,388,608 bytes
+      START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE  EVICTS   INUSE
+    Jul 22 22:22     39:09  3218856  2182722  31315  41517   67.8%   40016   100.0
+
+That's a huge improvement in hit rate, which isn't surprising since these are
+very small cache sizes.  The default cache size is 20 MB, which is still on
+the small side::
+
+    $ python simul.py /tmp/cachetrace.log
+    CircularCacheSimulation, cache size 20,971,520 bytes
+      START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE  EVICTS   INUSE
+    Jul 22 22:22     39:09  3218856  2982589  37922  41517   92.7%   37761    99.9
+
+Again a very nice improvement in hit rate, and there's not a lot of room left
+for improvement.  Let's try 100 MB::
+
+    $ python simul.py -s 100 /tmp/cachetrace.log
+    CircularCacheSimulation, cache size 104,857,600 bytes
+      START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE  EVICTS   INUSE
+    Jul 22 22:22     39:09  3218856  3218741  39572  41517  100.0%   22778   100.0
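
The runs above can be tabulated to check two things. First, the `-s` argument is in megabytes of 2**20 bytes (4 gives 4,194,304 bytes). Second, the HITRATE column is hits divided by loads. The data are copied from the output above; the script is only an illustration.

```python
MB = 1024 * 1024

# (value passed to -s, loads, hits) from the simulations above
runs = [(4, 3218856, 1429329), (8, 3218856, 2182722),
        (20, 3218856, 2982589), (100, 3218856, 3218741)]

for size_mb, loads, hits in runs:
    # Prints the cache size in bytes and the hit rate for each run.
    print("%4d MB = %11s bytes  hit rate %5.1f%%"
          % (size_mb, format(size_mb * MB, ","), 100.0 * hits / loads))
```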
+
+It's very unusual to see a hit rate so high.  The application here frequently
+modified a very large BTree, so given enough cache space to hold the entire
+BTree it rarely needed to ask the ZEO server for data:  this application
+reused the same objects over and over.
+
+More typical is that a substantial number of objects will be referenced only
+once.  Whenever an object turns out to be loaded only once, it's a pure loss
+for the cache:  the first (and only) load is a cache miss; storing the object
+evicts other objects, possibly causing more cache misses; and the object is
+never loaded again.  If, for example, a third of the objects are loaded only
+once, it's quite possible for the theoretical maximum hit rate to be 67%, no
+matter how large the cache.
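
The same reasoning gives a simple upper bound that can be computed from any sequence of loads. Every object's first load is a compulsory miss, so for a trace replayed against an initially empty cache of unlimited size, the hit rate cannot exceed (loads - distinct objects) / loads. This sketch ignores invalidations, which can only lower the bound further. The function name is hypothetical.

```python
def max_hit_rate(loads):
    """Upper bound on the hit rate for a cold, unlimited cache."""
    if not loads:
        return 0.0
    return (len(loads) - len(set(loads))) / len(loads)

trace = ["a", "b", "b", "c", "c", "c"]   # hypothetical oids
print("%.0f%%" % (100 * max_hit_rate(trace)))  # 50%
```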
+
+The simul.py script also contains code to simulate different cache
+strategies.  Since none of these are implemented, and only the default cache
+strategy's code has been updated to be aware of MVCC, these are not further
+documented here.
+
+Simulation Limitations
+----------------------
+
+The cache simulation is an approximation, and actual hit rate may be higher
+or lower than the simulated result.  These are some factors that inhibit
+exact simulation:
+
+- The simulator doesn't try to emulate versions.  If the trace file contains
+  loads and stores of objects in versions, the simulator treats them as if
+  they were loads and stores of non-version data.
+
+- Each time a load of an object O in the trace file was a cache hit, but the
+  simulated cache has evicted O, the simulated cache has no way to repair its
+  knowledge about O.  This is more frequent when simulating caches smaller
+  than the cache used to produce the trace file.  When a real cache suffers a
+  cache miss, it asks the ZEO server for the needed information about O, and
+  saves O in the client cache.  The simulated cache doesn't have a ZEO server
+  to ask, and O continues to be absent in the simulated cache.  Further
+  requests for O will continue to be simulated cache misses, although in a
+  real cache they'll likely be cache hits.  On the other hand, the
+  simulated cache doesn't need to evict any objects to make room for O, so it
+  may enjoy further cache hits on objects a real cache would have evicted.

Copied: ZODB/trunk/doc/zeo-client-cache.txt (from rev 82950, ZODB/trunk/doc/ZEO/cache.txt)
===================================================================
--- ZODB/trunk/doc/zeo-client-cache.txt	                        (rev 0)
+++ ZODB/trunk/doc/zeo-client-cache.txt	2008-01-19 18:53:09 UTC (rev 82954)
@@ -0,0 +1,48 @@
+ZEO Client Cache
+
+  The client cache provides a disk-based cache for each ZEO client.  The
+  client cache allows reads to be done from local disk rather than by remote
+  access to the storage server.
+
+  The cache may be persistent or transient.  If the cache is persistent, then
+  the cache file is retained for use after process restarts.  A non-
+  persistent cache uses a temporary file.
+
+  The client cache is managed in a single file, of the specified size.
+
+  The life of the cache is as follows:
+
+  - The cache file is opened (if it already exists), or created and set to
+    the specified size.
+
+  - Cache records are written to the cache file, as transactions commit
+    locally, and as data are loaded from the server.
+
+  - Writes are to "the current file position".  This is a pointer that
+    travels around the file, circularly.  After a record is written, the
+    pointer advances to just beyond it.  Objects starting at the current
+    file position are evicted, as needed, to make room for the next record
+    written.
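
To make the write pointer concrete, here is a toy model of the circular file. It tracks only byte ranges, not data. It assumes that a record never wraps across the end of the file: a record that does not fit in the remaining space starts over at offset 0. `CircularCache` and its methods are hypothetical and are not ZEO's cache implementation.

```python
class CircularCache:
    """Toy model of a fixed-size cache file written circularly."""

    def __init__(self, size):
        self.size = size
        self.pos = 0          # "the current file position"
        self.records = {}     # oid -> (offset, length)

    def _evict_range(self, start, end):
        """Evict every record overlapping [start, end)."""
        evicted = []
        for oid, (offset, length) in list(self.records.items()):
            if offset < end and start < offset + length:
                del self.records[oid]
                evicted.append(oid)
        return evicted

    def write(self, oid, length):
        """Write a record at the current position; return evicted oids."""
        if length > self.size:
            raise ValueError("record larger than the cache file")
        if self.pos + length > self.size:
            self.pos = 0                  # wrap around to the start
        self.records.pop(oid, None)       # drop an older copy of this oid
        evicted = self._evict_range(self.pos, self.pos + length)
        self.records[oid] = (self.pos, length)
        self.pos += length                # advance to just beyond it
        return evicted

cache = CircularCache(100)
cache.write("a", 40)
cache.write("b", 40)
evicted = cache.write("c", 40)   # wraps to offset 0 and overwrites "a"
print(evicted, sorted(cache.records))  # ['a'] ['b', 'c']
```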
+
+  A distinct index file is not created, although indexing structures are
+  maintained in memory while a ClientStorage is running.  When a persistent
+  client cache file is reopened, these indexing structures are recreated
+  by analyzing the file contents.
+
+  Persistent cache files are created in the directory named in the ``var``
+  argument to the ClientStorage, or if ``var`` is None, in the current
+  working directory.  Persistent cache files have names of the form::
+
+    client-storage.zec
+
+  where:
+
+    client -- the client name, as given by the ClientStorage's ``client``
+              argument
+
+    storage -- the storage name, as given by the ClientStorage's ``storage``
+               argument; this is typically a string denoting a small integer,
+               "1" by default
+
+  For example, the cache file for client '8881' and storage 'spam' is named
+  "8881-spam.zec".
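
The naming rule can be expressed as a one-line helper. This is a sketch only; `cache_file_path` is a hypothetical name, not a ClientStorage method, and the `/var/zeo` directory is made up for illustration.

```python
import os

def cache_file_path(client, storage="1", var=None):
    """Persistent cache file path: <var or cwd>/<client>-<storage>.zec."""
    directory = var if var is not None else os.getcwd()
    return os.path.join(directory, "%s-%s.zec" % (client, storage))

print(os.path.basename(cache_file_path("8881", "spam")))  # 8881-spam.zec
```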

Copied: ZODB/trunk/doc/zeo.txt (from rev 82950, ZODB/trunk/doc/ZEO/howto.txt)
===================================================================
--- ZODB/trunk/doc/zeo.txt	                        (rev 0)
+++ ZODB/trunk/doc/zeo.txt	2008-01-19 18:53:09 UTC (rev 82954)
@@ -0,0 +1,415 @@
+==========================
+Running a ZEO Server HOWTO
+==========================
+
+Introduction
+------------
+
+ZEO (Zope Enterprise Objects) is a client-server system for sharing a
+single storage among many clients.  Normally, a ZODB storage can only
+be used by a single process.  When you use ZEO, the storage is opened
+in the ZEO server process.  Client programs connect to this process
+using a ZEO ClientStorage.  ZEO provides a consistent view of the
+database to all clients.  The ZEO client and server communicate using
+a custom RPC protocol layered on top of TCP.
+
+There are several configuration options that affect the behavior of a
+ZEO server.  This section describes how a few of these features
+work.  Subsequent sections describe how to configure every option.
+
+Client cache
+~~~~~~~~~~~~
+
+Each ZEO client keeps an on-disk cache of recently used objects to
+avoid fetching those objects from the server each time they are
+requested.  It is usually faster to read the objects from disk than it
+is to fetch them over the network.  The cache can also provide
+read-only copies of objects during server outages.
+
+The cache may be persistent or transient. If the cache is persistent,
+then the cache files are retained for use after process restarts. A
+non-persistent cache uses temporary files that are removed when the
+client storage is closed.
+
+The client cache size is configured when the ClientStorage is created.
+The default size is 20MB, but the right size depends entirely on the
+particular database.  Setting the cache size too small can hurt
+performance, but in most cases making it too big just wastes disk
+space.  The document "ZEO Client Cache Tracing" (zeo-client-cache-tracing.txt)
+describes how to collect a cache trace that can be used to determine a
+good cache size.
+
+ZEO uses invalidations for cache consistency.  Every time an object is
+modified, the server sends a message to each client informing it of
+the change.  The client will discard the object from its cache when it
+receives an invalidation.  These invalidations are often batched.
+
+Each time a client connects to a server, it must verify that its cache
+contents are still valid.  (It did not receive any invalidation
+messages while it was disconnected.)  There are several mechanisms
+used to perform cache verification.  In the worst case, the client
+sends the server a list of all objects in its cache along with their
+timestamps; the server sends back an invalidation message for each
+stale object.  The cost of verification is one drawback to making the
+cache too large.
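
The worst-case verification exchange can be sketched with plain dictionaries. Each maps object ids to the transaction id (timestamp) of a revision. The function names and message shapes are hypothetical; real ZEO carries these messages over its RPC protocol.

```python
def server_find_stale(current_tids, client_report):
    """Server side: oids whose cached tid is no longer current
    (including objects the server no longer has)."""
    return [oid for oid, tid in client_report.items()
            if current_tids.get(oid) != tid]

def client_verify(cache, current_tids):
    """Client side: report every cached (oid, tid), drop what is stale."""
    stale = server_find_stale(current_tids, dict(cache))
    for oid in stale:
        del cache[oid]   # same effect as receiving an invalidation
    return stale

server = {"a": 5, "b": 9, "c": 7}   # oid -> tid of the current revision
cache = {"a": 5, "b": 3, "c": 7}    # "b" changed while disconnected
stale = client_verify(cache, server)
print(stale, sorted(cache))          # ['b'] ['a', 'c']
```

The client report contains every cached object, so the work grows linearly with the cache size. This is why an oversized cache makes reconnection more expensive.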
+
+Note that every time a client crashes or disconnects, it must verify
+its cache.  Every time a server crashes, all of its clients must
+verify their caches.
+
+The cache verification process is optimized in two ways to eliminate
+costs when restarting clients and servers.  Each client keeps the
+timestamp of the last invalidation message it has seen.  When it
+connects to the server, it checks to see if any invalidation messages
+were sent after that timestamp.  If not, then the cache is up-to-date
+and no further verification occurs.  The other optimization is the
+invalidation queue, described below.
+
+Invalidation queue
+~~~~~~~~~~~~~~~~~~
+
+The ZEO server keeps a queue of recent invalidation messages in
+memory.  When a client connects to the server, it sends the timestamp
+of the most recent invalidation message it has received.  If that
+message is still in the invalidation queue, then the server sends the
+client all the missing invalidations.  This is often cheaper than
+performing full cache verification.
+
+The default size of the invalidation queue is 100.  If the
+invalidation queue is larger, it will be more likely that a client
+that reconnects will be able to verify its cache using the queue.  On
+the other hand, a large queue uses more memory on the server to store
+the messages.  Invalidation messages tend to be small, perhaps a few
+hundred bytes each on average; it depends on the number of objects
+modified by a transaction.
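
The timestamp check and the invalidation queue together give a fast path that avoids full verification. Below is a minimal sketch using `collections.deque` with `maxlen` as the bounded queue. Transaction ids are modeled as increasing integers, and all names are hypothetical. With the default of 100 entries at a few hundred bytes each, such a queue holds on the order of tens of kilobytes.

```python
from collections import deque

class InvalidationQueue:
    """Bounded record of recent commits: (tid, [oids]) pairs."""

    def __init__(self, size=100):
        self.entries = deque(maxlen=size)  # oldest entries fall off the left
        self.last_tid = 0                  # tid of the latest commit

    def record(self, tid, oids):
        self.entries.append((tid, list(oids)))
        self.last_tid = tid

    def catch_up(self, client_tid):
        """Oids the client must invalidate, or None if the client's last
        tid is too old and full verification is required."""
        if client_tid == self.last_tid:
            return []                      # nothing sent since: up to date
        if client_tid not in [tid for tid, _ in self.entries]:
            return None                    # fell out of the queue
        missing = []
        for tid, oids in self.entries:
            if tid > client_tid:
                missing.extend(oids)
        return missing

q = InvalidationQueue(size=3)
for tid, oids in [(1, ["a"]), (2, ["b"]), (3, ["c"]), (4, ["a", "d"])]:
    q.record(tid, oids)
print(q.catch_up(4), q.catch_up(2), q.catch_up(1))  # [] ['c', 'a', 'd'] None
```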
+
+Transaction timeouts
+~~~~~~~~~~~~~~~~~~~~
+
+A ZEO server can be configured to time out a transaction if it takes
+too long to complete.  Only a single transaction can commit at a time;
+so if one transaction takes too long, all other clients will be
+delayed waiting for it.  In the extreme, a client can hang during the
+commit process.  If the client hangs, the server will be unable to
+commit other transactions until it restarts.  A well-behaved client
+will not hang, but the server can be configured with a transaction
+timeout to guard against bugs that cause a client to hang.
+
+If any transaction exceeds the timeout threshold, the client's
+connection to the server will be closed and the transaction aborted.
+Once the transaction is aborted, the server can start processing other
+clients' requests.  Most transactions should take very little time to
+commit.  The timer begins for a transaction after all the data has
+been sent to the server.  At this point, the cost of commit should be
+dominated by the cost of writing data to disk; it should be unusual
+for a commit to take longer than 1 second.  A transaction timeout of
+30 seconds should tolerate heavy load and slow communications between
+client and server, while guarding against hung clients.
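
The timeout rule can be sketched with a worker thread: start the clock once all data has arrived, and abort if the commit outlives the limit. This is a simplified model only. `finish_commit` and `abort` are hypothetical callbacks; the real server also closes the client's connection, and it cannot simply rely on a thread for this.

```python
import threading
import time

def commit_with_timeout(finish_commit, abort, timeout):
    """Run finish_commit; call abort if it is still running after timeout
    seconds.  The clock starts here, after all data has been received."""
    worker = threading.Thread(target=finish_commit, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        abort()   # e.g. close the client's connection so others can commit
        return "aborted"
    return "committed"

aborted = []
r1 = commit_with_timeout(lambda: None, lambda: aborted.append(True), 1.0)
r2 = commit_with_timeout(lambda: time.sleep(0.5),
                         lambda: aborted.append(True), 0.1)
print(r1, r2, aborted)  # committed aborted [True]
```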
+
+When a transaction times out, the client can be left in an awkward
+position.  If the timeout occurs during the second phase of the two
+phase commit, the client will log a panic message.  This should only
+cause problems if the client transaction involved multiple storages.
+If it did, it is possible that some storages committed the client
+changes and others did not.
+
+Monitor server
+~~~~~~~~~~~~~~
+
+The ZEO server updates several counters while it is running.  It can
+be configured to run a separate monitor server that reports the
+counter values and other statistics.  If a client connects to the
+socket, the server sends a text report and closes the socket
+immediately.  It does not read any data from the client.
+
+An example of a monitor server report is included below::
+
+    ZEO monitor server version 2.1a1
+    Fri Apr  4 16:57:42 2003
+    
+    Storage: 1
+    Server started: Fri Apr  4 16:57:37 2003
+    Clients: 0
+    Clients verifying: 0
+    Active transactions: 0
+    Commits: 0
+    Aborts: 0
+    Loads: 0
+    Stores: 0
+    Conflicts: 0
+    Conflicts resolved: 0
+
+Connection management
+~~~~~~~~~~~~~~~~~~~~~
+
+A ZEO client manages its connection to the ZEO server.  If it loses
+the connection, it attempts to reconnect.  While
+it is disconnected, it can satisfy some reads by using its cache.
+
+The client can be configured to wait for a connection when it is created
+or to return immediately and provide data from its persistent cache.
+It usually simplifies programming to have the client wait for a
+connection on startup.
+
+When the client is disconnected, it polls periodically to see if the
+server is available.  The rate at which it polls is configurable.
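Reconnection polling can be sketched as a loop with a growing delay. The defaults of 5 and 300 seconds mirror the min-disconnect-poll and max-disconnect-poll client options. Two parts of the sketch are illustrative assumptions rather than ZEO's actual code: the delay doubling after each failed attempt, and the helper names.

```python
import time


def poll_delays(min_poll=5, max_poll=300):
    """Yield successive reconnect delays in seconds.

    Assumption: the delay starts at min_poll and doubles after each
    failed attempt, capped at max_poll.
    """
    delay = min_poll
    while True:
        yield delay
        delay = min(delay * 2, max_poll)


def connect_with_backoff(try_connect, min_poll=5, max_poll=300, sleep=time.sleep):
    """Call try_connect() until it returns a connection, sleeping between tries."""
    for delay in poll_delays(min_poll, max_poll):
        conn = try_connect()
        if conn is not None:
            return conn
        sleep(delay)
```

With the defaults, the first eight delays are 5, 10, 20, 40, 80, 160, 300 and 300 seconds.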
+
+The client can be configured with multiple server addresses.  In this
+case, it assumes that each server has identical content and will use
+any server that is available.  It is possible to configure the client
+to accept a read-only connection to one of these servers if no
+read-write connection is available.  If it has a read-only connection,
+it will continue to poll for a read-write connection.  This feature
+supports the Zope Replication Services product,
+http://www.zope.com/Products/ZopeProducts/ZRS.  In general, it could
+be used with a system that arranges to provide hot backups of
+servers in the case of failure.
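The server-selection policy described above can be sketched as follows. The function `choose_server()` and its status values are hypothetical. The rules it encodes are these:

* A read-write server always wins.
* A read-only server is accepted only when read-only fallback is enabled.
* Polling continues until a read-write connection is found.

```python
def choose_server(servers, read_only_fallback=False):
    """Pick a server from a list of (address, status) pairs.

    status is 'rw' (writable), 'ro' (read-only) or None (unreachable).
    Returns (address or None, keep_polling).
    """
    for address, status in servers:
        if status == "rw":
            return address, False  # best case: stop polling
    if read_only_fallback:
        for address, status in servers:
            if status == "ro":
                return address, True  # usable now, but keep looking for rw
    return None, True  # nothing acceptable yet
```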
+
+Authentication
+~~~~~~~~~~~~~~
+
+ZEO supports optional authentication of client and server using a
+password scheme similar to HTTP digest authentication (RFC 2069).  It
+is a simple challenge-response protocol that does not send passwords
+in the clear, but does not offer strong security.  The RFC discusses
+many of the limitations of this kind of protocol.  Note that this
+feature provides authentication only.  It does not provide encryption
+or confidentiality.
+
+The challenge-response also produces a session key that is used to
+generate message authentication codes for each ZEO message.  This
+should prevent session hijacking.
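Per-message authentication codes can be sketched with Python's `hmac` module. HMAC-SHA1 and the helper names are assumptions chosen for illustration. This sketch does not reproduce ZEO's wire format or the way ZEO derives the session key.

```python
import hashlib
import hmac


def sign(session_key: bytes, message: bytes) -> bytes:
    """Return a MAC for one message, keyed by the session key."""
    return hmac.new(session_key, message, hashlib.sha1).digest()


def verify(session_key: bytes, message: bytes, mac: bytes) -> bool:
    """Check a received MAC; compare_digest avoids timing leaks."""
    return hmac.compare_digest(sign(session_key, message), mac)
```

A hijacker who injects or alters a message without knowing the session key cannot produce a MAC that `verify()` accepts.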
+
+Guard the password database as if it contained plaintext passwords.
+It stores the hash of a username and password.  This does not expose
+the plaintext password, but it is sensitive nonetheless.  An attacker
+with the hash can impersonate the real user.  This is a limitation of
+the simple digest scheme.
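The following simplified, RFC 2069-style sketch shows why the stored hash must be guarded. The server keeps only H(username:realm:password), yet that value alone is enough to answer a challenge. The response formula is simplified, the function names are hypothetical, and this is not ZEO's digest implementation.

```python
import hashlib
import secrets


def h(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def password_entry(username, realm, password):
    """What a digest-style password database stores: H(user:realm:password)."""
    return h(f"{username}:{realm}:{password}")


def make_challenge() -> str:
    # A fresh random nonce per login, so old responses cannot be replayed.
    return secrets.token_hex(16)


def respond(ha1: str, nonce: str) -> str:
    """Client response; only the hash ha1 is needed, not the password."""
    return h(f"{ha1}:{nonce}")


def check(db, username, nonce, response):
    """Server-side verification against the stored hash."""
    ha1 = db.get(username)
    return ha1 is not None and secrets.compare_digest(respond(ha1, nonce), response)
```

An attacker who copies `db["alice"]` can call `respond(db["alice"], nonce)` and pass `check()` without ever learning Alice's password.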
+
+The authentication framework allows third-party developers to provide
+new authentication modules.
+
+Installing software
+-------------------
+
+ZEO is distributed as part of the ZODB3 package and with Zope,
+starting with Zope 2.7.  You can download it from
+http://pypi.python.org/pypi/ZODB3.
+
+Configuring server
+------------------
+
+The script runzeo.py runs the ZEO server.  The server can be
+configured using command-line arguments or a config file.  This
+document only describes the config file.  Run runzeo.py
+-h to see the list of command-line arguments.
+
+The runzeo.py script imports the ZEO package.  ZEO must either be
+installed in Python's site-packages directory or be in a directory on
+PYTHONPATH.  
+
+The configuration file specifies the underlying storage the server
+uses, the address it binds, and a few other optional parameters.
+An example is::
+
+    <zeo>
+    address zeo.example.com:8090
+    monitor-address zeo.example.com:8091
+    </zeo>
+
+    <filestorage 1>
+    path /var/tmp/Data.fs
+    </filestorage>
+
+    <eventlog>
+    <logfile>
+    path /var/tmp/zeo.log
+    format %(asctime)s %(message)s
+    </logfile>
+    </eventlog>
+
+This file configures a server to use a FileStorage from
+/var/tmp/Data.fs.  The server listens on port 8090 of zeo.example.com.
+It also starts a monitor server that listens on port 8091.  The ZEO
+server writes its log file to /var/tmp/zeo.log and uses a custom
+format for each line.  Assuming the example configuration is stored in
+zeo.config, you can run a server by typing::
+
+    python /usr/local/bin/runzeo.py -C zeo.config
+
+A configuration file consists of a <zeo> section and a storage
+section, where the storage section can use any of the valid ZODB
+storage types.  It may also contain an eventlog configuration.  See
+the document "Configuring a ZODB database" for more information about
+configuring storages and eventlogs.
+
+The zeo section must list the address.  All the other keys are
+optional.
+
+address
+        The address at which the server should listen.  This can be in
+        the form 'host:port' to signify a TCP/IP connection or a
+        pathname string to signify a Unix domain socket connection (at
+        least one '/' is required).  A hostname may be a DNS name or a
+        dotted IP address.  If the hostname is omitted, the platform's
+        default behavior is used when binding the listening socket (''
+        is passed to socket.bind() as the hostname portion of the
+        address).
+
+read-only
+        Flag indicating whether the server should operate in read-only
+        mode.  Defaults to false.  Note that even if the server is
+        operating in writable mode, individual storages may still be
+        read-only.  But if the server is in read-only mode, no write
+        operations are allowed, even if the storages are writable.  Note
+        that pack() is considered a read-only operation.
+
+invalidation-queue-size
+        The storage server keeps a queue of the objects modified by the
+        last N transactions, where N is the value of this option.  This
+        queue is used to speed up client cache verification when a client
+        disconnects for a short period of time.
+
+monitor-address
+        The address at which the monitor server should listen.  If
+        specified, a monitor server is started.  The monitor server
+        provides server statistics in a simple text format.  This can
+        be in the form 'host:port' to signify a TCP/IP connection or a
+        pathname string to signify a Unix domain socket connection (at
+        least one '/' is required).  A hostname may be a DNS name or a
+        dotted IP address.  If the hostname is omitted, the platform's
+        default behavior is used when binding the listening socket (''
+        is passed to socket.bind() as the hostname portion of the
+        address).
+
+transaction-timeout
+        The maximum amount of time to wait for a transaction to commit
+        after acquiring the storage lock, specified in seconds.  If the
+        transaction takes too long, the client connection will be closed
+        and the transaction aborted.
+
+authentication-protocol
+        The name of the protocol used for authentication.  The
+        only protocol provided with ZEO is "digest", but extensions
+        may provide other protocols.
+
+authentication-database
+        The path of the database containing authentication credentials.
+
+authentication-realm
+        The authentication realm of the server.  Some authentication
+        schemes use a realm to identify the logical set of usernames
+        that are accepted by this server.
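The address and monitor-address options share one parsing rule, which can be sketched as a small parser. `parse_address()` is hypothetical, and ZConfig performs this conversion in practice. The sketch assumes that a bare port such as `8090` means the hostname was omitted, and it does not handle IPv6 literals.

```python
def parse_address(value):
    """Interpret an address or monitor-address value.

    A value containing '/' is a Unix domain socket path; anything else
    is treated as 'host:port' and returned as a (host, port) tuple.
    """
    if "/" in value:
        return value  # Unix domain socket path
    # rpartition yields ('', '', value) when there is no ':', so a bare
    # port gets an empty host (an assumption, see above).
    host, _, port = value.rpartition(":")
    return host, int(port)
```

The returned tuple can be passed directly to `socket.bind()`. A host of `''` lets the platform choose the default interface.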
+
+Configuring clients
+-------------------
+
+The ZEO client can also be configured using ZConfig.  The ZODB.config
+module provides several functions for opening a storage based on its
+configuration.
+
+- ZODB.config.storageFromString()
+- ZODB.config.storageFromFile()
+- ZODB.config.storageFromURL()
+
+The ZEO client configuration requires that the server address be
+specified.  Everything else is optional.  An example configuration is::
+
+    <zeoclient>
+    server zeo.example.com:8090
+    </zeoclient>
+
+The other configuration options are listed below.
+
+storage
+        The name of the storage that the client wants to use.  If the
+        ZEO server serves more than one storage, the client selects
+        the storage it wants to use by name.  The default name is '1',
+        which is also the default name for the ZEO server.
+
+cache-size
+        The maximum size of the client cache, in bytes.
+
+name
+        The storage name.  If unspecified, the address of the server
+        will be used as the name.
+
+client
+        Enables persistent cache files.  The string passed here is
+        used to construct the cache filenames.  If it is not
+        specified, the client creates a temporary cache that will
+        only be used by the current object.
+
+var
+        The directory where persistent cache files are stored.  By
+        default, persistent cache files are stored in the current
+        directory.
+
+min-disconnect-poll
+        The minimum delay, in seconds, between attempts to connect to
+        the server.  Defaults to 5 seconds.
+
+max-disconnect-poll
+        The maximum delay, in seconds, between attempts to connect to
+        the server.  Defaults to 300 seconds.
+
+wait
+        A boolean indicating whether the constructor should wait
+        for the client to connect to the server and verify the cache
+        before returning.  The default is true.
+
+read-only
+        A flag indicating whether this should be a read-only storage,
+        defaulting to false (i.e. writing is allowed by default).
+
+read-only-fallback
+        A flag indicating whether a read-only remote storage should be
+        acceptable as a fallback when no writable storages are
+        available.  Defaults to false.  At most one of read-only and
+        read-only-fallback should be true.
+
+realm
+        The authentication realm of the server.  Some authentication
+        schemes use a realm to identify the logical set of usernames
+        that are accepted by this server.
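Client configurations can also be generated programmatically. This sketch builds a <zeoclient> section from keyword arguments. It also rejects configurations that set both read-only and read-only-fallback. The helper `zeoclient_config()` is hypothetical. Its output is intended to be passed to a function such as `ZODB.config.storageFromString()`, listed above.

```python
# Option names come from the list above; the helper itself is illustrative.
KNOWN_OPTIONS = {
    "storage", "cache-size", "name", "client", "var",
    "min-disconnect-poll", "max-disconnect-poll", "wait",
    "read-only", "read-only-fallback", "realm",
}


def zeoclient_config(server, **options):
    """Build a <zeoclient> section; write '-' in option names as '_'."""
    opts = {key.replace("_", "-"): value for key, value in options.items()}
    unknown = set(opts) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if opts.get("read-only") and opts.get("read-only-fallback"):
        raise ValueError("set at most one of read-only and read-only-fallback")
    lines = ["<zeoclient>", f"server {server}"]
    for key, value in opts.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key} {value}")
    lines.append("</zeoclient>")
    return "\n".join(lines) + "\n"
```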
+
+A ZEO client can also be created by calling the ClientStorage
+constructor explicitly.  For example::
+
+    from ZEO.ClientStorage import ClientStorage
+    # Connect to the storage named '1' (the default) at this address.
+    storage = ClientStorage(("zeo.example.com", 8090))
+
+Running the ZEO server as a daemon
+----------------------------------
+
+In an operational setting, you will want to run the ZEO server as a
+daemon process that is restarted when it dies.  The zdaemon package
+provides two tools for running daemons: zdrun.py and zdctl.py.  You can
+find zdaemon and its documentation at
+http://pypi.python.org/pypi/zdaemon.
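zdrun.py is the tool to use in production. To illustrate the idea of a daemon that is restarted when it dies, here is a minimal supervisor loop. `supervise()` and its restart policy are hypothetical and unrelated to zdaemon's implementation or options.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff=1.0, sleep=time.sleep):
    """Run cmd and restart it whenever it exits with a non-zero status.

    A clean exit (status 0) ends supervision, as does exceeding
    max_restarts restarts.  Returns the list of exit codes observed.
    """
    exit_codes = []
    while True:
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0 or len(exit_codes) > max_restarts:
            return exit_codes
        sleep(backoff)  # avoid a tight restart loop
```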
+
+Rotating log files
+~~~~~~~~~~~~~~~~~~
+
+ZEO will re-initialize its logging subsystem when it receives a
+SIGUSR2 signal.  If you are using the standard event logger, you
+should first rename the log file and then send the signal to the
+server.  The server will continue writing to the renamed log file
+until it receives the signal.  After it receives the signal, the
+server will create a new file with the old name and write to it.
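The two rotation steps, renaming the log and then signalling the server, can be scripted as below. `rotate_zeo_log()` and the timestamp suffix are hypothetical. How you obtain the server's process ID, for example from a pid file, depends on your deployment. SIGUSR2 is only available on Unix.

```python
import os
import signal
import time


def rotate_zeo_log(log_path, server_pid, send_signal=os.kill):
    """Rename the current log, then ask the ZEO server to reopen it.

    The server keeps writing to the renamed file until SIGUSR2 arrives;
    after that it creates a fresh file under the original name.
    Returns the path the old log was moved to.
    """
    rotated = f"{log_path}.{time.strftime('%Y%m%d-%H%M%S')}"
    os.rename(log_path, rotated)             # step 1: move the old log aside
    send_signal(server_pid, signal.SIGUSR2)  # step 2: trigger the reopen
    return rotated
```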
+
+Tools
+-----
+
+There are a few scripts that may help running a ZEO server.  The
+zeopack.py script connects to a server and packs the storage.  It can
+be run as a cron job.  The zeoup.py script attempts to connect to a
+ZEO server and verify that it is functioning.  The zeopasswd.py script
+manages a ZEO server's password database.
+
+Diagnosing problems
+-------------------
+
+If an exception occurs on the server, the server will log a traceback
+and send an exception to the client.  The traceback on the client will
+show a ZEO protocol library as the source of the error.  If you need
+to diagnose the problem, you will have to look in the server log for
+the rest of the traceback.


