[Zodb-checkins] CVS: ZODB3/Doc/ZEO - trace.txt:1.1

Jeremy Hylton jeremy at zope.com
Tue May 27 17:11:44 EDT 2003


Update of /cvs-repository/ZODB3/Doc/ZEO
In directory cvs.zope.org:/tmp/cvs-serv5512/Doc/ZEO

Added Files:
	trace.txt 
Log Message:
Move cache trace documentation to doc directory.


=== Added File ZODB3/Doc/ZEO/trace.txt ===
Client Cache Tracing
====================

An important question for ZEO users is: how large should the ZEO
client cache be?  ZEO 2 (as of ZEO 2.0b2) has a new feature that lets
you collect a trace of cache activity and tools to analyze this trace,
enabling you to make an informed decision about the cache size.

Don't confuse the ZEO client cache with the Zope object cache.  The
ZEO client cache is only used when an object is not in the Zope object
cache; the ZEO client cache avoids roundtrips to the ZEO server.

Enabling Cache Tracing
----------------------

To enable cache tracing, set the environment variable ZEO_CACHE_TRACE
to the name of a file to which the ZEO client process can write.  ZEO
will append a hyphen and the storage name to the filename, to
distinguish different storages.  If the file doesn't exist, the ZEO
will try to create it.  If there are problems with the file, a log
message is written to the standard Zope log file.  To start or stop
tracing, the ZEO client process (typically a Zope application server)
must be restarted.

The trace file can grow pretty quickly; on a moderately loaded server,
we observed it growing by 5 MB per hour.  The file consists of binary
records, each 24 bytes long; a detailed description of the record
lay-out is given in stats.py.  No sensitive data is logged.

Analyzing a Cache Trace
-----------------------

The stats.py command-line tool is the first-line tool to analyze a
cache trace.  Its default output consists of two parts: a one-line
summary of essential statistics for each segment of 15 minutes,
interspersed with lines indicating client restarts and "cache flip
events" (more about those later), followed by a more detailed summary
of overall statistics.

The most important statistic is probably the "hit rate", a percentage
indicating how many requests to load an object could be satisfied from
the cache.  Hit rates around 70% are good.  90% is probably close to
the theoretical maximum.  If you see a hit rate under 60% you can
probably improve the cache performance (and hence your Zope
application server's performance) by increasing the ZEO cache size.
This is normally configured using the cache_size keyword argument to
the ClientStorage() constructor in your custom_zodb.py file.  The
default cache size is 20 MB.

The stats.py tool shows its command line syntax when invoked without
arguments.  The tracefile argument can be a gzipped file if it has a
.gz extension.  It will read from stdin (assuming uncompressed data)
if the tracefile argument is '-'.

Simulating Different Cache Sizes
--------------------------------

Based on a cache trace file, you can make a prediction of how well the
cache might do with a different cache size.  The simul.py tool runs an
accurate simulation of the ZEO client cache implementation based upon
the events read from a trace file.  A new simulation is started each
time the trace file records a client restart event; if a trace file
contains more than one restart event, a separate line is printed for
each simulation, and line with overall statistics is added at the end.

Example, assuming the trace file is in /tmp/cachetrace.log::

    $ python simul.py -s 100 /tmp/cachetrace.log
      START TIME  DURATION    LOADS     HITS INVALS WRITES  FLIPS HITRATE
    Sep  4 11:59     38:01    59833    40473    257     20      2  67.6%
    $

This shows that with a 100 MB cache size, the cache hit rate is
67.6%.  So let's try this again with a 200 MB cache size::

    $ python simul.py -s 200 /tmp/cachetrace.log
      START TIME  DURATION    LOADS     HITS INVALS WRITES  FLIPS HITRATE
    Sep  4 11:59     38:01    59833    40921    258     20      1  68.4%
    $

This showed hardly any improvement.  So let's try a 300 MB cache
size::

    $ python2.0 simul.py -s 300 /tmp/cachetrace.log
    ZEOCacheSimulation, cache size 300,000,000 bytes
      START TIME  DURATION    LOADS     HITS INVALS WRITES  FLIPS HITRATE
    Sep  4 11:59     38:01    59833    40921    258     20      0  68.4%
    $ 

This shows that for this particular trace file, the maximum attainable
hit rate is 68.4%.  This is probably caused by the fact that nearly a
third of the objects mentioned in the trace were loaded only once --
the cache only helps if an object is loaded more than once.

The simul.py tool also supports simulating different cache
strategies.  Since none of these are implemented, these are not
further documented here.

Cache Flips
-----------

The cache uses two files, which are managed as follows:

  - Data are written to file 0 until file 0 exceeds limit/2 in size.

  - Data are written to file 1 until file 1 exceeds limit/2 in size.

  - File 0 is truncated to size 0 (or deleted and recreated).

  - Data are written to file 0 until file 0 exceeds limit/2 in size.

  - File 1 is truncated to size 0 (or deleted and recreated).

  - Data are written to file 1 until file 1 exceeds limit/2 in size.

and so on.

A switch from file 0 to file 1 is called a "cache flip".  At all cache
flips except the first, half of the cache contents is wiped out.  This
affects cache performance.  How badly this impact is can be seen from
the per-15-minutes summaries printed by stats.py.  The -i option lets
you choose a smaller summary interval which shows the impact more
acutely.

The simul.py tool shows the number of cache flips in the FLIPS column.
If you see more than one flip per hour the cache may be too small.




More information about the Zodb-checkins mailing list