[ZODB-Dev] ZODB crawler script

Greg Ward gward@mems-exchange.org
Mon, 6 Aug 2001 17:01:36 -0400


Hi all --

About a month ago, I mentioned my script "zodb_census" here.  I may even
have posted it, but I don't remember.  Anyway, it's a script to loop
over all objects in a ZODB without knowing anything about the
application object graph, and without having to write a tricky generic
graph-traversal algorithm.  (The point of the loop is to count how many
objects of each type there are in the database.)

The idea is this:

  oid = 0
  while 1:
    oid_s = stringify(oid)
    object = connection[oid_s]
    if seen_all_objects:
      break
    oid += 1

The script as described/posted was broken: it got the stringify step
wrong.  (Damn bit-twiddling.)  To undo the damage this may have caused
my reputation, I'm attaching a working version of the script.  ;-)
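
For reference, the stringify step amounts to packing the integer OID
into the 8-byte big-endian string that ZODB uses as an object ID.  A
minimal sketch in present-day Python, splitting the value into two
32-bit halves the same way the attached script does (the names
oid_to_bytes and bytes_to_oid are made up for this illustration, they
are not ZODB functions):

```python
from struct import pack, unpack

def oid_to_bytes(oid):
    """Pack an integer OID into an 8-byte big-endian string."""
    high = (oid >> 32) & 0xffffffff   # upper 32 bits
    low = oid & 0xffffffff            # lower 32 bits
    return pack(">LL", high, low)

def bytes_to_oid(oid_s):
    """Inverse of oid_to_bytes: 8 bytes back to an integer."""
    high, low = unpack(">LL", oid_s)
    return (high << 32) | low

print(oid_to_bytes(0x639b9).hex())   # 00000000000639b9
```

Round-tripping a value through both helpers is a cheap way to catch
the kind of shift or mask slip that bit me the first time.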

Here's sample output:

  $ ./zodb_census.py
  expecting to see 132952 objects
  maximum expected OID: 00000000000639b9
  OID: 00000000000639ba (objects seen: 132952)
  census completed
  total OIDs attempted: 407994
  empty slots seen: 275042
  actual objects seen: 132952
  objects seen by type:
  ActiveVersionCollection          848
  Address                         1454
  BTree                             13
  BaseProcess                     1895
  Bucket                           197
  [...]
  Wafer                           9386
  WaferDescription                1124
  WorkIncrementCostModel            80
  WorkRateCostModel                123

It should work with any storage, but the "maximum expected OID" line
(mainly a debugging aid) is only printed for FileStorage.
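
The figures in the sample output are consistent with one another,
which makes a handy sanity check on the loop: the last OID printed
equals the total number of OIDs attempted, and that total is the sum
of the empty slots and the objects seen.  A small check, using only
numbers copied from the output above:

```python
# Figures copied from the sample output above.
max_expected_oid = int("00000000000639b9", 16)
final_oid = int("00000000000639ba", 16)
total_attempted = 407994
empty_slots = 275042
objects_seen = 132952

# OIDs 0 .. max_expected_oid inclusive were tried in this run, so the
# counter ends one past the maximum.
assert final_oid == max_expected_oid + 1
assert final_oid == total_attempted
assert empty_slots + objects_seen == total_attempted

# Share of the OID range that actually holds an object.
print("%.1f%% of OIDs are occupied" % (100.0 * objects_seen / total_attempted))
```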

zodb_census.py won't work out of the box for you, unless your email
address also happens to end in "@mems-exchange.org".  Replacing our
init_database(), get_connection(), and get_database() functions is an
exercise left for the reader.
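
One way to make that exercise smaller is to keep the loop ignorant of
where objects come from and pass the lookup in as a function.  The
sketch below is written in present-day Python and is not the attached
script: census() and its `load` parameter are hypothetical names, and
a plain dict stands in for the database so the example runs without
ZODB.  With a real connection, `load` would wrap a lookup such as the
conn[oid_s] call in the attachment.

```python
from collections import Counter
from struct import pack

def census(load, expected_count, max_oid=None):
    """Count objects by type name, trying OIDs 0, 1, 2, ... in order.

    `load` is any callable that takes an 8-byte OID string and returns
    the object, raising KeyError for an unused OID (hypothetical hook).
    """
    counts = Counter()
    oid = empty_slots = objects_seen = 0
    # Stop once every expected object was seen, or past the optional bound.
    while objects_seen < expected_count and (max_oid is None or oid <= max_oid):
        try:
            obj = load(pack(">Q", oid))   # same bytes as pack(">LL", high, low)
        except KeyError:
            empty_slots += 1
        else:
            counts[type(obj).__name__] += 1
            objects_seen += 1
        oid += 1
    return counts, empty_slots, oid

# Stand-in "database": a dict keyed by packed OID, with gaps at 1 and 3.
fake_db = {pack(">Q", 0): [], pack(">Q", 2): {}, pack(">Q", 4): []}
counts, empty_slots, attempted = census(fake_db.__getitem__, len(fake_db))
print(sorted(counts.items()), empty_slots, attempted)
# [('dict', 1), ('list', 2)] 2 5
```

Unlike the attached script, this version checks the stop condition
before each lookup rather than after it, so an empty database is
never probed at all.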

Enjoy --

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org

[attachment: zodb_census.py]

#!/www/python/bin/python

"""zodb_census

Inspect every object in a ZODB and count how many times each type
occurs.  (Note that each ExtensionClass is a separate type, so
we'll get a class-by-class breakdown from this.)
"""

__revision__ = "$Id: zodb_census.py,v 1.3 2001/08/06 20:48:11 gward Exp $"

import sys
from struct import pack, unpack
from mems.lib.base import init_database, get_connection, get_database

def write_status (oid, objects_seen):
    sys.stdout.write("\rOID: %016x (objects seen: %d)" %
                     (oid, objects_seen))
    sys.stdout.flush()


init_database()
conn = get_connection()
oid = 0L
empty_slots = 0L
objects_seen = 0L                       # number of actual objects seen
object_count = {}                       # maps type name to count

expected_count = get_database().objectCount()
print "expecting to see %d objects" % expected_count

try:
    # Optional: this peeks at a private storage attribute and only
    # works for FileStorage.  Other storages fall through to the
    # except clause and skip the upper bound.
    max_oid = unpack(">LL", conn._storage._oid) # returns (long, long) tuple
    print "maximum expected OID: %08x%08x" % max_oid
    max_oid = (max_oid[0] << 32) | max_oid[1]
except AttributeError:
    max_oid = None

try:
    while 1:
        if (oid % 0x0800) == 0:
            write_status(oid, objects_seen)

        oid_s = pack(">LL",
                     (oid & 0xffffffff00000000L) >> 32,
                     (oid & 0x00000000ffffffffL))
        try:
            obj = conn[oid_s]
        except KeyError:
            #print "%016x  *empty slot*" % oid
            empty_slots += 1
        else:
            #print "%016x  %s" % (oid, repr(obj))
            typename = type(obj).__name__
            object_count[typename] = object_count.get(typename, 0) + 1
            objects_seen += 1

        oid += 1
        if (objects_seen >= expected_count) or (max_oid is not None and oid > max_oid):
            write_status(oid, objects_seen)
            print "\ncensus completed"
            break
        
except KeyboardInterrupt:
    write_status(oid, objects_seen)
    print "\ncensus interrupted prematurely"

print "total OIDs attempted: %d" % oid
print "empty slots seen: %d" % empty_slots
print "actual objects seen: %d" % objects_seen
typenames = object_count.keys()
typenames.sort()
print "objects seen by type:"
for name in typenames:
    print "%-25.25s %10d" % (name, object_count[name])
