[ZODB-Dev] ZODB crawler script
Greg Ward
gward@mems-exchange.org
Mon, 6 Aug 2001 17:01:36 -0400
--C7zPtVaVf+AK4Oqc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hi all --
about a month ago, I mentioned my script "zodb_census" here. I may have
even posted it, but I don't remember. Anyways, it's a script to loop
over all objects in a ZODB without knowing anything about the
application object graph, and without having to write a tricky generic
graph-traversal algorithm. (The point of the loop is to count how many
objects of each type there are in the database.)
The idea is this:
    oid = 0
    while 1:
        oid_s = stringify(oid)
        object = connection[oid_s]
        if seen_all_objects:
            break
        oid += 1
The script as described/posted was broken: it got the stringify step
wrong. (Damn bit-twiddling.) To undo the damage this may have caused
my reputation, I'm attaching a working version of the script. ;-)
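For illustration, the stringify step can be sketched in present-day Python as below. This is a minimal sketch rather than the attached script itself: the helper names oid_to_bytes and bytes_to_oid are made up for the example, and it assumes the OID is an unsigned 64-bit integer stored as an 8-byte big-endian string, split into two 32-bit halves the same way the attached script does.

```python
from struct import pack, unpack


def oid_to_bytes(oid):
    """Encode an integer OID as an 8-byte big-endian string.

    Hypothetical helper: the high and low 32-bit halves are packed
    separately, mirroring the ">LL" format used in the attached script.
    """
    if not 0 <= oid < (1 << 64):
        raise ValueError("OID out of 64-bit range: %r" % (oid,))
    high = (oid >> 32) & 0xFFFFFFFF
    low = oid & 0xFFFFFFFF
    return pack(">LL", high, low)


def bytes_to_oid(oid_s):
    """Decode an 8-byte big-endian string back into an integer OID."""
    high, low = unpack(">LL", oid_s)
    return (high << 32) | low


# Round trip on the "maximum expected OID" from the sample output.
encoded = oid_to_bytes(0x639B9)
decoded = bytes_to_oid(encoded)
```

Packing with ">Q" (one unsigned 64-bit value) should produce the same bytes; the two-halves form is kept here only because it matches the attached script.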
Here's sample output:
$ ./zodb_census.py
expecting to see 132952 objects
maximum expected OID: 00000000000639b9
OID: 00000000000639ba (objects seen: 132952)
census completed
total OIDs attempted: 407994
empty slots seen: 275042
actual objects seen: 132952
objects seen by type:
ActiveVersionCollection          848
Address                         1454
BTree                             13
BaseProcess                     1895
Bucket                           197
[...]
Wafer                           9386
WaferDescription                1124
WorkIncrementCostModel            80
WorkRateCostModel                123
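The totals in that output are consistent with each other, which is a quick sanity check worth running on any census. A small sketch using only the numbers printed above (the variable names are made up for the example):

```python
# Figures copied from the sample output above.
total_attempted = 407994
empty_slots = 275042
objects_seen = 132952
max_expected_oid = 0x00000000000639B9
last_oid_shown = 0x00000000000639BA

# Every OID tried is either an empty slot or a real object.
assert empty_slots + objects_seen == total_attempted

# OIDs start at 0, so the count of OIDs tried equals the next OID to try,
# which is one past the maximum expected OID.
assert last_oid_shown == total_attempted
assert last_oid_shown == max_expected_oid + 1

# Fraction of the probed OID range that actually held an object.
occupancy = objects_seen / total_attempted
print("occupied OID slots: %.1f%%" % (100 * occupancy))
```

In this run roughly a third of the probed OIDs held an object; the rest were empty slots.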
Should work with any storage, but the "maximum expected OID" line
(mainly a debugging aid) is only printed for FileStorage.
zodb_census.py won't work out of the box for you, unless your email
address also happens to end in "@mems-exchange.org". Replacing our
init_database(), get_connection(), and get_database() functions is an
exercise left for the reader.
Enjoy --
Greg
--
Greg Ward - software developer gward@mems-exchange.org
MEMS Exchange http://www.mems-exchange.org
--C7zPtVaVf+AK4Oqc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="zodb_census.py"
#!/www/python/bin/python
"""zodb_census
Inspect every object in a ZODB and count how many times each type
occurs. (Note that each ExtensionClass is a separate type, so
we'll get a class-by-class breakdown from this.)
"""
__revision__ = "$Id: zodb_census.py,v 1.3 2001/08/06 20:48:11 gward Exp $"
import sys
from struct import pack, unpack
from mems.lib.base import init_database, get_connection, get_database
def write_status (oid, objects_seen):
    sys.stdout.write("\rOID: %016x (objects seen: %d)" %
                     (oid, objects_seen))
    sys.stdout.flush()
init_database()
conn = get_connection()
oid = 0L
empty_slots = 0L
objects_seen = 0L # number of actual objects seen
object_count = {} # maps type name to count
expected_count = get_database().objectCount()
print "expecting to see %d objects" % expected_count
try:
    # Only works for FileStorage, but not necessary since it's
    # only done for curiosity.
    max_oid = unpack(">LL", conn._storage._oid) # returns (long, long) tuple
    print "maximum expected OID: %08x%08x" % max_oid
    max_oid = (max_oid[0] << 32) | max_oid[1]
except AttributeError:
    max_oid = None
try:
    while 1:
        if (oid % 0x0800) == 0:
            write_status(oid, objects_seen)
        oid_s = pack(">LL",
                     (oid & 0xffffffff00000000L) >> 32,
                     (oid & 0x00000000ffffffffL))
        try:
            object = conn[oid_s]
        except KeyError:
            #print "%016x *empty slot*" % oid
            empty_slots += 1
        else:
            #print "%016x %s" % (oid, `object`)
            typename = type(object).__name__
            if object_count.has_key(typename):
                object_count[typename] += 1
            else:
                object_count[typename] = 1L
            objects_seen += 1
        oid += 1
        if (objects_seen >= expected_count) or (max_oid and oid > max_oid):
            write_status(oid, objects_seen)
            print "\ncensus completed"
            break
except KeyboardInterrupt:
    write_status(oid, objects_seen)
    print "\ncensus interrupted prematurely"
print "total OIDs attempted: %d" % oid
print "empty slots seen: %d" % empty_slots
print "actual objects seen: %d" % objects_seen
typenames = object_count.keys()
typenames.sort()
print "objects seen by type:"
for name in typenames:
    print "%-25.25s %10d" % (name, object_count[name])
--C7zPtVaVf+AK4Oqc--
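For readers without the mems.lib.base helpers, the core loop of the attachment can be tried out in isolation. The following is a rough present-day Python sketch, not a replacement for the script: the census() function and the fake_db dictionary are made up for the example, a plain dict stands in for the ZODB connection, and the only assumption about the real connection is the one the script already relies on (indexing by an 8-byte OID string raises KeyError for an unused OID).

```python
from collections import Counter
from struct import pack


def census(lookup, expected_count, max_oid=None):
    """Probe OIDs 0, 1, 2, ... in order and count objects by type name.

    `lookup` is any object where lookup[oid_bytes] returns the object or
    raises KeyError for an empty slot (a stand-in for the connection).
    Stops once `expected_count` objects were seen or `max_oid` is passed.
    Returns (counts, empty_slots, oids_attempted).
    """
    counts = Counter()
    empty_slots = 0
    objects_seen = 0
    oid = 0
    while objects_seen < expected_count and (max_oid is None or oid <= max_oid):
        try:
            obj = lookup[pack(">Q", oid)]
        except KeyError:
            empty_slots += 1
        else:
            counts[type(obj).__name__] += 1
            objects_seen += 1
        oid += 1
    return counts, empty_slots, oid


# Hypothetical stand-in data: three objects scattered over OIDs 0..5.
class Wafer:
    pass


class Address:
    pass


fake_db = {
    pack(">Q", 0): Wafer(),
    pack(">Q", 2): Address(),
    pack(">Q", 5): Wafer(),
}

counts, empty_slots, attempted = census(fake_db, expected_count=3)
for name in sorted(counts):
    print("%-25.25s %10d" % (name, counts[name]))
```

Passing the lookup object in as a parameter keeps the loop independent of how the database is opened, which is the part that has to be replaced for any other site anyway.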