[ZODB-Dev] RE: [Zope] Analyzing ZODB objects

Dieter Maurer dieter at handshake.de
Sun Oct 26 03:22:06 EST 2003


Bjorn Stabell wrote at 2003-10-24 10:52 +0800:
 > > From: Dieter Maurer [mailto:dieter at handshake.de] 
 > [...]
 > > They can be far apart. Although, when your pickle is several 
 > > MB your object is not several bytes and vice versa.
 > 
 > Well, in that case it might be useful for such a ZODB admin tool to show
 > both sizes.  It could be a combined cache analysis and ZODB browsing
 > tool.

Python does not know how much memory is used by a Python object.
Python was designed to hide such implementation details.

More importantly, even if it did know the size of a single
object, this would be (almost) irrelevant for your purpose.
You want to know the size occupied by a complete persistent object
(including its non-persistent subobjects but excluding persistent
subobjects). The easiest way to get this information would be
to extend pickling and accumulate the size during object load.
However, Python is far too flexible (it allows C extensions
to control the pickling of their instances) for this to be easy.
Lots of C extensions would have to be modified.

Forget about this approach. It might come with Python 3 or
Python 4, but it is unlikely. Python is a high level language
hiding memory usage; you want precise information about
memory usage. I doubt you will find enough arguments and
use cases to get this into Python.

 > [...]
 > > I do not yet understand why you would want such a thing.
 > > Can you provide use cases?
 > 
 > I guess I want something very low-level, for use in debugging strange
 > behavior, and for help in understanding how Zope apps are built.  The
 > ZMI works with object interfaces, which is useful, but requires that
 > each object supports an interface (ObjectManager etc).  Many objects
 > don't, especially not when you're developing them :)  For this
 > reason--no access to data except through application-provided
 > interfaces--ZODB feels much like a "black box" to me.

Do you really care about the size of objects in memory?
We no longer live in 1980, when memory was a scarce resource.

If you care about more Pythonic aspects (attributes, methods, size
of dictionaries, lists, ...), then you can read a "HowTo"
about "Debugging Zope" (--> "Zope.org"). You can access
each object in the ZODB and use Python's inspection
facilities (--> "Python Library Reference") to analyse its
attributes and methods and call its methods interactively
(to find out about application specific things).
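As an illustration of this kind of interactive inspection, here is a minimal sketch using only the standard `inspect` module. The helper `summarize` and the `Document` class are hypothetical; in a Zope debug session you would pass an object fetched from the ZODB instead. It assumes a current Python 3 (`inspect.signature` did not exist in the Python 2 of 2003, although `inspect.getmembers` did).

```python
import inspect


def summarize(obj):
    """Split an object's public members into methods (with signatures) and data."""
    methods, data = {}, {}
    for name, value in inspect.getmembers(obj):
        if name.startswith("_"):
            continue  # skip dunders, private names and _p_/_v_ persistence attributes
        if callable(value):
            try:
                methods[name] = str(inspect.signature(value))
            except (TypeError, ValueError):
                methods[name] = "(...)"  # some C-implemented callables have no signature
        else:
            data[name] = value
    return methods, data


# Hypothetical stand-in for an object loaded from the ZODB.
class Document:
    title = "untitled"

    def __init__(self, body):
        self.body = body

    def render(self, upper=False):
        return self.body.upper() if upper else self.body


methods, data = summarize(Document("hello"))
print(methods)  # {'render': '(upper=False)'}
print(data)     # {'body': 'hello', 'title': 'untitled'}
```

Methods found this way can then be called directly in the same session to probe application-specific behaviour.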

 > Example use cases include:
 > 
 > - SPACE AND MEMORY OPTIMIZATION. Reducing ZODB size, and Zope memory
 > usage.  I've got some huge objects in my database, which ones are they?
 > Why are they huge?  If I know this, I can optimize.

I had a similar problem (the ZODB grew far too fast) and I wanted
to understand why.

I extended Zope's "Undo" information to include the transaction size.
This allowed me to see precisely which transactions were larger
than expected.

I extended the "fsdump" utility to include the (pickle) sizes
of the object records contained in a transaction and to restrict
the range of dumped transactions.

This was enough to analyse the problem: ZCatalog's Metadata
records caused a transaction to grow from the expected few hundred bytes
to about 500 kB.

You can use the same approach to analyse ZODB size problems.
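The extended "fsdump" itself is not shown here. The following is only a minimal sketch of the same idea: rank transactions by the summed pickle sizes of their records. It assumes that each transaction exposes `tid` and yields records with `oid` and `data` attributes, which is the shape of ZODB's `FileStorage.iterator()` output as far as I know; `transaction_sizes`, `FakeTxn` and `Rec` are hypothetical names, and the stand-in objects let the sketch run without a database.

```python
from collections import namedtuple

TxnSize = namedtuple("TxnSize", "tid total records")


def transaction_sizes(transactions, top=5):
    """Return the `top` largest transactions by summed pickle size.

    Each transaction must have a `tid` attribute and yield records with
    `oid` and `data` attributes (the shape assumed here for ZODB's
    FileStorage iterator). Records are listed largest first.
    """
    result = []
    for txn in transactions:
        # `data` may be None for records that carry no pickle.
        records = sorted(
            ((len(rec.data) if rec.data is not None else 0, rec.oid) for rec in txn),
            reverse=True,
        )
        total = sum(size for size, _ in records)
        result.append(TxnSize(txn.tid, total, records))
    result.sort(key=lambda t: t.total, reverse=True)
    return result[:top]


# Stand-in objects shaped like the iterator's output, so the sketch runs alone.
class FakeTxn(list):
    def __init__(self, tid, records):
        super().__init__(records)
        self.tid = tid


Rec = namedtuple("Rec", "oid data")
demo = [
    FakeTxn(b"\x01", [Rec(b"a", b"xx"), Rec(b"b", b"xxxxx")]),
    FakeTxn(b"\x02", [Rec(b"c", None)]),
    FakeTxn(b"\x03", [Rec(b"d", b"x" * 10)]),
]
for t in transaction_sizes(demo, top=2):
    print(t.tid.hex(), t.total, t.records)
# 03 10 [(10, b'd')]
# 01 7 [(5, b'b'), (2, b'a')]

# Against a real database (untested here):
#   from ZODB.FileStorage import FileStorage
#   storage = FileStorage("Data.fs", read_only=True)
#   for t in transaction_sizes(storage.iterator(), top=10):
#       print(t.tid.hex(), t.total)
```

Restricting the range of dumped transactions, as described above, would amount to passing start and stop transaction ids to the storage iterator instead of scanning the whole file.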


Note that I do not care about memory size. Usually, a persistent
object uses memory of the same order of magnitude as its pickle size (there
are exceptions, but they can usually be ignored). With a GB of RAM
costing about 200 USD, we can ignore the differences between
memory size and pickle size.
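If pickle size is the measure to rely on, it can be approximated for a single object by pickling its state while writing persistent subobjects as references rather than inline, which mirrors the "including non-persistent, excluding persistent subobjects" rule mentioned earlier. This is a sketch under assumptions: `state_pickle_size`, `_RecordPickler` and the `is_persistent` predicate are hypothetical, and a real ZODB record also contains class metadata, so the numbers are approximate. Against a live database one might call it as `state_pickle_size(obj.__getstate__(), lambda o: isinstance(o, persistent.Persistent))` after `obj._p_activate()`.

```python
import io
import pickle


class _RecordPickler(pickle.Pickler):
    """Pickler that stores persistent subobjects as references, not inline."""

    def __init__(self, file, protocol, is_persistent):
        super().__init__(file, protocol)
        self._is_persistent = is_persistent

    def persistent_id(self, obj):
        # A non-None return value is written instead of the object itself.
        return id(obj) if self._is_persistent(obj) else None


def state_pickle_size(state, is_persistent, protocol=2):
    """Approximate pickled size of one object's state, excluding persistent subobjects."""
    buf = io.BytesIO()
    _RecordPickler(buf, protocol, is_persistent).dump(state)
    return len(buf.getvalue())


# Hypothetical stand-in for a persistent subobject.
class Child:
    def __init__(self):
        self.payload = "x" * 1000


def is_child(obj):
    return isinstance(obj, Child)


inline = {"title": "t", "child": {"payload": "x" * 1000}}
referenced = {"title": "t", "child": Child()}
# The inlined state carries the 1000-character payload; the referenced one does not.
print(state_pickle_size(inline, is_child), state_pickle_size(referenced, is_child))
```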


 > (Related: My Zope
 > uses a lot of memory; why?  Objects of which class, and in which
 > location, use the most memory?  Why were they loaded? At the ZMI level,
 > you don't want to know if objects are loaded/ghosted.)

When your Zope uses a lot of memory, then either you have large numbers
of large persistent objects in the ZODB caches (see "Control_Panel -->
Database Management" for information about your ZODB caches)
or you have memory leaks.

Large persistent objects are revealed by large transactions (when written).
You can use the above mentioned techniques to analyse transaction
sizes and pickle sizes.

Memory leaks can be spotted via "Control_Panel --> Debug Information
--> Reference Counts" or Shane's "LeakFinder" product.
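The idea behind such reference count pages (watch which classes keep gaining instances) can be sketched with the standard `gc` module. This is a rough stdlib-only analogue, not how Zope's "Debug Information" page or "LeakFinder" compute their numbers; `instance_counts`, `growth` and `Leaky` are hypothetical names, and `gc.get_objects()` only sees objects tracked by the garbage collector.

```python
import gc
from collections import Counter


def instance_counts():
    """Count live objects tracked by the garbage collector, per class."""
    return Counter(
        f"{type(o).__module__}.{type(o).__qualname__}" for o in gc.get_objects()
    )


def growth(before, after, top=10):
    """Classes whose live-object count increased between two snapshots."""
    return (after - before).most_common(top)  # Counter '-' keeps only positive counts


class Leaky:  # hypothetical class whose instances accumulate
    pass


leaked = []
before = instance_counts()
leaked.extend(Leaky() for _ in range(50))
after = instance_counts()
print(growth(before, after, top=3))
```

Taking snapshots before and after a batch of requests, and looking for classes that grow steadily, is the usual way to turn raw counts into a leak suspect.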

 > - UNDERSTANDING CONTENT TYPES AND TOOLS.  What is the difference in the
 > data structure of PloneDocument and CMFDocument?

Use a debugger, "DocFinder", or the sources.

 > - DEBUGGING CLASS MIGRATION PROBLEMS.  Some older objects are exhibiting
 > strange behavior; what is the difference in data structure between them
 > and the new objects?  Which objects of class X don't have attribute A
 > set?  (Except for the obvious Zope/CMF/Plone/Product upgrades, these
 > kinds of problems happen a lot during development each time a class is
 > changed without recreating its objects.)

Use a debugger or the sources.
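For the "which objects of class X lack attribute A" question, a check with `hasattr()` is misleading when a default was later added to the class. The sketch below looks at each instance's own `__dict__` instead. It assumes, as far as I know, that a ZODB ghost's `__dict__` stays empty until the object is loaded, hence the `_p_activate()` call. `missing_attribute` and `Document` are hypothetical; how you collect the candidate objects (for example from a folder's `objectValues()`) is up to you.

```python
def missing_attribute(objects, cls, name):
    """Yield instances of `cls` whose own state does not contain `name`.

    hasattr() is not enough: a default added to the class later makes the
    attribute visible on old instances that never stored it themselves.
    """
    for obj in objects:
        if not isinstance(obj, cls):
            continue
        activate = getattr(obj, "_p_activate", None)
        if activate is not None:
            activate()  # load a ZODB ghost so that its __dict__ is filled in
        if name not in getattr(obj, "__dict__", {}):
            yield obj


# Hypothetical class that gained a `language` attribute in a later version.
class Document:
    language = "en"  # class-level default added for old instances

    def __init__(self, title, language=None):
        self.title = title
        if language is not None:
            self.language = language


docs = [Document("old"), Document("new", language="de")]
print([d.title for d in missing_attribute(docs, Document, "language")])  # ['old']
```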

 > - VERIFYING AND IMPROVING DATA STORAGE SCHEMAS.  Does class X really
 > store the attributes I thought it would?

In Python (at least until 2.2), attributes are usually stored in
instances, not in classes. This makes such an analysis quite difficult.
You have to look at the sources or at many instances and
perform (unsafe) inductive reasoning.
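Looking at many instances can at least be automated: count how many instances actually store each attribute. Attributes present on only some instances hint at objects created by older code. This is a hypothetical sketch (`attribute_survey` and `Doc` are illustrative names), and the conclusion drawn from it is still the unsafe induction described above.

```python
from collections import Counter


def attribute_survey(instances):
    """Count how many instances store each attribute in their own state."""
    total, counts = 0, Counter()
    for obj in instances:
        activate = getattr(obj, "_p_activate", None)
        if activate is not None:
            activate()  # ZODB ghosts have an empty __dict__ until loaded
        total += 1
        counts.update(getattr(obj, "__dict__", {}).keys())
    return total, counts


# Hypothetical instances of one class, created by different code versions.
class Doc:
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


docs = [Doc(title="a", body="x"), Doc(title="b"), Doc(title="c", body="y")]
total, counts = attribute_survey(docs)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}/{total}")
# body: 2/3
# title: 3/3
```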

 > - CHANGING A PROPERTY THAT DOESN'T HAVE A MANAGEMENT INTERFACE.  For
 > debugging, testing, or migration purposes, or for just fixing a one-off
 > bug.

Use a debugger.
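In the debugger, a plain assignment such as `obj.title = "fixed"` followed by a commit is usually all that is needed. One well-known trap is mutating a list or dict attribute in place: the persistent object does not notice, so it must be flagged with `_p_changed`. The helper below is a hypothetical sketch of that pattern, with a plain stand-in class so it runs without ZODB. With current ZODB the commit is `import transaction; transaction.commit()`; the Zope 2 of 2003 used `get_transaction().commit()`.

```python
def update_in_place(obj, name, mutate):
    """Mutate a list/dict attribute in place and flag its owner as changed.

    Assigning an attribute (obj.x = 1) marks a persistent object as changed
    automatically; mutating a plain list or dict stored in it does not.
    """
    mutate(getattr(obj, name))
    obj._p_changed = True  # tell ZODB the owning record must be rewritten


# Hypothetical stand-in for a persistent object, so the sketch runs alone.
class Item:
    def __init__(self):
        self.tags = ["a"]


item = Item()
update_in_place(item, "tags", lambda tags: tags.append("b"))
print(item.tags, item._p_changed)  # ['a', 'b'] True

# In a real debug session one would then commit, e.g.:
#   import transaction
#   transaction.commit()
```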

 > The truth is, I think this kind of tool will "open my eyes" to what's in
 > the ZODB and take much of the guesswork out of developing with Zope,
 > similar to the eye opening experience a RDBMS admin tool is.

I do *not* think that an admin tool would give you precise information
about memory size. This is too low a level, even for a relational database.


Apart from memory size,
you can already use a debugger to get the information you want.
You can use Python's inspection facilities to find the relevant
information and implement a UI for it.
My "DocFinder" product does this for class attributes and methods (but not
instance attributes). Look at it if you want to extend it to
instance attributes.

  <http://www.dieter.handshake.de/pyprojects/zope>

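The kind of extension mentioned above (reporting instance attributes alongside class-defined ones) can be sketched as follows. This is not DocFinder's code; `attribute_origins`, `Base` and `Document` are hypothetical, and the sketch simply reports, for each public attribute, whether the instance stores it or which class in the MRO first defines it.

```python
import inspect


def attribute_origins(obj):
    """Map each public attribute to "instance" or to the class that defines it."""
    activate = getattr(obj, "_p_activate", None)
    if activate is not None:
        activate()  # make sure a ZODB ghost's state is loaded
    own = getattr(obj, "__dict__", {})
    origins = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        if name in own:
            origins[name] = "instance"  # instance values shadow class attributes
            continue
        for klass in inspect.getmro(type(obj)):
            if name in vars(klass):
                origins[name] = klass.__name__
                break
    return origins


class Base:
    def render(self):
        return ""


class Document(Base):
    meta_type = "Document"

    def __init__(self, title):
        self.title = title


print(attribute_origins(Document("hi")))
# {'meta_type': 'Document', 'render': 'Base', 'title': 'instance'}
```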



 > I often
 > write scripts to analyze object structure and do simple changes; I wish,
 > however, that there would be an admin environment that provided:
 > 
 > - A database browsing / object inspector tool, taking away the need to
 > write scripts for browsing/changing objects in most cases, and
 > encouraging people to analyze and understand the database structure

I do not write scripts for this; I use the debugger interactively.

 > - A query language that would make it even easier to write scripts (+
 > ZODB index support?)

Use ZCatalog and its query language.

 > - A place to store one-off scripts so they don't get mixed up with the
 > application

A "Folder", when you really need the scripts.

 > Something like this http://www.pgadmin.org/pgadmin3/screenshots.php, but
 > for the ZODB.

A relational database has only a few classes ("sequence", "table", "index",
"constraint", ...) and all these classes are known to the framework.

The ZODB in contrast has an unbounded number of classes unknown to
the framework. It is far more flexible than the relational framework.
Therefore, it is much more difficult to provide (useful) inspection
facilities.


You can get low-level inspection the way I outlined above
(extend "DocFinder" for instance attributes). I do not think
it is useful (therefore, I have not implemented it for "DocFinder")
but you may see it differently.

-- 
Dieter


