[ZODB-Dev] Re: [Dev] ZODB is not a Storage Technology (Re: other formats )
John Anderson
john@osafoundation.org
Sat, 09 Nov 2002 09:09:49 -0800
Thanks for the very nice overview. Makes lots of sense and it will help
me as we jump into the code. I did have one question; see below.
John
Mike C. Fletcher wrote:
> Okay, here's a quick overview of the guts, presented as an outline.
> I've assumed you'll be reading the summaries with the source-code open
> in another window to see what's being described, so I've not gone into
> any details as to how anything is done.
>
> The objects likely best to concentrate on for understanding the
> low-level guts are the FileStorage, the Connection, and the
> _defaulttransaction. I've given you quick summaries of what you'll
> find in most of the files in the ZODB4 CVS packages (ZODB, Transaction
> and Persistence); the zLOG project is just logging facilities, nothing
> really close to the core of the ZODB. The indentation is primarily
> showing usage patterns (for instance, fsindex is really only used by
> FileStorage AFAIK), though I've also used it to group items which can
> be considered sub-categories of the superior item.
>
> I'll work on details tomorrow if I can get some more time,
> questions/directions in which you'd like more coverage quite welcome.
> BTW: I've copied the ZODB-dev list so that others can correct anything
> I've messed up, or add anything that they consider critical to
> understanding the system.
>
> Enjoy,
> Mike
>
> ZODB:
> Storage (BaseStorage sub-classes):
> """Storages are responsible for maintaining object state records
>
> They can also maintain undo (transaction) and versional records.
> """
> FileStorage:
> """Default ZODB storage
>
> The FileStorage is a linear aggregate of all transactions,
> and transactions are aggregates of all changed objects.
> Transactions are added at the end of the file, with
> later changes to a particular object conceptually overwriting
> the earlier changes.
>
> Versions (personal views of the dbase) are just transactions
> which are declared to have version information. The versions
> form linked lists (they point to the last transaction in the
> version).
>
> Storages which have undo support (such as FileStorage) have
> a pack method which basically copies all objects forward until
> there is a single current set, then discards anything not in
> the current set.
Does it copy "in place" so that if you pulled the plug while in pack
your file is corrupted?
>
> """
> fsIndex:
> """Index from persistent OID -> file position index
> The fsIndex provides an optimised index to individual objects
> within the data file of the FileStorage. The index can
> be rebuilt merely by scanning through the entire datafile.
> """
> TmpStore:
> """Storage for transaction save-points"""
> DBMStorage:
> """Simple storage based on GDBM/AnyDBM"""
> MappingStorage:
> """A demonstration of a volatile in-memory storage"""
>
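The FileStorage, fsIndex and pack descriptions above can be pictured with a toy append-only store. This is only a sketch under stated assumptions: the record layout, the class name ToyAppendOnlyStore and its methods are invented for illustration, transaction grouping is left out, and none of it is the real FileStorage format or API.

```python
import io
import pickle
import struct


class ToyAppendOnlyStore:
    """Hypothetical append-only object store (not the FileStorage format)."""

    # Made-up record header: 8-byte oid followed by 4-byte payload length.
    HEADER = struct.Struct(">QI")

    def __init__(self):
        self.data = io.BytesIO()  # stands in for the data file
        self.index = {}           # oid -> position of the newest record

    def store(self, oid, obj):
        payload = pickle.dumps(obj)
        pos = self.data.seek(0, io.SEEK_END)  # records are only ever appended
        self.data.write(self.HEADER.pack(oid, len(payload)) + payload)
        self.index[oid] = pos                 # newer record shadows older ones

    def load(self, oid):
        self.data.seek(self.index[oid])
        _, length = self.HEADER.unpack(self.data.read(self.HEADER.size))
        return pickle.loads(self.data.read(length))

    def rebuild_index(self):
        """Recover the oid -> position map by scanning the whole file."""
        index, pos = {}, 0
        end = self.data.seek(0, io.SEEK_END)
        while pos < end:
            self.data.seek(pos)
            oid, length = self.HEADER.unpack(self.data.read(self.HEADER.size))
            index[oid] = pos
            pos += self.HEADER.size + length
        return index

    def pack(self):
        """Copy only the current records forward into a fresh store."""
        packed = ToyAppendOnlyStore()
        for oid in sorted(self.index):
            packed.store(oid, self.load(oid))
        return packed


store = ToyAppendOnlyStore()
store.store(1, "first revision")
store.store(2, "another object")
store.store(1, "second revision")   # conceptually overwrites the first
print(store.load(1))
print(len(store.pack().data.getvalue()) < len(store.data.getvalue()))
```

The sketch packs into a new store purely to stay short; it says nothing about how the real pack behaves if it is interrupted.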
> utility mechanisms:
> TimeStamp:
> """TimeStamp C extension type"""
> Serialize:
> """Pickle-like storage (cPickle plus some custom code)"""
> referencesf:
> """finds object refs in pickle strings"""
> file_lock:
> """(small) wrapper to do cross-platform locking of files"""
> fsdump, fsrecover:
> """Debugging/utility code"""
>
> Connection:
> """Object-space in which application objects live
>
> Uses an in-memory object-cache (see below)
>
> Provides object-access (get root dict, get object by oid)
> though normal access is via getting root and then
> drilling down through the object references.
>
> Other than this, almost the entire class is support
> for the transaction and persistence mechanisms.
> """
> ExportImport:
> """Mix-in providing XML import/export"""
> DB:
> """Manages multiple Connections to a storage
>
> Provides a pool of connections
> Provides mechanisms for applying functions
> to all object caches in all connections
> Tracks object modifications for versions? (not
> sure about this, I've never used versions)
>
> Provides most of the primitives on which Connection and
> Transaction build the transaction mechanism. (tpc_*)
> """
>
>
> Transaction:
> _defaultTransaction:
> """The default transaction machinery
>
> Combined with the connection object, this is most
> of the transaction-driving code in the system. It
> is fairly tightly coupled to the Persistent module
> (e.g. it assumes _p_jar and the like on all registered
> objects).
> """
> Transaction:
> """Data-storage for the current transaction"""
> Manager:
> """Entry point for transaction APIs"""
>
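The tpc_* primitives mentioned above follow a two-phase commit shape: begin, store the changes, vote, then finish or abort. A minimal sketch of that shape follows. The ToyStorage class, the commit helper and the simplified argument lists are assumptions made for illustration, not the actual ZODB signatures.

```python
class ToyStorage:
    """Hypothetical resource with a begin/vote/finish/abort protocol."""

    def __init__(self):
        self.committed = {}   # oid -> state visible to readers
        self._pending = None  # changes staged by the current transaction

    def tpc_begin(self):
        self._pending = {}

    def store(self, oid, state):
        self._pending[oid] = state

    def tpc_vote(self):
        # Last chance to refuse: raise if the staged changes cannot commit.
        if any(state is None for state in self._pending.values()):
            raise ValueError("cannot commit an empty state")

    def tpc_finish(self):
        self.committed.update(self._pending)
        self._pending = None

    def tpc_abort(self):
        self._pending = None


def commit(storage, changes):
    """Drive one transaction through both phases (illustrative helper)."""
    storage.tpc_begin()
    try:
        for oid, state in changes.items():
            storage.store(oid, state)
        storage.tpc_vote()
    except Exception:
        storage.tpc_abort()   # nothing staged becomes visible
        raise
    storage.tpc_finish()      # staged changes become visible together


storage = ToyStorage()
commit(storage, {1: "state of object 1"})
print(storage.committed)
```

Splitting the vote from the finish means every participant can object before any of them makes its changes visible.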
> Persistence:
> _persistent:
> """Python 2.2.2 implementation of IPersistent
>
> Basically, this is a Pure-python version of the cPersistence
> code that really gets used (I'm not sure if there's code
> anywhere to fall back to using this version if the cPersistence
> code isn't compiled).
>
> This is quite useful for figuring out what's going on,
> but (having used it for a few months), it seemed too slow
> to be of use in a real-world system (too much time spent in
> __getattribute__).
> """
> cPersistence:
> """Provides optimised IPersistent implementation"""
>
> Cache:
> """Provides an in-memory object cache to reduce reloads from disk
>
> Basically this is a high-level cache, it has a target size
> and a few methods implementing garbage collection. The
> DB calls the connection's GC methods, then the connection calls
> its cache's GC methods.
> """
>
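The "target size plus GC methods" idea in the Cache summary can be sketched with a small least-recently-used cache. The eviction policy, the ToyObjectCache name and its methods are assumptions for illustration; the summary above does not say which policy the real cache uses.

```python
from collections import OrderedDict


class ToyObjectCache:
    """Hypothetical object cache that shrinks to a target size on request."""

    def __init__(self, target_size):
        self.target_size = target_size
        self._objects = OrderedDict()  # oid -> object, oldest access first

    def get(self, oid, loader):
        """Return the cached object, calling loader(oid) on a miss."""
        if oid in self._objects:
            self._objects.move_to_end(oid)  # mark as recently used
            return self._objects[oid]
        obj = loader(oid)
        self._objects[oid] = obj
        return obj

    def collect(self):
        """Evict least recently used objects until the target is met."""
        evicted = 0
        while len(self._objects) > self.target_size:
            self._objects.popitem(last=False)
            evicted += 1
        return evicted

    def __contains__(self, oid):
        return oid in self._objects

    def __len__(self):
        return len(self._objects)


cache = ToyObjectCache(target_size=3)
for oid in range(5):
    cache.get(oid, loader=lambda o: "object %d" % o)
print(len(cache), cache.collect(), len(cache))
```

The cache is allowed to grow past its target between collections, which matches the description of an outside caller triggering the GC methods rather than every access doing so.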
> particular data-types:
> PersistentDict, PersistentList:
> """Dictionary and List types which track their changes
>
> Basically allow you to use them as lists/dicts without
> needing to spend code tracking changes yourself. These
> items, however, re-store the entire list/dict on each
> save, so see BTree for large dicts.
> """
> BTrees:
> """BTree implementation using individually persistent nodes
>
> Allows large dictionaries to be stored so that only a small
> sub-set of the dictionary needs to be re-stored on modifications
> """
> Function, Module, Package:
> """References to these types w/ importing
>
> Never used these myself (I think they're new),
> they appear to store name-references, or actual
> code objects in the case of functions.
> """
>
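The difference between re-storing a whole dict and re-storing a small sub-set can be shown with a rough cost comparison. Both classes below are invented for illustration: WholeDict stands in for a container pickled as one record, and BucketedDict uses plain hash buckets where a real BTree would use sorted tree nodes.

```python
import pickle


class WholeDict:
    """Hypothetical mapping saved as a single record."""

    def __init__(self):
        self.data = {}
        self.changed = False

    def __setitem__(self, key, value):
        self.data[key] = value
        self.changed = True

    def save(self):
        """Return the number of bytes written by this save."""
        if not self.changed:
            return 0
        self.changed = False
        return len(pickle.dumps(self.data))  # everything is re-stored


class BucketedDict:
    """Hypothetical mapping split into individually saved buckets."""

    def __init__(self, bucket_count=16):
        self.buckets = [{} for _ in range(bucket_count)]
        self.dirty = set()  # indexes of buckets changed since the last save

    def __setitem__(self, key, value):
        i = hash(key) % len(self.buckets)
        self.buckets[i][key] = value
        self.dirty.add(i)

    def save(self):
        """Return the number of bytes written by this save."""
        written = sum(len(pickle.dumps(self.buckets[i])) for i in self.dirty)
        self.dirty.clear()
        return written  # only the touched buckets are re-stored


whole, bucketed = WholeDict(), BucketedDict()
for key in range(1000):
    whole[key] = key
    bucketed[key] = key
whole.save()
bucketed.save()

whole[5] = -1
bucketed[5] = -1
print(whole.save() > bucketed.save())
```

After a single-key change the bucketed version writes one bucket while the whole-dict version writes all 1000 entries again, which is the trade-off the outline points at for large dicts.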
>
>
> John Anderson wrote:
>
>> I'd be interested in an overview of the guts. Start with a big
>> picture, then move into some details and describe what's in which
>> files. I'd like to eventually learn the code base so I can decide how
>> to improve it.
>>
>> John
>>
>> Mike C. Fletcher wrote:
>>
>>> At what level would you like the description (I've been using ZODB
>>> for years now, and have just released a calendaring application on
>>> it). I assume you understand the basics, so are you looking for
>>> analysis of where/how it starts to fail/how to update it, or what
>>> the actual machinery inside is doing for any given action?
>>>
>>> I'll push some time around and try to get a description posted this
>>> weekend if you can tell me which area you need.
>>>
>>> Enjoy,
>>> Mike
>>>
> ...
>