[ZODB-Dev] High Write Applications
Phillip J. Eby
pje at telecommunity.com
Sat Aug 2 17:49:21 EDT 2003
At 03:32 PM 8/2/03 +0100, Chris Withers wrote:
>Phillip J. Eby wrote:
>>The root cause is that RDBMS transaction logs record high-level details,
>>not blob snapshots. For example, when an RDBMS logs that you changed row
>>#2967's "foo" column to value "bar", it doesn't usually also log the
>>entire data page the row was contained in, plus copies of all the b-tree
>>index pages that changed as a consequence of the change.
>>Thus, ZODB's disk usage per write transaction generally exceeds RDBMS
>>disk usage for the same transaction by at *least* an order of magnitude,
>>even before catalog indexes come into play.
>
>oh :-(
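To make the size gap concrete, here is a toy sketch (not ZODB or RDBMS code; the object and column names are made up) comparing a snapshot-style log record, which stores the whole pickled state on every change, against a logical record that stores only the change itself:

```python
import pickle

# Hypothetical persistent object state: a mapping with many fields,
# standing in for a full object state ZODB would pickle.
state = {"field%d" % i: "value%d" % i for i in range(100)}
state["foo"] = "old"

# Snapshot-style logging (ZODB FileStorage): the entire pickled state
# is appended to the log, however small the change.
snapshot_record = pickle.dumps(state)

# Logical logging (RDBMS-style): only the change is recorded,
# e.g. (row id, column, new value).
state["foo"] = "bar"
logical_record = pickle.dumps(("row-2967", "foo", "bar"))

# The snapshot record dwarfs the logical one for the same change.
print(len(snapshot_record), len(logical_record))
```

On this toy state the snapshot record is dozens of times larger than the logical record, which is the "order of magnitude" being described.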
>
>>To do this, ZODB would have to be able to understand and log
>>*differences*, rather than just snapshotting object states.
>
>How would it go about doing this?
If I could answer that question, I'd have suggested Jim implement it years
ago. :)
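For illustration only, here is one toy shape such difference-logging could take (this is not ZODB code and not a proposal; the names are invented): each log entry names the object, attribute, and new value, and recovery replays the entries in order.

```python
# A toy difference log: entries are (oid, attribute, new value).
log = []

def record_change(oid, attr, value):
    # Log only the change, not a snapshot of the whole object.
    log.append((oid, attr, value))

def replay(entries, objects):
    # Recovery: reapply each logged difference in order to rebuild
    # the current state of every object.
    for oid, attr, value in entries:
        objects.setdefault(oid, {})[attr] = value
    return objects

record_change("obj-2967", "foo", "bar")
record_change("obj-2967", "foo", "baz")
print(replay(log, {}))  # {'obj-2967': {'foo': 'baz'}}
```

The hard part, which this sketch ignores, is computing such differences automatically from arbitrary pickled object states.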
>>And, it would need to be able to manage periodic checkpointing, so that
>>recovering a database wouldn't require rerunning all the transactions
>>that had ever been done on it.
>
>Hmmm, I'm no expert in this kind of thing. How does periodic checkpointing
>work?
It just means that there needs to be a consistent snapshot of the entire
database available on disk. Last I looked, ZODB's FileStorage did this by
using a "quick-load" index file with pointers into the transaction log for
the current version of each object. BerkeleyDB does checkpoints when asked
to; this amounts to ensuring that all dirty DB pages in memory are written
to disk and a filesystem sync() is performed.
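As a minimal sketch of that idea (not BerkeleyDB's actual API; the function and file names here are made up), a checkpoint flushes buffered state, forces it to disk, and durably records the log position so recovery can skip everything before it:

```python
import os

def checkpoint(db_file, log_position, marker_path):
    """Flush dirty state to disk and record a recovery starting point."""
    # Push buffered writes to the OS, then force them to stable storage.
    db_file.flush()
    os.fsync(db_file.fileno())
    # Durably note the log position; recovery replays from here onward
    # instead of rerunning every transaction ever committed.
    with open(marker_path, "w") as marker:
        marker.write(str(log_position))
        marker.flush()
        os.fsync(marker.fileno())
```

A real checkpointer must also order the page writes against the log (write-ahead logging), but the disk-usage point stands: after a checkpoint, log records older than the marker are no longer needed for recovery.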
>>By the way, if you've looked at systems like Prevayler,
>
>I haven't ;-) Where can I read more?
Google for "Prevayler".