[ZODB-Dev] High Write Applications
Phillip J. Eby
pje at telecommunity.com
Mon Aug 4 15:40:04 EDT 2003
At 01:29 PM 8/4/03 -0400, Shane Hathaway wrote:
>Phillip J. Eby wrote:
>>The nice thing about a prevalence architecture is that it's dramatically
>>simpler than a persistence architecture, provided you 1) have the memory
>>and 2) don't mind architecting around command classes, and 3) don't mind
>>taking forever for an application to restart. I have some thoughts about
>>how to make #2 relatively transparent, but #1 and #3 are generally more
>>of an issue.
>
>Very interesting.
>
>I have some ideas for combining persistence and prevalence, then. (I
>haven't heard that word used in such a way before, but I guess it's
>standard.) We could use persistence to load objects and prevalence to
>store objects.

That last sentence makes my head hurt, but I think it's just a terminology
misunderstanding. In a prevalence system, you don't "load and store
objects", because all the objects are in memory, all the time. Instead,
you simply "log and checkpoint": you log each command as it executes and
periodically checkpoint the in-memory objects, so that you can restore your
system if the hardware crashes or you need to upgrade the software.
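
To make "log and checkpoint" concrete, here is a minimal Python sketch of a
prevalence layer. It is an illustration only, not any real prevalence
library: the names PrevalentSystem, COMMANDS and deposit, the file names, and
the JSON log format are all assumptions made for the example.

```python
import json
import os


def deposit(state, account, amount):
    """Hypothetical business operation that mutates the in-memory state."""
    state[account] = state.get(account, 0) + amount


# Hypothetical command table: command name -> function that applies it.
COMMANDS = {"deposit": deposit}


class PrevalentSystem:
    """Toy prevalence layer: all state stays in memory; disk holds only
    a snapshot plus the commands executed since that snapshot."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "commands.log")
        self.state = {}
        self._recover()

    def _recover(self):
        # Restart cost: read the last checkpoint, then replay every
        # command logged after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    name, args = json.loads(line)
                    COMMANDS[name](self.state, *args)

    def execute(self, name, *args):
        # "Log": make the command durable before applying it in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([name, args]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        COMMANDS[name](self.state, *args)

    def checkpoint(self):
        # "Checkpoint": write the whole state, then empty the log.
        # Simplification: a crash between these two steps would replay
        # already-applied commands; a real system must guard against that.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()
```

Nothing here ever loads an individual object on demand. The only reads from
disk happen in _recover(), which is also why restart time grows with the
size of the snapshot and the length of the log.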

So, technically speaking, the idea you describe below has nothing to do
with prevalence; it's just improving a persistence system by using the
command pattern. Not that there's anything wrong with that, per se; when
Ty and I first did our research on whether ZODB was suitable for our
applications, we suggested more or less the same thing to Jim. But this
doesn't actually fit too well within the current ZODB architecture. See below.

>That way the memory consumption and startup time stay down while we get
>the potential for high writes.
>
>Here is how it might work. ZEO clients, when possible, might send
>mutation commands instead of pickles to the ZEO server. The ZEO server
>would execute the mutation commands on the current object system rather
>than the object system that the client saw. The server would commit the
>changes and all clients would be notified of the changes made by the ZEO
>server.
>
>Under this system, it seems like conflicts could only occur in the
>presence of non-"prevalent" changes, or if a command raises a
>ConflictError. That should improve write volume.

Certainly, logging only modifications would allow BTree pages to be bigger,
while reducing write transaction size. Note, however, that it's not very
compatible with ZEO, as I understand it. ZEO deals with raw uninterpreted
pickles, not objects, so the server has no object system on which to execute
a command. Instead, the diffs would have to be stored as-is, and when loading
an object, it would be necessary to "replay" the diffs since the last
snapshot. That is a pretty significant change to everything from the
protocols to the objects themselves, which would need to understand how to
make and apply diffs.
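
A rough Python sketch of what "replaying the diffs since the last snapshot"
could look like on the loading side. This is not ZEO's API: DumbStorage,
apply_diff, load_object and the tuple-based diff format are all hypothetical,
chosen only to show that the storage keeps opaque bytes while the code that
understands diffs has to live with the objects.

```python
import pickle


class DumbStorage:
    """Stands in for a storage server: it keeps bytes and never unpickles."""

    def __init__(self):
        self.records = {}  # oid -> (snapshot_bytes, [diff_bytes, ...])

    def store_snapshot(self, oid, data):
        # A new snapshot makes all earlier diffs obsolete.
        self.records[oid] = (data, [])

    def append_diff(self, oid, data):
        self.records[oid][1].append(data)

    def load(self, oid):
        snapshot, diffs = self.records[oid]
        return snapshot, list(diffs)


def apply_diff(obj, diff):
    """Hypothetical diff format for a dict-like object:
    ("set", key, value) or ("del", key)."""
    if diff[0] == "set":
        obj[diff[1]] = diff[2]
    elif diff[0] == "del":
        del obj[diff[1]]
    else:
        raise ValueError("unknown diff: %r" % (diff,))


def load_object(storage, oid):
    # The loader, not the storage, interprets the bytes: unpickle the
    # snapshot, then replay each diff in the order it was written.
    snapshot, diffs = storage.load(oid)
    obj = pickle.loads(snapshot)
    for raw in diffs:
        apply_diff(obj, pickle.loads(raw))
    return obj
```

For example, storing a snapshot of {"a": 1, "b": 2} followed by the diffs
("set", "c", 3) and ("del", "a") makes load_object() return
{"b": 2, "c": 3}. Writes get smaller, but every load now pays for the replay,
and every persistent class needs its own equivalent of apply_diff.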

The reason this approach works so well in prevalence is that command
objects are high-level business objects - think of them as individual web
REQUEST instances, pared down to just the data needed to execute that
transaction. The actual changes to the database (including indexes) are
likely to be much greater than the amount of data supplied in the
request. So, the more you drill this down to individual object
modifications, the less benefit you receive compared to logging the
top-level command. But the transformation of the command into the
modifications is a function of the objects' code, so this can't be shoved
into a "dumb" storage back-end.

RDBMSes also tend to have smaller logs, because their data structures are
simpler, more regular, and carry less internal metadata.
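
To illustrate the size gap between a top-level command and the object
changes it causes, here is a small Python sketch. The add_document command
and the dict-based "database" with a word index are invented for the example,
and pickled size is used only as a rough stand-in for what a storage would
have to write when each touched object is stored whole.

```python
import pickle


def add_document(db, doc_id, text):
    """Hypothetical business command: store a document and index its words.
    Returns every object the command touched, keyed by a made-up object id."""
    touched = {}
    db["docs"][doc_id] = text
    touched[("docs", doc_id)] = text
    for word in set(text.split()):
        posting = db["index"].setdefault(word, [])
        posting.append(doc_id)
        # The whole posting list counts as modified, not just the new entry.
        touched[("index", word)] = posting
    return touched


# Build up some existing data so the index objects are not trivially small.
db = {"docs": {}, "index": {}}
for i in range(200):
    add_document(db, i, "the quick brown fox jumps over the lazy dog")

# The command is just a name plus the data supplied in the request.
command = ("add_document", 200, "the quick brown fox jumps over the lazy dog")
touched = add_document(db, *command[1:])

command_bytes = len(pickle.dumps(command))
object_bytes = sum(len(pickle.dumps(value)) for value in touched.values())
print("command:", command_bytes, "bytes; touched objects:", object_bytes, "bytes")
```

The command carries one short string, while executing it rewrites one
posting list per distinct word. Turning the former into the latter requires
running add_document itself, which is exactly the code a storage that only
sees bytes does not have.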