[Zope] ZODB performance: reads to writes

27 Jun 2000 22:37:25 GMT

In article <000d01bfddfb$4546f070$3e48a4d8@digicool.com>,
Evan Simpson <evan@digicool.com> wrote:
> ----- Original Message -----
> From: Jimmie Houchin <jhouchin@texoma.net>
> 
> > Will an app as described above still suffer from problems with high
> writes?
> 
> Possibly, but only if there are hidden hotspots.  For example, in your
[...]> 
> 2. Implement the application-level conflict handling you read about, so that
> Folders and Catalogs can decide that two writes don't conflict after all,
> and merge them into a single update.

Unfortunately, this doesn't deal with cases where the conflicting state
is contained in many objects (see note by PJE in the ZODB Wiki).

Also, there is a whole other area of difficulty for high-write-volume
ZODBs, which is the ammount of IO that needs to be done.  First, by
nature ZODB can't rewrite a single attribute of an object, it has to
rewrite the entire thing.

Indexing is also a bear from an IO perspective.  First, BTrees currently
keep a count at each level, so every change to a btree changes a node at
each level of the BTree.  For a ZCatalog, there are a lot of btrees
(something like 2n+4 for n indexes, I think -- don't quote me on that,
it's been a while), and each one changes (last I looked, every index was
updated even if the value indexed in a particular one hadn't changed. 
This may have been improved since).  Not only is this bad from a hotspot
point of view (always a conflict on the root node of the tree), but you
end up doing a *lot* of IO.  During my experiments that led to
BerkeleyStorage, I was watching the Data.fs grow by 47K per transaction
for adding indexed objects of ~1K in size.  Watching this with
tranalyzer, this turns out to be 1K of object, and 46K of updated btree
pages :).  Note that BerkeleyStorage only prevents the file from growing
that much -- it still has to do all that IO (in fact, it has to do ~2-3
times that much IO, due to the nature of BerkeleyDB.  A relational
storage would have similar issues.  For ammount of IO done, FileStorage
is about as efficient as you can possibly be -- it's just that it trades
that off against space reclamation). 

Also, with any kind of Berkeley or Relational storage, there is a second
hidden IO and storage penalty: you're storing a btree inside a btree. In
other words, the lower-level DB uses btrees to store your objects,
including interior nodes of the higher-level ZODB btree. Every interior
node of the ZODB Btree needs a leaf node (and supporting interior nodes)
in the DB's btrees. so you get taxed twice, on both I/O and storage
space used.

Not to discourage anyone from using ZODB, necessarily.  There are a lot
of things it's fantastic for, and without a doubt ZODB is getting better
at handling higher write ratios. Over time there will be more and
more applications that previously would have required an external SQL or
other kind of database that can be done in ZODB instead.  However, there
will also IMHO always be applications that ZODB just isn't as suitable
for. You have to thing long and hard before committing to one or ther
other. And then there's the worry of what happens if you chose wrong.

We were faced with exactly these issues, and the extremes of them, to
boot.  We have a *large*, *very* high write ratio, lots of indexes type
of application based on ZPublisher/DTML that we'd like to port
to/replace with something Zope based.  Yet we might need to make another
instance of this same type of application used by only a few people with
a small ammount of data -- it would really suck to have to have to have
another instance of the same expensive database system to support a
miniscule ammount of data, because everything was coded only with SQL in
mind). 

This is what led ultimately to ZPatterns -- you can write applications
and not have to decide up front on ZODB or SQL.  And you can change your
mind later (Seen that TV commercial? suddenly your online store is
selling a zillion items per month instead of the 1000 you planned for. 
oops!).  You can even decide on an instance by instance basis.  You
configure with ZODB for a small department or client, and Oracle or
Sybase for a huge one -- and the small guy doesn't have to pay for the
DB license and DBA!). Since then, we've discovered a number of other
benefits to the model.

Hmmm... I didn't intend to write a ZPatterns advertisement when I
started, honest! But this seems to have turned into one nonetheless :^)