[ZODB-Dev] Use of fsync in FileStorage

Wed Jul 28 22:06:12 EDT 2004

On Wed, 28 Jul 2004 15:53:14 -0400, Tim Peters <tim at zope.com> wrote:
> It occurs to me that the current FileStorage dance is wrong wrt what it
> *intended* to do:  while it does flush() in tpc_vote, it delays doing
> fsync() until tpc_finish.  But even without system crashes or power outages,
> fsync() can fail, and the (strong) intent is that tpc_finish never fail
> (it's fine to fail in tpc_vote -- that's why there *is* a vote phase, so a
> storage that can't complete a transaction has a chance to say so, and then
> the transaction gets aborted cleanly across all participants).
> 
> So that means that *at least* the current fsync needs to move into tpc_vote.
> Then it does what Dieter suggested, and part of what Toby suggests:  does an
> fsync() while the status flag still says "no good".
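
The ordering described above can be sketched roughly as follows. This is a toy, not FileStorage's actual code: the class ToyStorage, its store() method, the one-byte record layout, and the flag values b"c" (not yet valid) and b" " (valid) are all assumptions made for illustration. The point is only that flush() and fsync() both run in tpc_vote(), while the status flag still says "no good", so a sync failure surfaces in the vote phase.

```python
import os
import tempfile


class ToyStorage:
    """Hypothetical file-backed storage; not FileStorage's actual code."""

    def __init__(self, path):
        self._file = open(path, "w+b")
        self._status_pos = None

    def store(self, payload):
        # Assumed record layout: one status byte followed by the payload.
        self._file.seek(0, os.SEEK_END)
        self._status_pos = self._file.tell()
        self._file.write(b"c" + payload)  # b"c": transaction not yet valid

    def tpc_vote(self):
        # The expensive, failure-prone work happens here. If flush() or
        # fsync() raises OSError, the transaction can still be aborted.
        self._file.flush()
        os.fsync(self._file.fileno())

    def tpc_finish(self):
        # Only the status byte is flipped to mark the transaction valid.
        self._file.seek(self._status_pos)
        self._file.write(b" ")
        self._file.flush()

    def close(self):
        self._file.close()


# Usage: one transaction through vote and finish.
path = os.path.join(tempfile.mkdtemp(), "toy.fs")
storage = ToyStorage(path)
storage.store(b"payload")
storage.tpc_vote()
storage.tpc_finish()
storage.close()
```

The sketch leaves open whether a second sync is wanted after the flag is flipped; it only shows the existing fsync() moved into the vote phase.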

I had a rationalization for the current implementation.  I assumed the
fsync() would not fail or that if it did it was so catastrophic that
it was okay for tpc_finish() to fail, too.  At that time, a failure in
tpc_finish() would cause the whole ZODB to stop accepting transactions
(hosed).  I assumed that fsync() failures probably meant serious disk
failures.

Given that fsync() failures are very rare and fsync() is expensive, I
wanted to avoid an fsync() call in tpc_vote() in a ZEO server.  In
that case, the server calls flush(), gets the data out of application
buffers, and sends its response to the ZEO client.  The hope was that
much of the data would already be written to disk by the time the
client returned with a tpc_finish() call so that fsync() would go more
quickly.  I never measured any of this so I don't know how naive it
was.  It still seems that calling fsync() in the middle of the ZEO
transaction is unfortunately slow.
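
One way the hope above could be checked is a small timing experiment. The sketch below is only an assumption about how such a measurement might look: the function time_fsync(), the 1 MB payload, and the 0.2 second pause (standing in for the round trip to the ZEO client) are all made up for illustration. It times os.fsync() once right after flush() and once after a pause.

```python
import os
import tempfile
import time


def time_fsync(payload_size, delay_seconds):
    """Return seconds spent in os.fsync() after flush() plus a pause."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * payload_size)
        f.flush()                  # data leaves application buffers
        time.sleep(delay_seconds)  # stands in for the client round trip
        start = time.perf_counter()
        os.fsync(f.fileno())       # block until the OS reports data synced
        return time.perf_counter() - start


immediate = time_fsync(1 << 20, 0.0)
delayed = time_fsync(1 << 20, 0.2)
print(f"fsync right after flush(): {immediate:.6f}s")
print(f"fsync after a 0.2s pause:  {delayed:.6f}s")
```

The numbers depend heavily on the operating system, filesystem, and disk, and a single run says little; repeated runs under realistic load would be needed before drawing any conclusion.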

Jeremy