[ZODB-Dev] Use of fsync in FileStorage

Tim Peters tim at zope.com
Wed Jul 28 15:53:14 EDT 2004


It occurs to me that the current FileStorage dance is wrong wrt what it
*intended* to do:  while it does flush() in tpc_vote, it delays doing
fsync() until tpc_finish.  But even without system crashes or power outages,
fsync() can fail, and the (strong) intent is that tpc_finish never fail
(it's fine to fail in tpc_vote -- that's why there *is* a vote phase, so a
storage that can't complete a transaction has a chance to say so, and then
the transaction gets aborted cleanly across all participants).

So that means that *at least* the current fsync() needs to move into tpc_vote.
Then it does what Dieter suggested, and part of what Toby suggested:  it does
an fsync() while the status flag still says "no good".
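
A rough sketch of that ordering, not FileStorage's actual code:  the function
name vote_phase, the record layout (one status byte followed by the payload)
and the STATUS_PENDING value are all made up for illustration.  The point is
only that flush() and fsync() both happen in the vote phase, where raising an
exception is still acceptable.

```python
import os

# Hypothetical stand-in for the "no good" status flag.
STATUS_PENDING = b"c"


def vote_phase(f, payload):
    """Append a record marked "no good", then force it to disk.

    Returns the file offset of the status byte so a later phase can
    overwrite it.  Any OSError from flush() or fsync() propagates to the
    caller, which is fine during the vote: the transaction can still be
    aborted cleanly.
    """
    f.seek(0, os.SEEK_END)
    status_pos = f.tell()
    f.write(STATUS_PENDING + payload)
    f.flush()               # push Python's buffer to the OS
    os.fsync(f.fileno())    # ask the OS to push it to the disk
    return status_pos
```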

The remaining question is whether tpc_finish should also do an fsync(), after
overwriting the status byte with an "OK!" value.  I'm inclined to think yes,
for reasons explained earlier.  I'll run some timings first.  A possible
saving grace is that this second fsync() only has to push out one changed
byte, and that byte is at the end of the file, which was fsync'ed very
shortly before in tpc_vote.  So it's probably faster than the general fsync
case on many/most systems.
Also, since this part of the disk will have already been fsync'ed
successfully in the very recent past, it's hard to think of a
non-catastrophic reason for the second fsync() call failing.
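
For concreteness, the second step under discussion could look roughly like
this.  Again a sketch only:  finish_phase, STATUS_OK and the idea of passing
in the status byte's offset are assumptions for illustration, not
FileStorage's real interface.

```python
import os

# Hypothetical stand-in for the "OK!" status value.
STATUS_OK = b" "


def finish_phase(f, status_pos):
    """Overwrite the one status byte with "OK!" and sync again.

    Only a single byte changes, in a region of the file that was already
    synced during the vote, so this call is expected to be cheap and
    unlikely to fail.
    """
    f.seek(status_pos)
    f.write(STATUS_OK)
    f.flush()
    os.fsync(f.fileno())
```

Whether the extra fsync() here is worth its cost is exactly what the timings
are meant to settle.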


