[ZODB-Dev] Use of fsync in FileStorage
Tim Peters
tim at zope.com
Wed Jul 28 15:53:14 EDT 2004
It occurs to me that the current FileStorage dance is wrong wrt what it
*intended* to do: while it does flush() in tpc_vote, it delays doing
fsync() until tpc_finish. But even without system crashes or power outages,
fsync() can fail, and the (strong) intent is that tpc_finish never fail
(it's fine to fail in tpc_vote -- that's why there *is* a vote phase, so a
storage that can't complete a transaction has a chance to say so, and then
the transaction gets aborted cleanly across all participants).

So that means that *at least* the current fsync needs to move into tpc_vote.
That then does what Dieter suggested, and part of what Toby suggested: it does
an fsync() while the status flag still says "no good".
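
As a rough illustration of that ordering, here is a minimal Python sketch. It is not FileStorage's code: ToyStorage, the two status values, and demo() are hypothetical names made up for the example, and the record layout (one status byte followed by the data) is an assumption chosen to keep it short. The point is only that the fsync() which may fail happens in tpc_vote, while the status byte still says "no good".

```python
import os
import tempfile

# Hypothetical status values for this sketch only; not FileStorage's real ones.
STATUS_PENDING = b"N"  # "no good": record written but not yet committed
STATUS_OK = b"K"       # "OK!": record committed


class ToyStorage:
    """Toy append-only log (an assumption-laden sketch, not FileStorage)."""

    def __init__(self, path):
        # The file must already exist; "r+b" allows overwriting in place.
        self._file = open(path, "r+b")
        self._status_pos = None

    def tpc_vote(self, data):
        f = self._file
        f.seek(0, os.SEEK_END)
        self._status_pos = f.tell()
        f.write(STATUS_PENDING + data)
        f.flush()
        # fsync() can raise OSError. Failing here is acceptable: the vote
        # phase exists so that a storage can refuse the transaction.
        os.fsync(f.fileno())

    def tpc_finish(self):
        f = self._file
        f.seek(self._status_pos)
        f.write(STATUS_OK)
        f.flush()
        # No fsync() in this sketch; whether one belongs here is a
        # separate question.
        self._status_pos = None

    def close(self):
        self._file.close()


def demo():
    """Return the file contents after tpc_vote and after tpc_finish."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        storage = ToyStorage(path)
        storage.tpc_vote(b"payload")
        with open(path, "rb") as f:
            before = f.read()
        storage.tpc_finish()
        with open(path, "rb") as f:
            after = f.read()
        storage.close()
    finally:
        os.remove(path)
    return before, after
```

If tpc_vote raises, the record on disk still carries the "no good" status, so a reader of the log can treat it as uncommitted.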

The question remaining is whether tpc_finish should also do an fsync, after
overwriting the status byte with an "OK!" value. I'm inclined to think yes,
for reasons explained earlier. I'll run some timings first. A possible
saving grace is that this second fsync() is only trying to change one byte,
at the end of a file that was fsync'ed very shortly before in tpc_vote.
So that's probably faster than the general fsync case on many/most systems.
Also, since this part of the disk will have already been fsync'ed
successfully in the very recent past, it's hard to think of a
non-catastrophic reason for the second fsync() call failing.
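
One way such a timing comparison might be set up is sketched below. This is a hypothetical stand-alone measurement, not the timings referred to above and not ZODB code: time_fsyncs, the payload size, and the one-byte status values are all assumptions for illustration. It times an fsync() after writing a whole record, then an fsync() after overwriting a single byte in the same, just-synced file. Results will vary widely with the OS, filesystem, and disk, so no expected numbers are given.

```python
import os
import tempfile
import time


def time_fsyncs(payload_size=1 << 20):
    """Return (bulk_seconds, one_byte_seconds) for two fsync() calls.

    Hypothetical measurement helper; the record layout is an assumption.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            # First fsync: a freshly written record, as in tpc_vote.
            f.write(b"N" + b"x" * payload_size)
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())
            bulk = time.perf_counter() - start

            # Second fsync: only the status byte has changed since.
            f.seek(0)
            f.write(b"K")
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())
            one_byte = time.perf_counter() - start
    finally:
        os.remove(path)
    return bulk, one_byte


if __name__ == "__main__":
    bulk, one_byte = time_fsyncs()
    print("fsync after full record: %.6f s" % bulk)
    print("fsync after one byte:    %.6f s" % one_byte)
```

A single run says little; repeating the measurement many times, on the filesystem that will actually hold the storage file, would give a fairer picture.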