[ZODB-Dev] Use of fsync in FileStorage

Tim Peters tim at zope.com
Tue Jul 27 18:57:21 EDT 2004


[Dieter Maurer]
> If "fsync" does what people think it should do, then a single "fsync" is
> enough -- after the transaction has been written and before the "status"
> byte is written.
>
> If I remember the original post correctly, then FileStorage has
> precisely this "fsync".

Na, the current code is like this for a successful transaction:

1a   append a txn header with a "no good" status byte
1b   bulk-append the txn body from the temp file it lives in
1c   append a txn trailer
2    flush
3    seek back and overwrite the status byte with an "OK" value
4    flush
5    fsync
6    sometimes write a new .index file
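The sequence above can be sketched in Python as follows. This is a minimal illustration, not FileStorage's actual code. It assumes a hypothetical record layout (an 8-byte length, one status byte, the body, then the length again as a trailer) and hypothetical status values `b"N"` and `b"Y"`. `commit_current` is an illustrative name, and step 6 is left out.

```python
import os
import struct
import tempfile

# Hypothetical record layout for illustration only (NOT FileStorage's format):
#   8-byte big-endian body length | 1 status byte | body | 8-byte length again
STATUS_NO_GOOD = b"N"  # hypothetical "not committed" marker
STATUS_OK = b"Y"       # hypothetical "committed" marker


def commit_current(f, body: bytes) -> None:
    """Append one transaction record using the current step ordering."""
    start = f.seek(0, os.SEEK_END)
    length = struct.pack(">Q", len(body))
    f.write(length + STATUS_NO_GOOD)  # 1a: header with "no good" status
    f.write(body)                     # 1b: body (real code copies a temp file)
    f.write(length)                   # 1c: trailer
    f.flush()                         # 2: Python buffer -> OS
    f.seek(start + 8)                 # 3: back to the status byte...
    f.write(STATUS_OK)                #    ...and mark the record "OK"
    f.flush()                         # 4: Python buffer -> OS again
    os.fsync(f.fileno())              # 5: ask the OS to write it all to disk
    # Step 6 (occasionally writing a new .index file) is omitted.


with tempfile.TemporaryFile() as f:
    commit_current(f, b"txn data")
    f.seek(0)
    record = f.read()

print(record[8:9])  # b'Y'
```

Note that the only fsync comes after the status byte has been rewritten. Nothing before step 5 forces the body to disk ahead of the "OK" byte, and the single fsync at step 5 leaves the order in which the OS writes those bytes up to the OS.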

The original worry was that fsync would stick bytes on disk "left to right",
so that a power outage *during* fsync might leave the status byte saying
"OK" yet leave some number of bytes after that in a gibberish state.  I
imagine a system crash during step 4 could do the same thing, and that is
somewhat worrying.
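To make the feared failure concrete, here is a small simulation. It uses the same hypothetical layout (8-byte length, status byte, body, 8-byte length trailer), which is an assumption and not FileStorage's documented format. The "torn" record keeps an "OK" status byte, but everything after it is replaced by zero bytes. A reader that trusts the status byte alone accepts that record. The extra trailer comparison in `looks_complete` is only an illustration of one possible partial check. It is not a description of what FileStorage's recovery code does.

```python
import struct

# Same hypothetical layout: 8-byte length | status byte | body | 8-byte length.
STATUS_NO_GOOD = b"N"
STATUS_OK = b"Y"


def make_record(body: bytes, status: bytes) -> bytes:
    length = struct.pack(">Q", len(body))
    return length + status + body + length


def status_says_ok(record: bytes) -> bool:
    """Naive check: trust the status byte alone."""
    return record[8:9] == STATUS_OK


def looks_complete(record: bytes) -> bool:
    """Also compare the header length with the redundant trailer copy."""
    if len(record) < 17:
        return False
    (n,) = struct.unpack(">Q", record[:8])
    if len(record) != 17 + n:
        return False
    return struct.unpack(">Q", record[-8:])[0] == n


good = make_record(b"txn data", STATUS_OK)
# Simulate the feared outcome: the status byte reached the disk as "OK",
# but the bytes after it are gibberish (here, zero bytes of the same size).
torn = good[:9] + bytes(len(good) - 9)

print(status_says_ok(torn))  # True: the status byte alone is fooled
print(looks_complete(torn))  # False: the trailer no longer matches
```

A length comparison like this cannot catch damage confined to the body itself. That gap is why the ordering of writes and fsyncs matters in the first place.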

Your suggestion is to make step 5 step number 2.5 instead, and possibly drop
step 4.  I'll guess the objection to that will be that *then* when the power
goes out, it's possible to lose a perfectly good transaction, one that
completed long before the power outage (for some unhelpfully OS- and
load-dependent meaning of "long before"), and that the window of
vulnerability for this is much longer.  For example, Windows seems happy to
keep data in memory buffers indefinitely, so if another transaction won't
commit for an hour, and we lose power after 59 minutes, and the backup power
supply fails, that status byte on disk still says "no good" and we get a
surprisingly out-of-date Data.fs after the reboot.  Since a transaction can
affect multiple storages on multiple machines, it would also be bad to leave
them in a mutually inconsistent state.  I don't think we can stop that in
all cases (steps 3-6 are in tpc_finish, and a crash then is too late to stop
the transaction on other boxes), but giving it a *long* window to fail in
would be bad.

Toby's suggestion is to add a second fsync as step 2.5, keeping the one at
step 5.
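The three orderings discussed here can be compared side by side with one parameterized sketch. It reuses a hypothetical record layout (not FileStorage's real format), and the names `commit` and `ORDERINGS` are illustrative. For simplicity, the sketch keeps step 4 in every variant, although Dieter suggested it could possibly be dropped. The function returns the steps it performed after writing the record, so the orderings can be checked without pulling the plug.

```python
import os
import struct
import tempfile

STATUS_NO_GOOD = b"N"  # hypothetical "not committed" marker
STATUS_OK = b"Y"       # hypothetical "committed" marker


def commit(f, body, fsync_before_status, fsync_after_status):
    """Append one record; return the steps performed after steps 1a-1c."""
    steps = []
    start = f.seek(0, os.SEEK_END)
    length = struct.pack(">Q", len(body))
    # Steps 1a-1c: header with "no good" status, body, trailer.
    f.write(length + STATUS_NO_GOOD + body + length)
    f.flush()                                # step 2
    steps.append("flush")
    if fsync_before_status:
        # Step 2.5: force header, body and trailer to disk before the
        # status byte can say "OK".
        os.fsync(f.fileno())
        steps.append("fsync")
    f.seek(start + 8)                        # step 3
    f.write(STATUS_OK)
    steps.append("status")
    f.flush()                                # step 4 (kept in all variants)
    steps.append("flush")
    if fsync_after_status:
        # Step 5: force the "OK" status byte to disk now.
        os.fsync(f.fileno())
        steps.append("fsync")
    return steps


ORDERINGS = {
    "current": (False, True),
    "dieter": (True, False),  # "OK" byte waits for the OS or a later fsync
    "toby": (True, True),     # two fsyncs per transaction
}

steps_by_ordering = {}
for name, (before, after) in ORDERINGS.items():
    with tempfile.TemporaryFile() as f:
        steps_by_ordering[name] = commit(f, b"txn data", before, after)
```

The trade-off from the thread is visible in the step lists. Dieter's ordering guarantees that the body is on disk before the "OK" byte, but it leaves that byte unforced. Toby's ordering forces both, at the cost of a second fsync on every commit.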


