[Zope] Backing up Data.fs
Toby Dickenson
tdickenson@geminidataloggers.com
Mon, 2 Jun 2003 14:07:55 +0100
On Monday 02 June 2003 12:48, Peter Sabaini wrote:
> Toby Dickenson wrote:
> > On Monday 02 June 2003 10:18, Peter Sabaini wrote:
> > Think
> > of backup as a database operation - this script has no ACID.
>
> please explain -- why would i need transactions and concurrency with a
> backup script?
Suppose this morning (Monday) the machine locks up (or there is a power
failure, or a network loss if this is on a network mount, etc.) immediately
after this backup script has terminated. Some, but not all, of the changes to
/archive/monday/data.fs will have been written to disk, and some will have
been lost. There is every chance that today's backup has been mashed together
with the backup from last week's Monday in some arbitrarily confusing way.
There is no new backup, and the old backup has been destroyed.
Sure, if this happens only once then this script leaves you with six other
good daily copies. But a disaster recovery is the worst possible time to be
working out which "backup" directory contains a data.fs that has recently
been destroyed by a non-transactional backup script. Sean Upton talks about
disaster preparedness here:
http://zope.nipltd.com/public/lists/zope-archive.nsf/AGByKey/6036CE8DFC2D3484
This "bug" can be fixed for FileStorage backup scripts; check out the link I
posted earlier. The cost is in performance.
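For illustration only (this is not the script from that link, and the paths
are invented), the basic copy-then-rename idea looks something like this in
Python; the fsync is where most of the performance cost comes from:

import os
import shutil

# Hypothetical paths, for illustration only.
SRC = '/var/zope/var/Data.fs'        # the live FileStorage
DEST = '/archive/monday/data.fs'     # last Monday's backup
TMP = DEST + '.tmp'

# Write the new copy alongside the old one; the old backup is
# never touched until the new copy is complete.
shutil.copyfile(SRC, TMP)

# Force the new copy onto disk before replacing the old backup.
# This fsync accounts for most of the extra cost.
f = open(TMP, 'rb+')
os.fsync(f.fileno())
f.close()

# Atomic on POSIX: the destination is always either the complete
# old backup or the complete new one, never a mixture.
os.rename(TMP, DEST)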
> > for local backups (like this script)
> > rsync
> > cant beat "cp".
>
> not that i have any numbers to back this up but i think that just
> copying the difference of two similar Data.fs's can be advantageous with
> large files -- of course this will really only make a difference if the
> backup is to a network mount or something like that.
In the form used in that script, rsync has to read the whole of both the
source file and the original version of the destination file, then write the
changes. There is no way it can beat cp on network traffic when the
destination is a network mount. Even in pure I/O terms, it only has a chance
of winning if your archive is on something like RAID 5, where writes are very
much more expensive than reads.
rsync over ssh is different.
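Following that reasoning, here is a rough model of the network traffic each
approach generates. The file sizes are invented purely for illustration, and
it assumes the archive sits on a network mount:

MB = 1024 * 1024

datafs = 1000 * MB     # size of Data.fs (invented)
delta = 50 * MB        # amount changed since the last backup (invented)

# cp: reads the source from local disk, writes the whole file
# across the network to the mount.
cp_traffic = datafs

# rsync to a network mount: must read the old destination copy back
# across the network to compute the delta, then write the changes.
rsync_mount_traffic = datafs + delta

# rsync over ssh: the remote rsync reads the old copy on its own
# local disk, so only checksums and the changed blocks cross the wire.
rsync_ssh_traffic = delta

for name, traffic in [('cp', cp_traffic),
                      ('rsync to a mount', rsync_mount_traffic),
                      ('rsync over ssh', rsync_ssh_traffic)]:
    print('%-18s ~%4d MB over the network' % (name, traffic / MB))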
> > I strongly recommend DirectoryStorage if you want to keep daily backups
> > like this. It has a standard tool for creating local incremental backups
> > as tar files, or synchronising a remote replica of the storage. These
> > tools are ACID, and efficient. Unlike all the home-brew FileStorage
> > schemes I have seen.
>
> DirectoryStorage sounds great if you have the requirements, but for
> simpler sites (or sites which host their data in an RDBMS anyway)
> FileStorage is just ok.
Sure. FileStorage is ideal for sites that either:
1. can afford the I/O and storage costs of robust backup scripts like the one
I posted a link to earlier, or
2. can stick with manually scheduled backups rather than automated backups.
--
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson