[Zope] Backing up Data.fs

Toby Dickenson tdickenson@geminidataloggers.com
Mon, 2 Jun 2003 14:07:55 +0100


On Monday 02 June 2003 12:48, Peter Sabaini wrote:
> Toby Dickenson wrote:
> > On Monday 02 June 2003 10:18, Peter Sabaini wrote:

> > Think
> > of backup as a database operation  -  this script has no ACID.
>
> please explain -- why would i need transactions and concurrency with a
> backup script?

Suppose this morning (Monday) the machine locks up (or the power fails, or 
the network drops if this is on a network mount, etc.) immediately after this 
backup script has terminated. Some, but not all, of the changes to 
/archive/monday/data.fs will have been written to disk; the rest will have 
been lost. There is every chance that today's backup has been mashed together 
with last week's Monday backup in some arbitrarily confusing way. There is 
no new backup, and the old backup has been destroyed.
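To make the failure mode concrete, the naive daily script boils down to 
something like this (a Python sketch of the same pattern; the paths are 
illustrative, not anyone's actual script):

    import shutil

    # copyfile() opens the destination for writing, truncating it first,
    # and nothing is fsync'd. After a crash, the file on disk can be an
    # arbitrary mix of last Monday's blocks and this Monday's blocks.
    shutil.copyfile('/var/zope/Data.fs', '/archive/monday/data.fs')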

Sure, if this happens only once then this script leaves you with six other 
good daily copies. But the middle of a disaster recovery is the worst 
possible time to be working out which "backup" directory contains a data.fs 
that has just been destroyed by a non-transactional backup script. Sean Upton 
talks about disaster-preparedness here:
http://zope.nipltd.com/public/lists/zope-archive.nsf/AGByKey/6036CE8DFC2D3484

This "bug" can be fixed for FileStorage backup scripts, check out the link I 
posted earlier. The cost is in performance.
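The atomic-replace part of that fix looks roughly like this (a minimal 
sketch under my own assumptions, not the script from that link; paths and 
names are illustrative): copy into a temporary file, fsync it, then rename 
it over the old backup.

    import os, shutil

    def safe_backup(src, dest):
        # Write the copy to a temporary file in the destination directory,
        # so the final rename stays within one filesystem.
        tmp = dest + '.tmp'
        fin = open(src, 'rb')
        fout = open(tmp, 'wb')
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())   # force the new copy onto disk
        fout.close()
        fin.close()
        # rename() is atomic within one filesystem: after a crash you have
        # either the old backup or the new one, never a mixture.
        os.rename(tmp, dest)

    safe_backup('/var/zope/Data.fs', '/archive/monday/data.fs')

The fsync() is one source of the performance cost mentioned above.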

> > for local backups (like this script)
> > rsync
> > cant beat "cp".
>
> not that i have any numbers to back this up but i think that just
> copying the difference of two similar Data.fs's can be advantageous with
> large files -- of course this will really only make a difference if the
> backup is to a network mount or something like that.

In the form used in that script, rsync has to read the whole of both the 
source file and the original version of the destination file, then write the 
changes. There is no way it can beat cp on network traffic when the 
destination is a network mount. Its only chance of winning on I/O is if your 
archive is on something like RAID5, where writes are very much more 
expensive than reads.

rsync over ssh is different: each end reads its copy locally, so only the 
changed blocks cross the network.

> > I strongly recommend DirectoryStorage if you want to keep daily backups
> > like this. It has a standard tool for creating local incremental backups
> > as tar files, or synchronising a remote replica of the storage. These 
> > tools are ACID, and efficient. Unlike all the home-brew FileStorage
> > schemes I have seen.
>
> DirectoryStorage sounds great if you have the requirements, but for
> simpler sites (or sites which host their data in an RDBMS anyway)
> FileStorage is just ok.

Sure. FileStorage is ideal for sites that either:
1. can afford the I/O and storage costs of robust backup scripts like the one
   I posted a link to earlier.
2. can stick with manually scheduled backups, not automated backups.


-- 
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson