RE: [Zope] Zope backup
Copying a FileStorage while transactions are being appended to the end of it potentially means that the copy completes before the transaction commit is totally flushed to the file, so there is always the possibility that your backup is going to need to have half-written transactions manually truncated before it can be used on either a replica or a new Zope/ZSS instance; this isn't that big of a deal, but it is possible to avoid this manual work with dirstorage.

The only things that make DirectoryStorage better in this regard are that the backup tools integrate with the storage instead of acting uninformed below it, trigger snapshot mode, and get a list of files to back up from the storage software itself (this is quicker and a better guarantee than, say, using unix 'find' and mtime on a dirstorage directory to do the same thing).

Compared to FileStorage, you do not have the problem of backing up files that are being written to, because:

(a) Snapshot mode prevents changes to objects in HOME/A from being written, buffering any writes to those files in HOME/journal and HOME/B for a later flush once snapshot mode is exited (post-backup).

(b) Additional transactions and objects are not added to the directory being backed up.

DirectoryStorage is also preferable in these backup scenarios:

1. Disaster-preparedness. You want to back up a big storage over a WAN connection - and this means incremental. You need incremental backup and IIRC something like rsync may not work very well on a changing FileStorage Data.fs.

http://mail.zope.org/pipermail/zodb-dev/2002-November/003807.html

We run servers at a co-location facility, and need remote backup to our facility over a 1.5Mb/s connection. A reasonable way to do this is to use the backup.py tool to create full and incremental files locally that are pulled down to remote locations via FTP on a cron job, or even better, just run the replica.py tool from our secondary location to incrementally pull down the changes (equiv. to backup.py incremental backup, but for replica purposes) over an SSH connection to our other location and to tape for standard offsite backup rotations. With FileStorage, we would have to use rsync because of bandwidth constraints, and our ability to respond quickly would be impeded by the fact that we may have to manually repair the remote copy of the filestorage via truncation of half-committed transactions.

2. ZSS High-availability clustering and replication. We have an HA cluster currently using Linux-HA heartbeat, and our crude way of copying the Data.fs is via FTP for daily snapshots in the middle of the night between our primary and secondary node. This works okay (not as well as rsync would) because this application only updates most content once daily. However, if you have a heavier-write situation, FileStorage will not be amenable to a hot-backup clustering arrangement, because cluster software will not be able to start the ZEO storage server on the backup/secondary node in the possible case of a corrupted (even slightly) filestorage copy (someone correct me if I am wrong here).

The DirectoryStorage replica.py tool addresses this by providing a secure network-enabled incremental replication mechanism that ignores incoming writes (via snapshot) to guarantee consistency and isolation (in a transactional sense) for the backup operation: the backup is consistent with the state of the storage at the point in time the snapshot mode was entered (when backup started), and incoming transactions do not affect the operation of a backup because they are isolated in HOME/journal and HOME/B while stuff is copied out of HOME/A. Given this, I feel much more comfortable that I can keep a 'hot' replica on a 'hot' backup node that is ready to take over as ZSS in the case of a failure on the primary or (mainly) the need for maintenance on the primary - and I can feel comfortable that my backup/replica reflects a recent consistent record of current heavy activity.
Sean

-----Original Message-----
From: Chris Withers [mailto:chrisw@nipltd.com]
Sent: Friday, March 28, 2003 1:05 PM
To: sean.upton@uniontrib.com
Cc: jccooper@jcameroncooper.com; zope@zope.org
Subject: Re: [Zope] Zope backup

sean.upton@uniontrib.com wrote:
Though your copy may end up needing repair after the fact; backup in this sense is not transactional. DirectoryStorage has the best answer for this at the moment (better than FileStorage),
What led you to this belief?

cheers,

Chris
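To make the snapshot-mode behaviour described earlier in this message concrete, here is a minimal in-memory sketch. It is not DirectoryStorage's actual code: the class `ToySnapshotStore` and its attribute names are hypothetical, with `main` standing in for HOME/A and `pending` standing in for HOME/journal and HOME/B. It only illustrates the idea that writes arriving during a backup are buffered and flushed afterwards.

```python
class ToySnapshotStore:
    """Toy model of snapshot mode; an illustration, not DirectoryStorage itself."""

    def __init__(self):
        self.main = {}      # stands in for HOME/A (what the backup copies)
        self.pending = {}   # stands in for HOME/journal and HOME/B
        self.snapshot = False

    def write(self, name, data):
        # While a snapshot is active, writes are buffered instead of
        # touching the area that is being backed up.
        if self.snapshot:
            self.pending[name] = data
        else:
            self.main[name] = data

    def enter_snapshot(self):
        self.snapshot = True

    def leave_snapshot(self):
        # Flush the buffered writes once the backup has finished.
        self.main.update(self.pending)
        self.pending.clear()
        self.snapshot = False


store = ToySnapshotStore()
store.write("obj1", b"v1")

store.enter_snapshot()
store.write("obj1", b"v2")   # arrives mid-backup and is buffered
copy = dict(store.main)      # the backup sees only the state at snapshot time
store.leave_snapshot()       # the buffered write now lands in main
```

In this toy model the backup copy holds the old value of `obj1` while the store itself ends up with the new one, which is the consistency and isolation property the message describes.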
sean.upton@uniontrib.com wrote at 2003-3-28 14:18 -0800:
Copying a FileStorage while transactions are being appended to the end of it potentially means that the copy completes before the transaction commit is totally flushed to the file, so there is always the possibility that your backup is going to need to have half-written transactions manually truncated to be used on either a replica or a new Zope/ZSS instance; this isn't that big of a deal ....

If I understand this right, then this is done automatically when the "FileStorage" is opened.

I refer to a question on the mailing list: someone reported many files with strange extensions in his "var" folder. A ZODB developer (Jeremy?) replied that they were incomplete transaction records found when the storage was opened.

Dieter
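As a rough illustration of what truncating a half-written tail on open can look like, here is a sketch over a made-up log format (a 4-byte big-endian length followed by the payload). This is not the real FileStorage record layout, and the function names are hypothetical. It finds the end of the last complete record and cuts off anything after it.

```python
import struct

HEADER = struct.Struct(">I")  # toy format: 4-byte length, then payload


def last_complete_offset(data: bytes) -> int:
    """Return the offset just past the last fully written record."""
    pos = 0
    while pos + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + length
        if end > len(data):
            break  # this record was cut off mid-write
        pos = end
    return pos


def truncate_partial_tail(path: str) -> int:
    """Drop a half-written trailing record; return the number of bytes removed."""
    with open(path, "r+b") as f:
        data = f.read()
        good = last_complete_offset(data)
        f.truncate(good)
    return len(data) - good
```

Calling `truncate_partial_tail` on a copy taken mid-commit would leave only whole records in this toy format; a file that already ends on a record boundary is left unchanged and the function returns 0.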
On Sunday 30 March 2003 8:57 am, Dieter Maurer wrote:
sean.upton@uniontrib.com wrote at 2003-3-28 14:18 -0800:
Copying a FileStorage while transactions are being appended to the end of it potentially means that the copy completes before the transaction commit is totally flushed to the file, so there is always the possibility that your backup is going to need to have half-written transactions manually truncated to be used on either a replica or a new Zope/ZSS instance; this isn't that big of a deal ....
If I understand this right, then this is done automatically when the "FileStorage" is opened.
I think that is correct - I've never needed this *manual* process after a correct backup/restore cycle of FileStorage.

This truncating process might be automatic, however it is slow because it effectively needs to regenerate data.fs.index. It is impossible for a backup of a live FileStorage to atomically back up this file. The process of recreating the index needs to read the entire data.fs file into memory from start to end.

This process would be needed in between any attempted incremental backup or incremental replication cycle. In comparison, DirectoryStorage is close to maximally efficient at incremental backups and replications.

DirectoryStorage's efficiency with incremental operations comes at a cost; it is slower than FileStorage to make or restore a full backup. Creating all those tiny files in a filesystem is equivalent to recreating data.fs.index, but slower (provided data.fs.index fits in memory, of course; FileStorage is terminally slow once you push it into swap).

--
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson
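To show why regenerating an index means reading the whole file from start to end, here is a sketch over a made-up append-only log (an 8-byte object id, a 4-byte length, then the payload). It is not the real data.fs or data.fs.index format, and `append_record` and `rebuild_index` are hypothetical names. The index maps each object id to the offset of its most recent record, and the only way to rebuild it is to walk every record.

```python
import struct

RECORD_HEADER = struct.Struct(">QI")  # toy format: 8-byte oid, 4-byte length


def append_record(log: bytearray, oid: int, payload: bytes) -> None:
    """Append one record to the in-memory toy log."""
    log.extend(RECORD_HEADER.pack(oid, len(payload)) + payload)


def rebuild_index(log: bytes) -> dict:
    """Map each oid to the offset of its most recent complete record."""
    index = {}
    pos = 0
    while pos + RECORD_HEADER.size <= len(log):
        oid, length = RECORD_HEADER.unpack_from(log, pos)
        end = pos + RECORD_HEADER.size + length
        if end > len(log):
            break  # ignore a half-written trailing record
        index[oid] = pos  # later records for the same oid win
        pos = end
    return index


log = bytearray()
append_record(log, 1, b"aa")
append_record(log, 2, b"b")
append_record(log, 1, b"cccc")  # newer revision of oid 1
index = rebuild_index(log)
```

Because a later record can supersede any earlier one, the scan cannot stop early, so the cost grows with the size of the whole log rather than with the amount of data changed since the last backup.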
participants (3)
- Dieter Maurer
- sean.upton@uniontrib.com
- Toby Dickenson