Copying a FileStorage while transactions are being appended to the end of it means the copy can complete before a transaction commit is fully flushed to the file. There is therefore always the possibility that your backup will need half-written transactions manually truncated before it can be used on either a replica or a new Zope/ZSS instance. This isn't that big of a deal, but it is possible to avoid this manual work with dirstorage.

The only things that make DirectoryStorage better in this regard are that the backup tools integrate with the storage instead of acting uninformed below it, trigger snapshot mode, and get the list of files to back up from the storage software itself (this is quicker and a better guarantee than, say, using unix 'find' and mtime on a dirstorage directory to do the same thing).

Compared to FileStorage, you do not have the problem of backing up files that are being written to, because:

(a) Snapshot mode prevents the files in HOME/A from being written to; any writes to those files are buffered in HOME/journal and HOME/B and flushed later, once snapshot mode is exited (post-backup).

(b) Additional transactions and objects are not added to the directory being backed up.

DirectoryStorage is also preferable in these backup scenarios:

1. Disaster-preparedness. You want to back up a big storage over a WAN connection, and this means incremental backup. IIRC something like rsync may not work very well on a changing FileStorage Data.fs:

http://mail.zope.org/pipermail/zodb-dev/2002-November/003807.html

We run servers at a co-location facility and need remote backup to our own facility over a 1.5Mb/s connection. A reasonable way to do this is to use the backup.py tool to create full and incremental files locally that are pulled down to remote locations via FTP on a cron job, or even better, just run the replica.py tool from our secondary location to incrementally pull down the changes (equiv.
to backup.py incremental backup, but for replica purposes) over an SSH connection to our other location and to tape for standard offsite backup rotations. With FileStorage, we would have to use rsync because of bandwidth constraints, and our ability to respond quickly would be impeded by the fact that we may have to manually repair the remote copy of the filestorage by truncating half-committed transactions.

2. ZSS high-availability clustering and replication. We have an HA cluster currently using Linux-HA heartbeat, and our crude way of copying the Data.fs between our primary and secondary node is via FTP, for daily snapshots in the middle of the night. This works okay (not as well as rsync would) because this application only updates most content once daily. However, if you have a heavier-write situation, FileStorage will not be amenable to a hot-backup clustering arrangement, because the cluster software will not be able to start the ZEO storage server on the backup/secondary node if the filestorage copy is corrupted, even slightly (someone correct me if I am wrong here).

The DirectoryStorage replica.py tool addresses this by providing a secure, network-enabled incremental replication mechanism that ignores incoming writes (via snapshot) to guarantee consistency and isolation (in a transactional sense) for the backup operation. The backup is consistent with the state of the storage at the point in time snapshot mode was entered (when the backup started), and incoming transactions do not affect the operation of a backup because they are isolated in HOME/journal and HOME/B while stuff is copied out of HOME/A.

Given this, I feel much more comfortable that I can keep a 'hot' replica on a 'hot' backup node that is ready to take over as ZSS in the case of a failure on the primary or (mainly) the need for maintenance on the primary, and that my backup/replica reflects a recent, consistent record of current heavy activity.
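
To make the snapshot idea concrete, here is a toy Python model of it. This is only a sketch under my own assumptions, not DirectoryStorage's actual code: the class and method names are made up for illustration, and plain dicts stand in for HOME/A (main) and for HOME/journal plus HOME/B (journal).

```python
class ToySnapshotStore:
    """Toy model of snapshot mode; NOT the real DirectoryStorage API.

    'main' stands in for HOME/A, 'journal' for HOME/journal and HOME/B.
    All names here are hypothetical.
    """

    def __init__(self):
        self.main = {}
        self.journal = {}
        self.in_snapshot = False

    def write(self, name, data):
        # While a snapshot is active, writes never touch 'main'.
        if self.in_snapshot:
            self.journal[name] = data
        else:
            self.main[name] = data

    def read(self, name):
        # Readers still see the newest data, buffered or not.
        if name in self.journal:
            return self.journal[name]
        return self.main[name]

    def enter_snapshot(self):
        # The store itself reports what to back up (no 'find' needed).
        self.in_snapshot = True
        return sorted(self.main)

    def leave_snapshot(self):
        # Flush the buffered writes once the backup is done.
        self.main.update(self.journal)
        self.journal.clear()
        self.in_snapshot = False


def backup_with_concurrent_writes(store, concurrent_writes):
    """Copy 'main' under a snapshot while other writes arrive."""
    names = store.enter_snapshot()
    try:
        # Writes that arrive mid-backup land in the journal only.
        for name, data in concurrent_writes:
            store.write(name, data)
        copy = {name: store.main[name] for name in names}
    finally:
        store.leave_snapshot()
    return copy
```

In this model, a backup taken while writes arrive contains exactly what was in main when the snapshot began, and the buffered writes only show up in main after leave_snapshot() runs. That is the isolation property described above, in miniature.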

Sean

-----Original Message-----
From: Chris Withers [mailto:chrisw@nipltd.com]
Sent: Friday, March 28, 2003 1:05 PM
To: sean.upton@uniontrib.com
Cc: jccooper@jcameroncooper.com; zope@zope.org
Subject: Re: [Zope] Zope backup

sean.upton@uniontrib.com wrote:
Though your copy may end up needing repair after the fact; backup in this sense is not transactional. DirectoryStorage has the best answer for this at the moment (better than FileStorage),
What led you to this belief?

cheers,

Chris