[Zope] replication (was Zope: 5.4, jboss 0.3 million hits with google)

Toby Dickenson tdickenson@geminidataloggers.com
Thu, 13 Feb 2003 11:18:42 +0000


On Thursday 13 February 2003 1:53 am, Paul Winkler wrote:

> > how can we replicate with DirStorage?  would you mind doing a little
> > brain dump?  with rsync?

rsync has to stat every inode in the directory. That sucks.

Version 1.0 has a whatsnew.py script that uses the normal undo log information 
to work out what files have changed since a historic transaction id. This is 
used by the incremental backup tool in version 1.0 and the replication script 
in 1.1, and makes them maximally efficient in I/O terms.

(just remember to keep enough history when packing to cover your 
backup/replication interval)

> I am not really the person to ask, as I haven't actually done it.

But you know you want to ;-)

> But here's the official way to do it as of version 1.1:
> http://dirstorage.sourceforge.net/replica.html

That document was updated in the last week and now has a more detailed 
howto. Essentially, on the replica machine run:
"replica.py masterzeohost:/var/master /var/replica"
and it should "just work".

> I would naively assume that you *should* be able to replicate by
> 1) putting the "master" into snapshot mode
> 2) running rsync
> 3) taking the "master" out of snapshot mode
>
> ... but there may be hidden issues with that; I would kind of
> assume so, since Toby Dickenson bothered to write the replication
> tool.  Toby, are you reading this? Care to comment?

That will kinda work, apart from the performance issues mentioned above. Take 
care over locking on the replica; you don't want replication to restart when 
the master comes back up after an outage while the storage is still running 
on the slave.

The big problem with this is that rsync is not atomic. If the master explodes 
halfway through an rsync then the replica may contain half of the most 
recent transaction.
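
To make the difference concrete, here is a toy model in Python of a replica 
that only ever applies whole transactions. It is a sketch under assumptions, 
not how replica.py is implemented: the replica is a plain dict, each incoming 
transaction is a hypothetical record with a "complete" flag, and how a real 
tool makes the final step atomic on disk is outside the sketch.

```python
def apply_complete_transactions(replica, transactions):
    """Apply transactions in order, stopping at the first incomplete one.

    replica      -- dict mapping path to content (stand-in for the replica)
    transactions -- list of dicts with keys "tid", "files" and "complete"
    Returns the tid of the last transaction applied, or None.
    """
    last_applied = None
    for txn in transactions:
        if not txn["complete"]:
            break  # never apply part of a transaction
        replica.update(txn["files"])
        last_applied = txn["tid"]
    return last_applied


replica = {"obj1": "v1"}
incoming = [
    {"tid": 2, "files": {"obj1": "v2", "obj2": "v1"}, "complete": True},
    # The master died while this one was being transferred.
    {"tid": 3, "files": {"obj1": "v3"}, "complete": False},
]
print(apply_complete_transactions(replica, incoming))
print(replica)
```

A file-by-file copy has no equivalent of the "complete" check, which is how 
half a transaction can end up on the replica.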

1.1 might still be in alpha, but I am sure it is more stable than anything 
based on rsync. As always, I am already using it in production, replicating 
once per minute and performing a full check on the replica storage once per 
hour. It is looking good so far.
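
For anyone wanting to set up a similar schedule, a minimal Python sketch of 
the timing logic follows. The command line is the one quoted earlier in this 
message; everything else is an assumption for illustration: the function 
names are made up, replica.py is assumed to be on the PATH, and "full-check" 
is only a label, since how the check is run is not covered here. Nothing is 
actually executed.

```python
def replica_command(master, replica_dir):
    """Build the replication command line as an argument list."""
    # Assumes replica.py is on the PATH; adjust for your installation.
    return ["replica.py", master, replica_dir]


def actions_for_minute(minute, check_every=60):
    """Return the actions due at a given minute count (1, 2, 3, ...)."""
    actions = ["replicate"]  # replication runs every minute
    if minute > 0 and minute % check_every == 0:
        actions.append("full-check")  # hourly check of the replica storage
    return actions


print(replica_command("masterzeohost:/var/master", "/var/replica"))
for minute in (1, 59, 60):
    print(minute, actions_for_minute(minute))
```

In practice cron or any other scheduler does the same job; the point is only 
that the check is much less frequent than the replication.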

-- 
Toby Dickenson
http://www.geminidataloggers.com/people/tdickenson