ZODB/ZSS High Availability, was: RE: [Zope] Zope Myths?

13 Sep 2002 21:10:07 -0700

On Fri, 2002-09-13 at 08:30, sean.upton@uniontrib.com wrote:
> I can't remember who all is doing it, but I think network-replicated block
> devices are pretty popular for this kind of thing; I distinctly remember
> someone on a mailing list mentioning they were doing this with Zope and/or
> ZEO (might be worth a search of archives, perhaps on ZODB-dev?).  I think
> that in some circles, DRBD is preferred to NBD, but I don't claim to be an
> expert on that (check out links on the Linux-HA.org site, perhaps).
> 
> The downside to doing this is that you can't keep two distinct copies of the
> storage (the last good known copy, and the last replicated copy), since you
> are replicating at a much lower level.  In this sense, you will need to rely
> on consistency checks alone to get a partially messed-up storage working on
> the failure of the primary node, and startup of the secondary node resources
> and IP takeover.
> 

Well, you can if you periodically rsync to another spindle locally :-)
Seriously, I think you're comparing apples and oranges in fault-tolerance
design. An NBD tool (or for that matter a NAS mount, if it weren't for
the locking on Data.fs) is a proper solution if you want to spread load across
many boxes which are all simultaneously capable of doing the same
things. Classically this is the model used for a web farm. As you point
out, it's really only a good idea if the data is relatively static
because consistency checks, ACID-compliance, and access locking issues
get ugly when the data is being modified all the time. Where that's the
case, you're much more likely to see an active/passive pair that may use
the same storage for speedy recovery, but use log shipping and
checkpoint snapshots to facilitate easy rollbacks. That's why
enterprise solutions don't put the database, application server, and web
server in the same box :-)
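
For what it's worth, the "periodically copy to another spindle" idea can be
sketched in a few lines of Python. This is only an illustration, not a tested
backup procedure: it uses a plain file copy instead of rsync, the function
name snapshot_copy and its keep/stamp parameters are made up for the example,
and it does nothing to guarantee that the copied Data.fs is transactionally
consistent if the storage is being written to during the copy.

```python
import shutil
import time
from pathlib import Path


def snapshot_copy(src, dest_dir, keep=3, stamp=None):
    """Copy src into dest_dir under a timestamped name and prune old copies.

    Hypothetical helper: keeps only the newest `keep` snapshots, so a
    "last known good" copy survives alongside the most recent one.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamps in this format sort chronologically as plain strings.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{src.name}.{stamp}"

    # Copy to a temporary name first so a half-written file is never
    # mistaken for a finished snapshot.
    tmp = target.with_name(target.name + ".part")
    shutil.copy2(src, tmp)
    tmp.replace(target)

    snapshots = sorted(
        p for p in dest_dir.glob(f"{src.name}.*")
        if not p.name.endswith(".part")
    )
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

Run from cron (or any scheduler) with dest_dir on a different disk, this keeps
a few generations around, which is the "two distinct copies" that block-level
replication alone can't give you.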

On a side note, I can't help but wonder what will happen to performance
when one write of a RAID-1 goes to a local UW/SCSI-3 disk and the other
one goes traipsing across three thousand miles of network...
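
A back-of-the-envelope version of that worry, as a Python sketch. The numbers
are assumptions, not measurements: signals in fiber are taken to travel at
roughly 200,000 km/s, the mirror is assumed to be synchronous (the write only
completes once both legs acknowledge), and the 5 ms disk write times in the
usage note are placeholders. Real networks add routing and queuing delay on
top of this floor.

```python
MILES_TO_KM = 1.609344
FIBER_KM_PER_S = 200_000.0  # assumed: roughly two thirds of c


def round_trip_ms(distance_miles, km_per_s=FIBER_KM_PER_S):
    """Best-case network round trip in milliseconds over the given distance."""
    km = distance_miles * MILES_TO_KM
    return 2 * km / km_per_s * 1000.0


def mirrored_write_ms(local_ms, remote_disk_ms, distance_miles):
    """Latency of one synchronous mirrored write.

    The write is done only when the slower leg finishes; the remote leg
    pays the network round trip plus its own disk write.
    """
    remote_ms = round_trip_ms(distance_miles) + remote_disk_ms
    return max(local_ms, remote_ms)
```

With these assumptions, round_trip_ms(3000) comes out a little over 48 ms, so
mirrored_write_ms(5, 5, 3000) is around 53 ms: every write waits roughly ten
times longer than it would on the local disk alone.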

Jack
<snip> 
-- 
Jack Coates
Monkeynoodle: A Scientific Venture...