ZODB/ZSS High Availability, was: RE: [Zope] Zope Myths?
I have been doing a lot of thinking about odb/storage/zss replication lately, but I haven't had a chance to implement these practices yet, so your mileage, insights, and opinions may vary from these thoughts...

If the thing that makes replication hard is constant change of lots of interdependent data, a meaningful snapshot system as close to the database software as possible (i.e. DirectoryStorage's snapshots, not LVM's) likely mitigates that risk by providing reasonable assurance of atomicity. If the replication process itself has problems part way through transfer (a low-tech solution like find+cpio over NFS would), it is up to the sysadmin to write scripts to:

1 - Keep multiple areas for replication -> stage the entire replication in a temp dir before putting it in the place where it is used by the ZSS software -> since there is no way to do a transactional file copy of multiple files, how about using symlinks, and moving the symlink on completion of a full, atomic transfer and a completed storage consistency check?

2 - Have clustering software resource takeover scripts (i.e. heartbeat resource scripts) evaluate: a. whether the storage they are about to use is good, and b. if the last transfer failed, use the last _good_ full replicated set of files. c. The above two checks must be done before starting the ZSS process on the backup server node.

Mostly, I can't see how shared storage (DAS/SAN) can provide the same risk-avoidance levels that could be achieved with the above practices, unless you have some way of mirroring the last good copy of your odb storage within the same shared storage (replication between two places on the same storage; I assume snapshots and scripts on the secondary node to check consistency of the storage/db, like 2(a) above, could come in handy for this too)?
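A minimal sketch of the symlink idea in (1) above, assuming the ZSS process is configured to read its storage through a `current` symlink; the function name, layout, and "non-empty" check standing in for a real consistency check are all hypothetical:

```shell
#!/bin/sh
# Hedged sketch, not a tested production script. publish_replica SRC BASE
# copies a snapshot into a fresh staging directory under BASE, checks it,
# and only then repoints BASE/current at it, so readers see either the
# old set or the new set but never a partial transfer.
publish_replica() {
    src=$1; base=$2
    stage="$base/stage-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$stage" || return 1
    cp -a "$src/." "$stage/" || return 1
    # Placeholder for a real storage consistency check (e.g. whatever
    # checking tool your storage ships); here we only refuse to publish
    # an empty copy.
    [ -n "$(ls -A "$stage")" ] || return 1
    # Build the new symlink aside, then rename it over the old one.
    # mv -T (GNU coreutils) renames the link itself rather than
    # descending into the directory the old link points to.
    ln -s "$stage" "$base/current.new" || return 1
    mv -T "$base/current.new" "$base/current"
}
```

A failed run leaves a dead stage-* directory behind but never touches `current`, which is the point of the exercise.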
Sean -----Original Message----- From: tomas@fabula.de [mailto:tomas@fabula.de] Sent: Thursday, September 12, 2002 1:55 PM To: Bill Anderson Cc: sean.upton@uniontrib.com; pw_lists@slinkp.com; zope@zope.org Subject: Re: [Zope] Zope Myths? On Thu, Sep 12, 2002 at 11:12:27AM -0600, Bill Anderson wrote:
On Thu, 2002-09-12 at 00:21, tomas@fabula.de wrote:
On Wed, Sep 11, 2002 at 03:46:43PM -0700, sean.upton@uniontrib.com wrote:
[Big hardware vs. replication]
Well, in that case, your network is a single point of failure, too. :^)
Assuming just one network, assuming just one connectivity provider. Problem is that replication solutions depend heavily on the type of application (slowly changing sets of files being the easiest, and rapidly changing data sets with complex interdependencies (e.g. high-volume databases) the hardest).
Expensive, well that depends on what you are doing. For under 35000 you can have just shy of 1TB of file space, with snapshot capability, multi-machine failover, and a whole lot more. That cost includes two machines running Linux with failover. It depends on your needs, and your uptime/availability requirements.
...or you can have ten cheapo off the shelf servers hosted at wildly different places... (OK, it's more like 0.5TB then ;)
For example, if you are running a site like cbsnewyork.com, 25-35 grand is not that much. If you are running a small site, then you don't need it. My point was that it (it being ZEO/ZODB/ZOPE) _can_ scale to that. I've done it.
[...]
Well, speaking as a former tester of SAN technology, it would appear things have changed dramatically since your experiences. :)
Yes, but this was mainly my point: if you have access to knowledge and experience with those things -- then you may go for it. If you don't... it's just a point against it.
The configuration/setup is essentially the same as with SCSI; in fact, Fibre Channel uses the SCSI subsystem in the OS. The underlying system is as robust as the SCSI system, since it is SCSI, just over a different medium.
[RAID system doing funny things]
I've never seen this with the Fiber Channel Arrays I dealt with. But then again, they had two or more controllers. :^)
Of course not -- but I take that you *know* what you are doing. This vendor didn't (at some point I realized that), but heck, it wasn't my job, I had enough on the plate myself. [...]
Same thing with fibre channel SAN tech, the range is measured in miles. I know of several SANs that are spread over multiple states. You can literally have a failover datacenter.
Yes. It's a tradeoff. I just wanted to point out that experience with those things is one of the points to consider (besides application type, requirements and cost). It'd be interesting to know (I'm not a Zope guy) how well Zope as an application would play in each camp. Thanks -- tomas
On Thu, Sep 12, 2002 at 02:21:28PM -0700, sean.upton@uniontrib.com wrote:
I have been doing a lot of thinking about odb/storage/zss replication lately, but I haven't had a chance to implement these practices yet, so your mileage, insights, and opinions may vary from these thoughts...
If the thing that makes replication hard is constant change of lots of interdependent data, a meaningful snapshot system as close to the database software as possible (i.e. DirectoryStorage's snapshots, not LVM's) likely mitigates that risk by providing reasonable assurance of atomicity.
Yep. The replication system has to know what a transaction is. You might be able to live with the loss of a (couple of) transactions, but not with the loss of half a transaction.
If the replication process itself has problems part way through transfer (a low-tech solution like find+cpio over NFS would),
Rsync. I keep saying rsync is your friend :-)
it is up to the sysadmin to write scripts to: 1 - Keep multiple areas for replication -> stage the entire replication in a temp dir before putting it in the place where it is used by the ZSS software -> since there is no way to do a transactional file copy of multiple files, how about using symlinks, and moving the symlink on completion of a full, atomic transfer and a completed storage consistency check?
Hmmm. The whole problem seems to be to get a copy of your set with no (or with bearable) data `skew'. But then you must know the innards of your database (or maybe have a sort of `freeze point' in time akin to a `meta transaction' checkpoint).
2 - Have clustering software resource takeover scripts (i.e. heartbeat resource scripts) evaluate: a. whether the storage they are about to use is good, and b. if the last transfer failed, use the last _good_ full replicated set of files. c. The above two checks must be done before starting the ZSS process on the backup server node.
Sounds quite difficult without having access to the innards of the DB (I am using the word DB loosely here, more as `data set with some consistency restrictions', that may be a bunch of files or whatever).
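One way around needing the DB's innards at takeover time is to make the replicator itself record success: checks 2(a)/2(b) then reduce to looking for a marker file. A sketch in the style of a heartbeat resource script's "start" branch, where the stage-* layout and the `.complete` marker (touched only after a successful transfer plus consistency check) are assumptions of this sketch:

```shell
#!/bin/sh
# Hedged sketch of the takeover checks. pick_good_set BASE prints the
# last known-good replicated set under BASE, or fails if none exists,
# in which case the resource script should refuse to start ZSS at all.
pick_good_set() {
    base=$1
    # 2(a): prefer the set "current" points at, if it finished cleanly...
    cur=$(readlink "$base/current" 2>/dev/null)
    if [ -n "$cur" ] && [ -f "$cur/.complete" ]; then
        echo "$cur"; return 0
    fi
    # 2(b): ...otherwise fall back to the newest stage that did finish.
    for d in $(ls -dt "$base"/stage-* 2>/dev/null); do
        if [ -f "$d/.complete" ]; then
            echo "$d"; return 0
        fi
    done
    return 1   # nothing usable; do NOT start the ZSS process
}
```

The resource script would point the ZSS configuration at whatever path this prints, and only then exec the server, which covers 2(c).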
Mostly, I can't see how shared storage (DAS/SAN) can provide the same risk-avoidance levels that could be achieved with the above practices, unless you have some way of mirroring the last good copy of your odb storage within the same shared storage (replication between two places on the same storage; I assume snapshots and scripts on the secondary node to check consistency of the storage/db, like 2(a) above, could come in handy for this too)?
It boils down to: know thy application -- doesn't it? Back to Zope -- does anyone know how the prospects for the ZODB are? Thanks -- tomas
On a related note... thinking of using coda (http://www.coda.cs.cmu.edu/) to address the single point of failure for our zeo storage. But I've never used coda. Anybody done it with zope? Issues? --PW -- Paul Winkler "Welcome to Muppet Labs, where the future is made - today!"
participants (3)
- Paul Winkler
- sean.upton@uniontrib.com
- tomas@fabula.de