[ZODB-Dev] Re[2]: ZEO Replicated Storage

Fri Jul 2 06:36:44 EDT 2004

Hello Jim,

Friday, July 2, 2004, 12:01:20 AM, you wrote:

JF> Note that I'm following up to zodb-dev
OK.

JF> Eugene wrote:
>> Hello Rob,
>> 
>> RP> Andreas' idea will work but it doesn't create a 'zero downtime'
>> RP> environment as the copy and the rsync take some amount of time during
>> RP> which other transactions can be applied to the ZODB.
>> Maybe I don't understand you well, but I see a problem here: a lot of
>> extra work for the admin in detecting problems in the DB.
>> I have this situation: one database has crashed several times.
>> Each time it was restored from backup, and some time later it crashed
>> again. Of course, this DB was tested in every way, using all the FS
>> utilities I found on the Internet and all the recipes from zopelabs or
>> from other people. None of the utilities found any trouble in the DB;
>> all is OK, except that in a few days the DB crashes again.

JF> If your data is getting corrupted and utilities like
JF> fstest don't catch the problem, we'd like to get a copy of your
JF> database so we can fix whatever is wrong with fstest that is causing the
JF> problem to go undetected,
It's not a problem.
I can give it to you; just tell me which files you wish to
see:
 Data.fs       - 52MB
 Data.fs.old   - 47MB
 Data.fs.index - 100KB

I tried to restore this DB with different utilities; the intermediate
stages and a short recovery log are saved on my computer. I can give you
those as well.

JF> ZRS deals with things like servers going down. It doesn't
JF> directly deal with corruption because, frankly, that hasn't
JF> been a problem for us or our customers. :)

JF> It does deal indirectly with corruption because, AFAIK, corruption
JF> is generally caused by hardware or system failures and ZRS lets you use
JF> multiple systems.
We checked our system: there are neither system faults nor disk problems.
Apache, MySQL, Cyrus, etc. were running on it at the same time, and none
of them suffered.

JF> A significant difference between ZRS and rsync is that it replicates
JF> at the transaction level, not at the file level.
JF> If your file gets
JF> corrupted, rsync, or any other backup mechanism will happily duplicate
JF> the corruption.  ZRS, on the other hand, independently applies transactions
JF> to each replicated storage.
That's very good.
I guess files get broken easily, but if transactions are checked and
copied, then we can keep an error from spreading to the other DBs and
locate the one with the error for later recovery.
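
To make sure I understand the difference, here is a minimal toy sketch in
Python. It is not ZRS code and says nothing about how ZRS is implemented:
ToyStorage, file_level_copy and the per-record CRC are all hypothetical
names and assumptions of mine, used only to contrast copying a file with
applying each transaction independently to every storage.

```python
import zlib


class ToyStorage:
    """Hypothetical stand-in for a storage file: a list of (payload, crc) records."""

    def __init__(self):
        self.records = []

    def apply(self, payload: bytes) -> None:
        # Each storage computes and stores its own checksum when it applies
        # a transaction (an assumption of this toy model).
        self.records.append((payload, zlib.crc32(payload)))

    def damaged(self):
        # Re-read every record and report the indexes whose checksum no longer matches.
        return [i for i, (payload, crc) in enumerate(self.records)
                if zlib.crc32(payload) != crc]


def file_level_copy(src: ToyStorage) -> ToyStorage:
    # File-level backup (rsync-like): copies whatever is stored, damage included.
    dst = ToyStorage()
    dst.records = list(src.records)
    return dst


primary = ToyStorage()
replica = ToyStorage()

# Transaction-level replication: every transaction is applied to each storage.
for txn in (b"txn-1", b"txn-2", b"txn-3"):
    primary.apply(txn)
    replica.apply(txn)

# Simulate damage to the primary's stored data after the commits.
_, crc = primary.records[1]
primary.records[1] = (b"txn-X", crc)

# A file-level backup taken now inherits the damage; the replica does not.
backup = file_level_copy(primary)
```

In this toy model primary.damaged() and backup.damaged() both report
record 1, while replica.damaged() reports nothing, because the replica
never read the primary's stored data.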

>> So I cannot be sure that my server works well and that there are no
>> potential problems in the DB. Constant online monitoring is not an
>> option, especially since I cannot catch the moment when an error gets
>> into my DB. How do I find out whether the error is already in the backup
>> from yesterday or from the day before?
>> 
>> RP> Zope Replication Services (ZRS) minimizes cluster downtime while
>> RP> maximizing the transactional integrity of the ZODB.  Downtime is
>> RP> limited to the time necessary to detect failure and transition to the
>> RP> secondary ZRS storage server.  Transactions are saved on the primary
>> RP> storage server *and* sent into the ZRS cloud for storage on some number
>> RP> of secondary servers.
>> I'm looking for a utility that can detect errors automatically.
>> If some write operation caused an error, I want to find it when the
>> error appears, not when my DB has become inoperable.

JF> We are pretty sure that write operations don't cause the corruption.
Maybe I didn't understand you well, but
I can't see any source of errors other than write operations.
If nothing changes, it's impossible to get an error.

JF> The only way to detect the corruption is to periodically re-read
JF> the data.  If you are having frequent corruption problems, I suspect
JF> you have a system problem.
The disk was checked and no errors were found,
and all other software has been working fine these days.
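
If I understand the "periodically re-read" idea correctly, it could be
sketched roughly like this in Python. This is only an illustration under
my own assumptions, not an existing ZODB utility: the function names are
hypothetical, and the sketch assumes the file is only appended to between
two checks (which would not hold across a pack, for example). It detects
changed bytes in the already-written part of a file, not logical errors
in the data.

```python
import hashlib
import os


def prefix_digest(path, length, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("file is shorter than the recorded length")
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def take_snapshot(path):
    """Record the current size of the file and a digest of its content."""
    length = os.path.getsize(path)
    return length, prefix_digest(path, length)


def prefix_unchanged(path, snapshot):
    """Re-read the recorded prefix; True only if it is byte-for-byte the same."""
    length, digest = snapshot
    try:
        return prefix_digest(path, length) == digest
    except ValueError:
        # The file got shorter than it was at snapshot time.
        return False
```

A cron job could call take_snapshot() once and prefix_unchanged() on later
runs; a False result would narrow down when the old data changed on disk.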

JF> It doesn't detect problems.  Rather, it provides a warm backup if problems
JF> occur. Importantly for you, if corruption occurs in one database,
JF> it's extremely unlikely that replicated databases will be corrupted,
JF> so recovery is very fast.
Something like RAID?
It would be very good if, when one DB crashes, the site keeps working
with the other ones while I'm repairing the corrupted DB. So there's a
chance to maintain a 24x7 system.
Where can I get more info about ZRS?
I saw a page about it on your site, but it didn't give me all the info.

-- 
Best regards,
 Eugene                            mailto:el-spam at yandex.ru