[ZODB-Dev] Re[2]: ZEO Replicated Storage
Eugene
el-spam at yandex.ru
Fri Jul 2 06:36:44 EDT 2004
Hello Jim,
Friday, July 2, 2004, 12:01:20 AM, you wrote:
JF> Note that I'm following up to zodb-dev
OK.
JF> Eugene wrote:
>> Hello Rob,
>>
>> RP> Andreas' idea will work but it doesn't create a 'zero downtime'
>> RP> environment as the copy and the rsync take some amount of time during
>> RP> which other transactions can be applied to the ZODB.
>> Maybe I don't understand you well, but I see a problem here: a lot of
>> extra work for the admin in detecting problems in the DB.
>> I have this situation: one database has crashed several times.
>> Each time it was restored from backup, and some time later it crashed
>> again. Of course, this DB was tested in every way, using all the FS
>> utilities I've found on the Internet and all the recipes from zopelabs
>> or from other people. None of the utilities found any trouble in the DB;
>> all is OK, except that in a few days the DB crashes again.
JF> If your data is getting corrupted and utilities like
JF> fstest don't catch the problem, we'd like to get a copy of your
JF> database so we can fix whatever is wrong with fstest that is causing the
JF> problem to go undetected,
That's not a problem.
I can give it to you; just tell me which files you want to
see:
Data.fs - 52MB
Data.fs.old - 47MB
Data.fs.index - 100KB
I tried to restore this DB with different utilities; the intermediate stages
and a short recovery log are saved on my computer. I can give you those as well.
JF> ZRS deals with things like servers going down. It doesn't
JF> directly deal with corruption because, frankly, that hasn't
JF> been a problem for us or our customers. :)
JF> It does deal indirectly with corruption because, AFAIK, corruption
JF> is generally caused by hardware or system failures and ZRS lets you use
JF> multiple systems.
We checked our system: there are neither system faults nor disk problems.
Apache, MySQL, Cyrus, etc. were running on it at the same time, and none of
them suffered.
JF> A significant difference between ZRS and rsync is that it replicates
JF> at the transaction level, not at the file level.
JF> If your file gets
JF> corrupted, rsync, or any other backup mechanism will happily duplicate
JF> the corruption. ZRS, on the other hand, independently applies transactions
JF> to each replicated storage.
That's very good.
I guess files get broken easily, but if transactions are checked and
copied, then we can prevent the error from spreading to the other DBs and
locate the one with the error for further recovery.
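
To make the difference between file-level and transaction-level copying concrete, here is a toy sketch. It is not ZRS or ZODB code: `ToyStorage`, `file_level_copy` and the length-prefixed record layout are all hypothetical and only stand in for a real storage file. It assumes the damage happens to the primary's bytes after the transactions were written.

```python
# Toy model only: not ZRS or ZODB code. All names here are hypothetical.

class ToyStorage:
    """Append-only store holding length-prefixed records in one byte buffer."""

    def __init__(self):
        self.data = bytearray()

    def apply(self, txn: bytes) -> None:
        # Each storage writes the transaction into its own buffer.
        self.data += len(txn).to_bytes(4, "big") + txn

    def records(self):
        # Re-read the buffer and return the stored payloads in order.
        out, pos = [], 0
        while pos < len(self.data):
            size = int.from_bytes(self.data[pos:pos + 4], "big")
            out.append(bytes(self.data[pos + 4:pos + 4 + size]))
            pos += 4 + size
        return out


def file_level_copy(src: ToyStorage) -> ToyStorage:
    """rsync-style backup: duplicates the bytes exactly as they are now."""
    dst = ToyStorage()
    dst.data = bytearray(src.data)
    return dst


primary = ToyStorage()
replica = ToyStorage()
for txn in [b"create page", b"edit page", b"add user"]:
    primary.apply(txn)
    replica.apply(txn)  # transaction level: the replica applies the txn itself

# Simulate damage to the primary's file, e.g. from a hardware fault.
primary.data[6] ^= 0xFF

backup = file_level_copy(primary)  # file-level copy made after the damage

print(backup.records()[0] == b"create page")   # False: the copy is damaged too
print(replica.records()[0] == b"create page")  # True: the replica is intact
```

The file-level copy inherits the damaged byte, while the replica never saw it because it only received the transactions themselves.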
>> So I cannot be sure that my server works well and that there are no
>> potential problems in the DB. Constant online monitoring is not an option,
>> especially since I cannot catch the moment when an error gets into my DB.
>> How do I find out whether the backup from yesterday, or from the day
>> before, already contains the error?
>>
>> RP> Zope Replication Services (ZRS) minimizes cluster downtime while
>> RP> maximizing the transactional integrity of the ZODB. Downtime is
>> RP> limited to the time necessary to detect failure and transition to the
>> RP> secondary ZRS storage server. Transactions are saved on the primary
>> RP> storage server *and* sent into the ZRS cloud for storage on some number
>> RP> of secondary servers.
>> I'm looking for a utility that can detect errors automatically.
>> If some write operation caused an error, I want to find it when the error
>> appears, not when my DB has become inoperable.
JF> We are pretty sure that write operations don't cause the corruption.
Maybe I didn't understand you well, but
I can't see any source of errors other than write operations.
If nothing changes, it's impossible to get an error.
JF> The only way to detect the corruption is to periodically re-read
JF> the data. If you are having frequent corruption problems, I suspect
JF> you have a system problem.
The disk was checked and no errors were found,
and all other software has been working fine these days.
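
As an illustration of what "periodically re-read the data" could look like, here is a generic sketch. It is not how fstest or FileStorage work; `make_record` and `scan` are hypothetical helpers, and it assumes a checksum is stored next to each record at write time so that a later pass can notice damage that appeared after the write.

```python
import zlib


def make_record(payload: bytes) -> bytes:
    """Store a CRC32 in front of the payload at write time (assumed layout)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def scan(records):
    """Re-read every record; return indexes whose CRC no longer matches."""
    bad = []
    for i, rec in enumerate(records):
        stored = int.from_bytes(rec[:4], "big")
        if zlib.crc32(rec[4:]) != stored:
            bad.append(i)
    return bad


# Three records written correctly.
records = [bytearray(make_record(p)) for p in (b"txn 1", b"txn 2", b"txn 3")]
clean_scan = scan(records)

# Later, one bit in the second record's payload flips on disk,
# with no write operation involved.
records[1][5] ^= 0x01
damaged_scan = scan(records)

print(clean_scan)    # [] 
print(damaged_scan)  # [1]
```

Run from a scheduled job, a scan like this would report damage at the next pass rather than at the moment the site stops working.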
JF> It doesn't detect problems. Rather, it provides a warm backup if problems
JF> occur. Importantly for you, if corruption occurs in one database,
JF> it's extremely unlikely that replicated databases will be corrupted,
JF> so recovery is very fast.
Something like RAID?
It would be very good if, when one DB crashes, the site keeps working with
the other ones while I'm repairing the corrupted DB. So there's a chance to
maintain a 24x7 system.
Where can I get more info about ZRS?
I saw a page about it on your site, but it didn't give me all the info.
--
Best regards,
Eugene mailto:el-spam at yandex.ru