[ZCM] [ZC] 1307/ 7 Resolve "fsrecover.py is broken on cvs head (or Data.fs is really corrupted)"

Tue May 4 14:07:45 EDT 2004

Issue #1307 Update (Resolve) "fsrecover.py is broken on cvs head (or Data.fs is really corrupted)"
 Status Resolved, Zope/bug medium
To followup, visit:
  http://collector.zope.org/Zope/1307

==============================================================
= Resolve - Entry #7 by tim_one on May 4, 2004 2:07 pm

 Status: Accepted => Resolved

Thx for the followup!  Marking Resolved.

The tests we have take healthy .fs files, spray zero-valued (NUL) bytes into them at random locations, and then check that fsrecover both reports errors and removes the damaged portions.  That much works.  They don't seem to check that the recovered database is sane in all (or indeed any) respects.  So I think it needs better tests -- but they're better today than they were before this report.
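
A minimal sketch of the damage step described above, for illustration only. It is not the code in testRecover.py; the function name `spray_zero_bytes` and its parameters are hypothetical, and it assumes the file is at least `run_length` bytes long.

```python
import os
import random


def spray_zero_bytes(path, num_spots=3, run_length=8, seed=None):
    """Overwrite num_spots random runs of the file with NUL bytes.

    Hypothetical helper: returns the sorted offsets that were damaged.
    The file size is left unchanged; only existing bytes are overwritten.
    """
    rng = random.Random(seed)
    size = os.path.getsize(path)
    if size < run_length:
        raise ValueError("file is too small to damage")
    offsets = []
    with open(path, "r+b") as f:
        for _ in range(num_spots):
            # Pick a start so the whole run fits inside the file.
            pos = rng.randrange(0, size - run_length + 1)
            f.seek(pos)
            f.write(b"\0" * run_length)
            offsets.append(pos)
    return sorted(offsets)
```

A stronger test would run recovery on the damaged copy and then also open the recovered file and check its contents, which is the gap noted above.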
________________________________________
= Comment - Entry #6 by slinkp on May 4, 2004 10:56 am

Seems to work. Thanks Tim!
fsrecover.py on a new Data.fs now produces an identical copy, and reports no errors.

I assume it still actually does some recovery on corrupted databases ;-)
________________________________________
= Comment - Entry #5 by tim_one on May 3, 2004 2:48 pm

fsrecover.py was still referencing .serial, but as part of the MVCC changes on the HEAD that attribute no longer exists.  So fsrecover.py couldn't possibly work.  Changed it to reference .tid instead.

Added a new test to testRecover.py, verifying that no error msgs are produced when recovery is fed a healthy .fs.

slinkp, let me know whether this relieves the problems you were seeing.  If so, I'll close this.  I'm not going to close it now, because I never looked at fsrecover.py before, and I'm not confident that I've fixed everything that may have broken.
________________________________________
= Comment - Entry #4 by tim_one on May 3, 2004 2:06 pm

FYI, testRecover.py tries to ensure that fsrecover isn't completely hosed -- but, AFAICT, it's almost impossible for a test in testRecover.py to fail(!).  I'm adding a new test there that doesn't deliberately damage the input .fs, and that new test is generating scads of error msgs too.  However, testRecover.py suppresses all the error output, so the test thinks it passed.  This will take some doing to untangle.
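
The failure mode described above can be sketched roughly as follows. This is an illustrative stand-in, not the testRecover.py code: `fake_recover`, `run_and_capture`, and `error_lines` are hypothetical names, and the sketch assumes the recovery routine reports problems by printing lines containing the word "error" to stdout.

```python
import contextlib
import io


def run_and_capture(func, *args):
    """Call func(*args) and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


def error_lines(output):
    """Return the captured lines that look like error reports."""
    return [line for line in output.splitlines() if "error" in line.lower()]


def fake_recover(healthy):
    """Hypothetical stand-in for a recovery run; not the real fsrecover."""
    print("0 . 1 . 2 .")
    if not healthy:
        print("error reading txn header: invalid transaction length, 0, at 1369028")
```

A test that captures the output and then throws it away passes for both `fake_recover(True)` and `fake_recover(False)`. Asserting that `error_lines(run_and_capture(fake_recover, True))` is empty is what lets a healthy-input test actually fail when recovery misbehaves.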
________________________________________
= Assign - Entry #3 by tim_one on May 3, 2004 11:23 am

 Status: Pending => Accepted

 Supporters added: tim_one

Thanks for the report!  Assigned to me.
________________________________________
= Comment - Entry #2 by slinkp on May 3, 2004 11:11 am

whoops, meant to select Database as the topic.
________________________________________
= Request - Entry #1 by slinkp on May 3, 2004 11:10 am

fsrecover.py is broken on current zope HEAD...
... either that, or brand-new filestorages created by zope really are corrupted.

To reproduce:                                                                   
- create a fresh Zope instance                                                  
- start zope                                                                    
- stop zope                                                                     
- run fsrecover.py on the brand new Data.fs                                     

Result is about 900 lines like this:   

0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 error reading txn header: invalid transaction length, 0, at 1369028                                                          

==============================================================