[ZODB-Dev] Recovering from BTree corruption
Jim Fulton
jim at zope.com
Thu Sep 27 11:47:07 EDT 2007
On Sep 12, 2007, at 10:28 AM, Jim Fulton wrote:
...
>>>> - checkbtrees.py
>>>> - fstest.py
>>>
>>> There's an fsrefs script that checks internal references I believe.
>>
>> fsrefs.py shows loads of problems in both the data.fs and the
>> resources.fs.
>> probably > 200 entries per database. i.e.
>>
>> oid 0xD87110L BTrees._OOBTree.OOBucket
>> last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
>> refers to invalid objects:
>> oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing:
>> '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: '<unknown>'
>> oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing:
>> '<unknown>'
...
>> - How do I tell if something is a reference to another database?
>
> I don't know how to do this with fsrefs. I'm not 100% sure that
> fsrefs recognizes cross-database references.
I did a little looking at fsrefs. It doesn't analyze the types of
references. It just tries to load objects. This approach, aside from
being less informative than it should be, totally fails with multiple
databases. Cross-database references will always be reported as
"missing" by fsrefs.
....
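For context on telling local and cross-database references apart: the
shape of a persistent reference in a pickle gives its type. The sketch
below assumes the reference forms described in the docstring of ZODB's
serialize module (a bare oid, an (oid, class) tuple, or a
[reference_type, args] list where 'n' and 'm' mark multi-database
references and 'w' marks a weak reference). classify_reference is a
hypothetical helper and is not part of fsrefs.

```python
def classify_reference(ref):
    """Return (kind, database_name, oid) for one persistent reference.

    Hypothetical helper. Assumes the reference forms described in the
    ZODB.serialize docstring; database_name is None when the reference
    does not name a database.
    """
    if isinstance(ref, list):
        # Extended reference: [reference_type, args]
        ref_type, args = ref
        if ref_type in ('n', 'm'):
            # 'n': (database_name, oid)
            # 'm': (database_name, oid, class metadata)
            return ('cross-database', args[0], args[1])
        if ref_type == 'w':
            # Weak reference: (oid,) or (oid, database_name)
            name = args[1] if len(args) > 1 else None
            return ('weak', name, args[0])
        raise ValueError('unknown reference type: %r' % (ref_type,))
    if isinstance(ref, tuple):
        # (oid, class metadata)
        return ('local', None, ref[0])
    # Bare oid
    return ('local', None, ref)
```

A checker that looked at the reference type first could report
cross-database references separately instead of trying to load them
from the local storage.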
> I'll try to make some time in the next few days to look at this issue.
Man it's hard to make time ...
>
> I'll look at fsrefs a bit more closely to:
>
> - make sure it understands cross-database references, and
It doesn't.
> - Make sure it reports whether missing references are local or
> remote.
Haha ;)
> I'd like to decide what to do next based on this investigation. In
> particular, I want to be sure if the problems you are having are
> actually due to cross-database reference issues.
>
> I'll also look at writing a tool that might be able to recover lost
> objects from backup databases. The idea is that a tool would scan
> a database for missing oids save the list to files, separating
> references to different databases. Then there'd be another tool
> that would read this list and a list of old database files and scan
> the files looking for oids in the list and extracting records if
> they are found.
I spent some time on an analysis tool. See:
http://svn.zope.org/zc.fsutil/branches/dev/
and especially:
http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil/references.txt?view=auto
It will help you figure out if you have holes and separate
cross-database and local references. You may have to work a little
though. The data structures produced will allow you to analyze broken
cross-database references in a way that should be fairly obvious.
(Hint: you'll have to generate data for each database and make sure
that all of the oids mentioned in the set of cross-database references
are actually present in the named databases.)
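The check described in the hint can be sketched roughly as follows.
This is a minimal sketch under stated assumptions: the two input
structures (a list of cross-database reference tuples and a dict of
oid sets keyed by database name) are hypothetical stand-ins, not the
actual data structures the tool produces, and
find_broken_cross_references is an illustrative name.

```python
def find_broken_cross_references(cross_refs, oids_by_database):
    """Return the cross-database references whose target is absent.

    cross_refs: iterable of (source_database, source_oid,
                             target_database, target_oid) tuples.
    oids_by_database: dict mapping database name -> set of oids present.
    Both structures are hypothetical stand-ins for the tool's output.
    """
    broken = []
    for src_db, src_oid, dst_db, dst_oid in cross_refs:
        present = oids_by_database.get(dst_db)
        # A reference is broken if the named database is unknown or
        # does not contain the target oid.
        if present is None or dst_oid not in present:
            broken.append((src_db, src_oid, dst_db, dst_oid))
    return broken
```

The oid sets have to be generated once per database before the check
can run, which is why data is needed for every database involved.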
A major challenge is handling large databases. We have databases
with millions of objects and I kept having to trim the amount of data
analyzed to fit the data structures in memory. It is interesting to
look at the evolution of the data structures over the last couple of
days as I tried to cope with scale.
The obvious next step is to store data in a database rather than
memory. This will slow things down, but will allow me to work with
arbitrarily large databases and keep richer data structures.
Assuming that you still care about this (you've been quiet :), I
suggest using this tool to find the holes. (You can also use it to
find the objects that refer to the missing objects.)
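To illustrate what finding holes and their referrers amounts to, here
is a minimal in-memory sketch. It assumes a hypothetical plain dict
mapping each oid present in the database to the local oids it refers
to; the real tool's data structures and names may differ.

```python
def find_holes(references):
    """Map each missing oid to the oids of the objects referring to it.

    references: dict mapping every oid present in the database to an
    iterable of the local oids it refers to (hypothetical structure).
    """
    holes = {}
    for source, targets in references.items():
        for target in targets:
            # A target with no record of its own is a hole.
            if target not in references:
                holes.setdefault(target, set()).add(source)
    return {oid: sorted(sources) for oid, sources in holes.items()}
```

The keys of the result are the oids to look for in backups, and the
values identify the objects that would need attention if no backup
has the data.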
Then, once you've found the missing oids, you should go to backups,
open file storages on the backups and, if the oids are present, copy
the pickles to the database under repair. Something like:
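    import transaction

    # backup_storage and database_with_hole are open file storages;
    # oids is the list of missing oids that are present in the backup.
    pickles = [backup_storage.load(oid, '')[0] for oid in oids]
    t = transaction.begin()
    s = database_with_hole
    s.tpc_begin(t)
    for oid, p in zip(oids, pickles):
        # '\0'*8 is the serial used when there is no current record
        s.store(oid, '\0'*8, p, '', t)
    s.tpc_vote(t)
    s.tpc_finish(t)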
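When there are several backups and not every oid is in each of them,
the lookup step can be sketched as below. This is only a sketch: the
loader interface (a callable that takes an oid and returns its pickle,
raising KeyError when the oid is absent) is an assumption made for
illustration, and collect_pickles is a hypothetical name.

```python
def collect_pickles(oids, backup_loaders):
    """Find a pickle for each oid in the first backup that has it.

    backup_loaders: callables taking an oid and returning its pickle,
    or raising KeyError when the oid is absent (hypothetical
    interface; ZODB's POSKeyError is a KeyError subclass).
    Returns (found, still_missing): a dict oid -> pickle and a list
    of the oids that no backup could supply.
    """
    found = {}
    still_missing = []
    for oid in oids:
        for load in backup_loaders:
            try:
                found[oid] = load(oid)
            except KeyError:
                continue  # not in this backup, try the next one
            break
        else:
            # No backup had this oid.
            still_missing.append(oid)
    return found, still_missing
```

With the storage from the snippet above, a loader could be written as
lambda oid: backup_storage.load(oid, '')[0]. The oids left in
still_missing are the ones that cannot be restored from backups.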
If you don't have the data in backups, then you might be able to use
information about the objects referring to the missing objects to
repair the referring objects by hand by deleting the references to
missing objects.
Hope this helps.
Jim
--
Jim Fulton
Zope Corporation