[Zope] Consistent Data.fs corruption in 2.6.1
Robin Breathe
rbreathe@brookes.ac.uk
Fri, 09 May 2003 18:01:42 +0100
I believe we have found a bug in the 2.6.1 release (possibly more
specifically in ZODB 3.1.1).
-----------
# Summary:
-----------
When an undo is the last transaction preceeding a "pack(days:float=0)",
the FileStorage is corrupted.
------------
# To repeat:
------------
Start with a 'consistent' FileStorage, e.g. Data.fs that passes
fstest.py without error:
1368329: transaction tid 0x034cd1c220785077 #32
1368792: transaction tid 0x034cd1dbeb647922 #33
no errors detected
Change an object (e.g. its contents, rename or delete it. whatever).
Immediately undo that change ("oops").
Pack the Database (days:float=0).
Your FileStorage should be corrupt:
1368329: transaction tid 0x034cd1c220785077 #32
1368792: transaction tid 0x034cd1e0caa0286e #33
1369330 object serialno 0x034cd1e0caa0286e does not
matchtransaction id 0x034cd1e11516ba55
Now try changing an object (perhaps the same one again).
In the event.log, you should have something like the following:
2003-05-09T17:18:56 INFO(0) ZODB conflict error at
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects
(1 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:18:57 INFO(0) ZODB conflict error at
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects
(2 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:18:58 INFO(0) ZODB conflict error at
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects
(3 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:19:01 INFO(0) ZODB conflict error at
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects
(4 conflicts since startup at 2003-05-09T17:16:34)
Without ZEO (2.0.1b1) this is only repeated once, and the operation
seemingly succeeds (on top of the corrupted FileStorage), but with ZEO
all transactions involving a write to the FileStorage fail, throwing the
following error:
ZODB.POSException.ConflictError
Traceback (innermost last):
* Module ZPublisher.Publish, line 150, in publish_module
* Module ZPublisher.Publish, line 127, in publish
* Module ZPublisher.Publish, line 127, in publish
* Module ZPublisher.Publish, line 127, in publish
* Module ZPublisher.Publish, line 122, in publish
* Module Zope.App.startup, line 142, in zpublisher_exception_hook
* Module ZPublisher.Publish, line 102, in publish
* Module Zope.App.startup, line 200, in commit
* Module ZODB.Transaction, line 246, in commit
* Module ZODB.Connection, line 680, in tpc_vote
* Module ZEO.ClientStorage, line 722, in tpc_vote
* Module ZEO.ClientStorage, line 706, in _check_serials
ConflictError: database conflict error (oid 0000000000000af6, serial was
034cd1e0caa0286e, now 034cd1e11516ba55)
The site is useless until both ZEO and the Client Instance have been
restarted, at which point it is possible to change the errant object,
and pack again (assuming no undo), which then fixes the consistency.
However restarting your site whenever someone chooses to undo an
operation just before your automated pack.
-------------
# Workaround:
-------------
Don't let an undo be the last transaction preceeding a pack (easier said
than cleanly done with automation).
--------
# Notes:
--------
Machine:
SunOS lr1 5.9 Generic_112233-04 sun4u sparc SUNW,Ultra-Enterprise
Zope:
Primarily we tried the binary Solaris release of 2.6.1 and ZEO 2.0.1b1:
Zope Version: (Zope 2.6.1 (binary release, python 2.1,
solaris-2.8-sparc), python 2.1.3, sunos5)
Python Version: 2.1.3 (#4, Sep 16 2002, 15:52:14) [GCC 2.95.3
20010315 (release)]
System Platform: sunos5
SOFTWARE_HOME: /app/zope/zope_2.6.1/lib/python
ZOPE_HOME: /app/zope/zope_2.6.1
INSTANCE_HOME: /data/zope/zope_2.6.1/test/client
CLIENT_HOME: /data/zope/zope_2.6.1/test/client/var
We have also tried our own homegrown from CVS (Zope-2_6-branch) as of
2003/05/09 15:20, along with our own Python 2.1.3 and a snapshot of /ZEO
from CVS (HEAD) (in various combinations):
Zope Version: (unreleased version, python 2.1.3, sunos5)
Python Version: 2.1.3 (#1, Mar 26 2003, 16:28:24) [GCC 3.2.2]
System Platform: sunos5
SOFTWARE_HOME: /app/zope/zope_2.6.1-cvs/lib/python
ZOPE_HOME: /app/zope/zope_2.6.1-cvs
INSTANCE_HOME: /data/zope/zope_2.6.1-cvs/test/client
CLIENT_HOME: /data/zope/zope_2.6.1-cvs/test/client/var
Further we found that the fsrecover.py script packaged with 2.6.1
release is useless. It always removes EVERYTHING from the Data.fs bar
the magic number. This was our primary incentive for trying
Zope-2_6-branch since it includes jeremy's recently improved ZODB
management scripts. Note however that jeremy's recent fstest.py does
not throw an error on inconsistencies of the type mentioned above
(despite the fact they still seem to cause problems?).
Anything else I can say? Rather a longwinded message for such an easily
repeatable bug, but nevermind.
Obviously we'd be happy to help find a solution, be that by providing
corrupt Data.fs, extra details, testing patches, etc.
Hoping for a fix,
- Robin
--
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
rbreathe@brookes.ac.uk Tel: +44 1865 483323 Fax: +44 1865 483073