[Zope] Consistent Data.fs corruption in 2.6.1

Robin Breathe rbreathe@brookes.ac.uk
Fri, 09 May 2003 18:01:42 +0100


I believe we have found a bug in the 2.6.1 release (possibly more 
specifically in ZODB 3.1.1).

-----------
# Summary:
-----------

When an undo is the last transaction preceeding a "pack(days:float=0)", 
the FileStorage is corrupted.

------------
# To repeat:
------------

Start with a 'consistent' FileStorage, e.g. Data.fs that passes 
fstest.py without error:

     1368329: transaction tid 0x034cd1c220785077 #32
     1368792: transaction tid 0x034cd1dbeb647922 #33
     no errors detected

Change an object (e.g. its contents, rename or delete it.  whatever).
Immediately undo that change ("oops").
Pack the Database (days:float=0).
Your FileStorage should be corrupt:

     1368329: transaction tid 0x034cd1c220785077 #32
     1368792: transaction tid 0x034cd1e0caa0286e #33
     1369330 object serialno 0x034cd1e0caa0286e does not 
matchtransaction id 0x034cd1e11516ba55

Now try changing an object (perhaps the same one again).
In the event.log, you should have something like the following:

2003-05-09T17:18:56 INFO(0) ZODB conflict error at 
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects 
(1 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:18:57 INFO(0) ZODB conflict error at 
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects 
(2 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:18:58 INFO(0) ZODB conflict error at 
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects 
(3 conflicts since startup at 2003-05-09T17:16:34)
------
2003-05-09T17:19:01 INFO(0) ZODB conflict error at 
/VirtualHostBase/https/twww.brookes.ac.uk:8443/VirtualHostRoot/Brookes/www/manage_delObjects 
(4 conflicts since startup at 2003-05-09T17:16:34)

Without ZEO (2.0.1b1) this is only repeated once, and the operation 
seemingly succeeds (on top of the corrupted FileStorage), but with ZEO 
all transactions involving a write to the FileStorage fail, throwing the 
following error:

ZODB.POSException.ConflictError
Traceback (innermost last):
     * Module ZPublisher.Publish, line 150, in publish_module
     * Module ZPublisher.Publish, line 127, in publish
     * Module ZPublisher.Publish, line 127, in publish
     * Module ZPublisher.Publish, line 127, in publish
     * Module ZPublisher.Publish, line 122, in publish
     * Module Zope.App.startup, line 142, in zpublisher_exception_hook
     * Module ZPublisher.Publish, line 102, in publish
     * Module Zope.App.startup, line 200, in commit
     * Module ZODB.Transaction, line 246, in commit
     * Module ZODB.Connection, line 680, in tpc_vote
     * Module ZEO.ClientStorage, line 722, in tpc_vote
     * Module ZEO.ClientStorage, line 706, in _check_serials
ConflictError: database conflict error (oid 0000000000000af6, serial was 
034cd1e0caa0286e, now 034cd1e11516ba55)

The site is useless until both ZEO and the Client Instance have been 
restarted, at which point it is possible to change the errant object, 
and pack again (assuming no undo), which then fixes the consistency.
However restarting your site whenever someone chooses to undo an 
operation just before your automated pack.

-------------
# Workaround:
-------------

Don't let an undo be the last transaction preceeding a pack (easier said 
than cleanly done with automation).

--------
# Notes:
--------

Machine:
SunOS lr1 5.9 Generic_112233-04 sun4u sparc SUNW,Ultra-Enterprise

Zope:
Primarily we tried the binary Solaris release of 2.6.1 and ZEO 2.0.1b1:

     Zope Version: (Zope 2.6.1 (binary release, python 2.1, 
solaris-2.8-sparc), python 2.1.3, sunos5)
     Python Version: 2.1.3 (#4, Sep 16 2002, 15:52:14) [GCC 2.95.3 
20010315 (release)]
     System Platform: sunos5
     SOFTWARE_HOME: /app/zope/zope_2.6.1/lib/python
     ZOPE_HOME: /app/zope/zope_2.6.1
     INSTANCE_HOME: /data/zope/zope_2.6.1/test/client
     CLIENT_HOME: /data/zope/zope_2.6.1/test/client/var

We have also tried our own homegrown from CVS (Zope-2_6-branch) as of 
2003/05/09 15:20, along with our own Python 2.1.3 and a snapshot of /ZEO 
from CVS (HEAD) (in various combinations):

     Zope Version: (unreleased version, python 2.1.3, sunos5)
     Python Version: 2.1.3 (#1, Mar 26 2003, 16:28:24) [GCC 3.2.2]
     System Platform: sunos5
     SOFTWARE_HOME: /app/zope/zope_2.6.1-cvs/lib/python
     ZOPE_HOME: /app/zope/zope_2.6.1-cvs
     INSTANCE_HOME: /data/zope/zope_2.6.1-cvs/test/client
     CLIENT_HOME: /data/zope/zope_2.6.1-cvs/test/client/var

Further we found that the fsrecover.py script packaged with 2.6.1 
release is useless.  It always removes EVERYTHING from the Data.fs bar 
the magic number.  This was our primary incentive for trying 
Zope-2_6-branch since it includes jeremy's recently improved ZODB 
management scripts.  Note however that jeremy's recent fstest.py does 
not throw an error on inconsistencies of the type mentioned above 
(despite the fact they still seem to cause problems?).

Anything else I can say?  Rather a longwinded message for such an easily 
repeatable bug, but nevermind.
Obviously we'd be happy to help find a solution, be that by providing 
corrupt Data.fs, extra details, testing patches, etc.

Hoping for a fix,

- Robin

-- 
Robin Breathe, Computer Services, Oxford Brookes University, Oxford, UK
rbreathe@brookes.ac.uk       Tel: +44 1865 483323  Fax: +44 1865 483073