I'll be at LinuxWorld tomorrow and mostly aware from my email, but here's a brief response that might offer some clues. On Thu, 2004-01-22 at 21:45, Dennis Allison wrote:
I have been having problems with Zope locking up (deadlock?) when ZODB conflicts occur. I am running Zope 2.6.2b3 with the 3.1.2b2 version of the ZODB (that is, a vanilla install). Python is 2.1.3.
It's possible that upgrading to a non-beta release will fix, although most of the issues resolved in the 3.1.x maintenance releases were ZEO issues.
The diagnostic is always:
2004-01-22T12:25:15 ERROR(200) ZODB A storage error occured during transaction abort. This shouldn't happen. Traceback (innermost last): File /home/zopesys/lib/python/ZODB/Transaction.py, line 262, in commit File /home/zopesys/lib/python/ZODB/Transaction.py, line 393, in _commit_error AttributeError: Surrogate instance has no attribute '_p_oid'
There is more than one error here, which makes the log message very confusing. In particular, there is an exception that occurs during the second phase of the two-phase commit, a second error that occurs trying to recover from the first error, and a third error trying to log a message about the second error. It's a real mess. It might help to change the way logging is done so that you can see the first two error messages. In _commit_error(), change the self.log() call in these lines except: # nothing to do but log the error self.log("Failed to abort object %s" % repr(o._p_oid), error=sys.exc_info()) to say this: self.log("Failed to abort object %s" % repr(o), error=sys.exc_info()) And change one line in _commit # Ugh, we got an got an error during commit, so we # have to clean up. First save the original exception # in case the cleanup process causes another # exception. error = sys.exc_info() try: self._commit_error(objects, ncommitted, jars, subjars) except: LOG('ZODB', ERROR, "A storage error occured during transaction " "abort. This shouldn't happen.", error=sys.exc_info()) The LOG() call should be passing error=error, not error=sys.exc_info(). I don't think those changes will introduce any new problems and will tell us a lot about what is really going on.
As I understand what's happening (after reading the logs carefully), I have one or more folks trying to use the same object. Zope's ZODB serializes the access and aborts all but the oldest of the conflicting transactions. The diagnostic occurs because a registered object is missing the _p_oid attribute that's used to rollback the transaction. The big question is which of the several sub-transactions is the culprit--or is it a bug as Dieter has suggested, soluable by providing a default value for _p_oid --
That right in a rough sense. ZODB uses optimistic concurrency control, so later transactions are aborted if they conflict with already-committed ones. But the transactions actually run concurrently. The original error in the second-phase of two-phase commit is part of the problem. Since it's in the second phase, it's not caused by a ZODB conflict but by some more serious failure of one of the transactional resource managers. Unfortunately, we can't tell which one because the buggy logging code has swallowed the error. Jeremy