Success!!! Re: [ZODB-Dev] ZODB Deadlock issue python frame trace
John D. Heintz
jheintz@isogen.com
Mon, 21 May 2001 11:21:50 -0500
Jeremy,
That looks to have done the trick! I re-ran my tests and didn't get a si=
ngle=20
lock up. Thanks for all the hard work!
John
On Friday 18 May 2001 11:23, Jeremy Hylton wrote:
> John,
>
> I've got a(nother) potential fix for the problem you discovered. I'm
> much happier with this fix, because I think it solves the problem
> pretty much the right way.
>
> The cause of the bug (can't remember if I explained it on the list or
> in private email) is that tpc_finish() acquires the storage and db
> locks in a different order than DB.open(). As a result, if each
> starts at the same time and gets one of the two locks it needs, the
> system will be deadlocked.
>
> For a deadlock problem like this, there seem to be two a couple of
> simple strategies we can use to fix it. One is to see if the locks
> cover more data than they need to; if they do, we can use two
> different locks that each cover less data. Unfortunately, that
> approach doesn't work here. The DB lock that is held during open()
> and during the invalidation phase of tpc_finish() are protecting
> exactly the same data structures.
>
> The other simple strategy is to enforce a consistent locking order.
> If a thread is going to hold the DB lock and the storage lock, it MUST
> acquire the DB lock first. That's what my proposed solution does.
>
> The DB object gets methods called begin_invalidation() and
> finish_invalidation() that acquire and release the DB lock
> respectively. Before the Connection calls tpc_finish() on the
> storage, it calls begin_invalidation(). This guarantees that the DB
> acquired before the storage lock.
>
> When the invalidation phase is over, the Connection calls
> end_invalidation() to release the DB lock. This is an optimization.
> It could wait until tpc_finish() returns, but we know that the DB will
> not be used again for the rest of the tpc_finish() and tpc_finish()
> could take a long time.
>
> There is a patch below. It passes that FileStorage test suite, but
> that doesn't really test the locking behavior. Does it solve your
> problem?
>
> Jeremy
>
> Index: Connection.py
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvs-repository/Zope2/lib/python/ZODB/Connection.py,v
> retrieving revision 1.53
> diff -c -c -r1.53 Connection.py
> *** Connection.py=092001/05/17 18:35:10=091.53
> --- Connection.py=092001/05/18 16:06:16
> ***************
> *** 683,700 ****
> def tpc_finish(self, transaction):
>
> # It's important that the storage call the function we pass
> ! # (self.tpc_finish_) while it still has it's lock. We don't
> ! # want another thread to be able to read any updated data
> ! # until we've had a chance to send an invalidation message to
> ! # all of the other connections!
>
> if self._tmp is not None:
> # Commiting a subtransaction!
> # There is no need to invalidate anything.
> ! self._storage.tpc_finish(transaction, self._invalidate_su=
b)
> self._storage._creating[:0]=3Dself._creating
> del self._creating[:]
> else:
> self._storage.tpc_finish(transaction,
> self._invalidate_invalidating)
>
> --- 656,674 ----
> def tpc_finish(self, transaction):
>
> # It's important that the storage call the function we pass
> ! # (self._invalidate_invalidating) while it still has it's
> ! # lock. We don't want another thread to be able to read any
> ! # updated data until we've had a chance to send an
> ! # invalidation message to all of the other connections!
>
> if self._tmp is not None:
> # Commiting a subtransaction!
> # There is no need to invalidate anything.
> ! self._storage.tpc_finish(transaction)
> self._storage._creating[:0]=3Dself._creating
> del self._creating[:]
> else:
> + self._db.begin_invalidation()
> self._storage.tpc_finish(transaction,
> self._invalidate_invalidating)
>
> ***************
> *** 703,715 ****
>
> def _invalidate_invalidating(self):
> invalidate=3Dself._db.invalidate
> ! for oid in self._invalidating: invalidate(oid, self)
> !
> ! def _invalidate_sub(self):
> ! # There's no point in invalidating any objects in a
> subtransaction ! #
> ! # Because we may ultimately abort the containing transaction.
> ! pass
>
> def sync(self):
> get_transaction().abort()
> --- 677,685 ----
>
> def _invalidate_invalidating(self):
> invalidate=3Dself._db.invalidate
> ! for oid in self._invalidating:
> ! invalidate(oid, self)
> ! self._db.finish_invalidation()
>
> def sync(self):
> get_transaction().abort()
>
> Index: DB.py
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> RCS file: /cvs-repository/Zope2/lib/python/ZODB/DB.py,v
> retrieving revision 1.29
> diff -c -c -r1.29 DB.py
> *** DB.py=092001/05/17 18:35:10=091.29
> --- DB.py=092001/05/18 16:07:08
> ***************
> *** 311,316 ****
> --- 311,326 ----
> def importFile(self, file):
> raise 'Not yet implemented'
>
> + def begin_invalidation(self):
> + # Must be called before first call to invalidate and before
> + # the storage lock is held.
> + self._a()
> +
> + def finish_invalidation(self):
> + # Must be called after begin_invalidation() and after final
> + # invalidate() call.
> + self._r()
> +
> def invalidate(self, oid, connection=3DNone, version=3D'',
> rc=3Dsys.getrefcount):
> """Invalidate references to a given oid.
> ***************
> *** 320,357 ****
> passed in to prevent useless (but harmless) messages to the
> connection.
> """
> ! if connection is not None: version=3Dconnection._version
> ! self._a()
> ! try:
>
> ! # Update modified in version cache
> ! h=3Dhash(oid)%131
> ! cache=3Dself._miv_cache
> ! o=3Dcache.get(h, None)
> ! if o and o[0]=3D=3Doid: del cache[h]
> !
> ! # Notify connections
> ! pools,pooll=3Dself._pools
> ! for pool, allocated in pooll:
> ! for cc in allocated:
> if (cc is not connection and
> (not version or cc._version=3D=3Dversion)):
> - if rc(cc) <=3D 3:
> - cc.close()
> cc.invalidate(oid)
> !
> ! temps=3Dself._temps
> ! if temps:
> ! t=3D[]
> ! for cc in temps:
> ! if rc(cc) > 3:
> ! if (cc is not connection and
> ! (not version or cc._version=3D=3Dversion)=
):
> ! cc.invalidate(oid)
> ! t.append(cc)
> ! else: cc.close()
> ! self._temps=3Dt
> ! finally: self._r()
>
> def invalidateMany(self, oids=3DNone, version=3D''):
> if oids is None: self.invalidate(None, version=3Dversion)
> --- 330,362 ----
> passed in to prevent useless (but harmless) messages to the
> connection.
> """
> ! if connection is not None:
> ! version=3Dconnection._version
> ! # Update modified in version cache
> ! h=3Dhash(oid)%131
> ! o=3Dself._miv_cache.get(h, None)
> ! if o is not None and o[0]=3D=3Doid: del self._miv_cache[h]
>
> ! # Notify connections
> ! for pool, allocated in self._pools[1]:
> ! for cc in allocated:
> ! if (cc is not connection and
> ! (not version or cc._version=3D=3Dversion)):
> ! if rc(cc) <=3D 3:
> ! cc.close()
> ! cc.invalidate(oid)
> !
> ! temps=3Dself._temps
> ! if temps:
> ! t=3D[]
> ! for cc in temps:
> ! if rc(cc) > 3:
> if (cc is not connection and
> (not version or cc._version=3D=3Dversion)):
> cc.invalidate(oid)
> ! t.append(cc)
> ! else: cc.close()
> ! self._temps=3Dt
>
> def invalidateMany(self, oids=3DNone, version=3D''):
> if oids is None: self.invalidate(None, version=3Dversion)
>
>
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list - ZODB-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zodb-dev
--=20
=2E . . . . . . . . . . . . . . . . . . . . . . .
John D. Heintz | Senior Engineer
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com
w w w . d a t a c h a n n e l . c o m