[Zodb-checkins] SVN: ZODB/trunk/src/ZODB/DB.py Forward port fix for race in DB.open():

Fri May 21 20:55:01 EDT 2004

Log message for revision 24866:
Forward port fix for race in DB.open():
Under exceedingly rare conditions, a timing hole made it
possible for a second open() call on a database to block for an
arbitrarily long time.  This accounts for the intermittent
failure of a thread to make any progress after 5 minutes in
checkConcurrentUpdates1Storage.

We intend to get rid of most of this delicate lock
business, but until then the test failures are still
hurting me.
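The core of the fix is replacing a blocking pool_lock.acquire() with pool_lock.acquire(0), the Python 2 spelling of a non-blocking attempt (Python 3 spells it acquire(blocking=False)). Below is a minimal Python 3 sketch of the difference. It assumes another thread already holds the pool lock, simulated here by taking it in the same thread. The blocking form waits until the lock is released, which in DB.open() meant waiting until some connection was closed. The non-blocking form fails at once and lets the caller retry. The timeout on the blocking call was added only so the demo terminates; the original call had none.

```python
import threading
import time

pool_lock = threading.Lock()

# Assume another thread has just taken the last pooled connection and
# therefore holds the pool lock (simulated here by taking it ourselves).
pool_lock.acquire()

# Old behaviour: a blocking acquire.  The timeout exists only so this demo
# finishes; without it the call would wait until the lock was released.
start = time.monotonic()
old_got_it = pool_lock.acquire(timeout=0.5)
old_waited = time.monotonic() - start

# New behaviour: a non-blocking attempt fails immediately, so the caller
# can fall back to allocating a fresh connection instead of waiting.
start = time.monotonic()
new_got_it = pool_lock.acquire(blocking=False)
new_waited = time.monotonic() - start

pool_lock.release()
print(old_got_it, new_got_it)  # False False
print(f"blocking: {old_waited:.2f}s, non-blocking: {new_waited:.4f}s")
```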


-=-
Modified: ZODB/trunk/src/ZODB/DB.py
===================================================================

--- ZODB/trunk/src/ZODB/DB.py	2004-05-21 21:22:20 UTC (rev 24865)
+++ ZODB/trunk/src/ZODB/DB.py	2004-05-22 00:55:00 UTC (rev 24866)
@@ -487,9 +487,9 @@
             # set whenever the pool becomes empty so that threads are
             # forced to wait until the pool gets a connection in it.
             # The lock is acquired when the (empty) pool is
-            # created. The The lock is acquired just prior to removing
-            # the last connection from the pool and just after adding
-            # a connection to an empty pool.
+            # created.  The lock is acquired just prior to removing
+            # the last connection from the pool and released just after
+            # adding a connection to an empty pool.
 
 
             if pools.has_key(version):
@@ -528,22 +528,36 @@
                             pool_lock.release()
                     else: return
 
-            elif len(pool) == 1:
-                # Taking last one, lock the pool
+            elif len(pool)==1:
+                # Taking last one, lock the pool.
                 # Note that another thread might grab the lock
                 # before us, so we might actually block, however,
                 # when we get the lock back, there *will* be a
-                # connection in the pool.
+                # connection in the pool.  OTOH, there's no limit on
+                # how long we may need to wait:  if the other thread
+                # grabbed the lock in this section too, we'll wait
+                # here until another connection is closed.
+                # checkConcurrentUpdates1Storage provoked this frequently
+                # on a hyperthreaded machine, with its second thread
+                # timing out after waiting 5 minutes for DB.open() to
+                # return.  So, if we can't get the pool lock immediately,
+                # now we make a recursive call.  This allows the current
+                # thread to allocate a new connection instead of waiting
+                # arbitrarily long for the single connection in the pool
+                # right now.
                 self._r()
-                pool_lock.acquire()
+                if not pool_lock.acquire(0):
+                    result = DB.open(self, version, transaction, temporary,
+                                     force, waitflag)
+                    self._a()
+                    return result
                 self._a()
                 if len(pool) > 1:
                     # Note that the pool size will normally be 1 here,
                     # but it could be higher due to a race condition.
                     pool_lock.release()
 
-            c = pool[-1]
-            del pool[-1]
+            c = pool.pop()
             c._setDB(self, mvcc=mvcc, txn_mgr=txn_mgr, synch=synch)
             for pool, allocated in pooll:
                 for cc in pool:
@@ -553,7 +567,8 @@
                 transaction[version] = c
             return c
 
-        finally: self._r()
+        finally:
+            self._r()
 
     def removeVersionPool(self, version):
         pools, pooll = self._pools
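The following much-reduced model of the pool logic after this change, written in modern Python 3, shows how the pieces fit together. Everything in it is hypothetical: the Connection and ConnectionPool classes, the _mutex attribute standing in for DB._a()/DB._r(), and the simplifications (a single pool with no versions, and no pool-size limit, so an empty pool simply allocates a new connection). It keeps the protocol from the diff's comment: the pool lock is held whenever the pool is empty, taken before removing the last connection, and released after adding a connection to an empty pool. There is one deliberate difference. Where the diff retries by calling DB.open() recursively, the sketch retries in a loop. This has the same effect here but cannot hit Python's recursion limit if the retry repeats.

```python
import threading


class Connection:
    """Hypothetical stand-in for a ZODB Connection."""


class ConnectionPool:
    """Simplified, hypothetical model of the pool logic in DB.open()."""

    def __init__(self):
        self._mutex = threading.Lock()      # plays the role of DB._a()/_r()
        self._pool_lock = threading.Lock()  # held while the pool is empty
        self._pool = []
        self.allocated = []
        # The lock is acquired when the (empty) pool is created.
        self._pool_lock.acquire()

    def open(self):
        self._mutex.acquire()
        try:
            while True:
                if not self._pool:
                    # Empty pool: allocate a fresh connection.
                    c = Connection()
                    self.allocated.append(c)
                    return c
                if len(self._pool) > 1:
                    return self._pool.pop()
                # Taking the last one: lock the pool.  Drop the mutex while
                # trying, as DB.open() does around pool_lock.acquire().
                self._mutex.release()
                got_it = self._pool_lock.acquire(blocking=False)
                self._mutex.acquire()
                if got_it:
                    if len(self._pool) > 1:
                        # A connection was returned meanwhile, so the pool
                        # will not become empty: it must not stay locked.
                        self._pool_lock.release()
                    return self._pool.pop()
                # Another thread holds the pool lock and is taking the last
                # connection.  Retry (the diff recurses into DB.open())
                # instead of waiting for some connection to be closed.
        finally:
            self._mutex.release()

    def close(self, c):
        with self._mutex:
            self._pool.append(c)
            if len(self._pool) == 1:
                # The pool was empty, so its lock is held: release it just
                # after adding a connection to an empty pool.
                self._pool_lock.release()
```

With one connection in the pool, two threads calling open() at the same time both return promptly: one gets the pooled connection and the other a newly allocated one. Before the fix, the thread that lost the race could instead sit in a blocking pool_lock.acquire() until some other connection was closed. Like the diff's replacement of pool[-1] / del pool[-1], the sketch uses list.pop(), which removes and returns the last element in one step.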