[Zodb-checkins] CVS: ZODB3/ZEO/zrpc - connection.py:1.37

Guido van Rossum guido@python.org
Fri, 27 Sep 2002 17:57:46 -0400


Update of /cvs-repository/ZODB3/ZEO/zrpc
In directory cvs.zope.org:/tmp/cvs-serv16596

Modified Files:
	connection.py 
Log Message:
In wait(), when there's no asyncore main loop, we called
asyncore.poll() with a timeout of 10 seconds.  Change this to a
variable timeout starting at 1 msec and doubling until 1 second.

While debugging Win2k crashes in the check4ExtStorageThread test from
ZODB/tests/MTStorage.py, Tim noticed that there were frequent 10
second gaps in the log file where *nothing* happens.  These were caused
by the following scenario.

Suppose a ZEO client process has two threads using the same connection
to the ZEO server, and there's no asyncore loop active.  T1 makes a
synchronous call, and enters the wait() function.  Then T2 makes
another synchronous call, and enters the wait() function.  At this
point, both are blocked in the select() call in asyncore.poll(), with
a timeout of 10 seconds (in the old version).  Now the replies for
both calls arrive.  Say T1 wakes up.  The handle_read() method in
smac.py calls self.recv(8096), so it gets both replies in its buffer,
decodes both, and calls self.message_input() for both, which sticks
both replies in the self.replies dict.  Now T1 finds its response, its
wait() call returns with it.  But T2 is still stuck in
asyncore.poll(): its select() call never woke up, and has to "sit out"
the whole timeout of 10 seconds.  (Good thing I added timeouts to
everything!  Or perhaps not, since it masked the problem.)

One other condition must be satisfied before this becomes a disaster:
T2 must have started a transaction, and all other threads must be
waiting to start another transaction.  This is what I saw in the log.
(Hmm, maybe a message should be logged when a thread is waiting to
start a transaction this way.)

In a real Zope application, this won't happen, because there's a
centralized asyncore loop in a separate thread (probably the client's
main thread) and the various threads would be waiting on the condition
variable; whenever a reply is inserted in the replies dict, all
threads are notified.  But in the test suite there's no asyncore loop,
and I don't feel like adding one.  So the exponential backoff seems
the easiest "solution".


=== ZODB3/ZEO/zrpc/connection.py 1.36 => 1.37 ===
--- ZODB3/ZEO/zrpc/connection.py:1.36	Wed Sep 25 18:28:10 2002
+++ ZODB3/ZEO/zrpc/connection.py	Fri Sep 27 17:57:46 2002
@@ -351,10 +351,15 @@
     def wait(self, msgid):
         """Invoke asyncore mainloop and wait for reply."""
         if __debug__:
-            log("wait() async=%d" % self.is_async(), level=zLOG.TRACE)
+            log("wait(%d), async=%d" % (msgid, self.is_async()),
+                level=zLOG.TRACE)
         if self.is_async():
             self.trigger.pull_trigger()
 
+        # Delay used when we call asyncore.poll() directly.
+        # Start with a 1 msec delay, double until 1 sec.
+        delay = 0.001
+
         self.replies_cond.acquire()
         try:
             while 1:
@@ -363,6 +368,9 @@
                 reply = self.replies.get(msgid)
                 if reply is not None:
                     del self.replies[msgid]
+                    if __debug__:
+                        log("wait(%d): reply=%s" % (msgid, short_repr(reply)),
+                            level=zLOG.DEBUG)
                     return reply
                 if self.is_async():
                     self.replies_cond.wait(10.0)
@@ -370,7 +378,12 @@
                     self.replies_cond.release()
                     try:
                         try:
-                            asyncore.poll(10.0, self._map)
+                            if __debug__:
+                                log("wait(%d): asyncore.poll(%s)" %
+                                    (msgid, delay), level=zLOG.TRACE)
+                            asyncore.poll(delay, self._map)
+                            if delay < 1.0:
+                                delay += delay
                         except select.error, err:
                             log("Closing.  asyncore.poll() raised %s." % err,
                                 level=zLOG.BLATHER)