[ZODB-Dev] ZODB 3.1: Race condition in ZEO reconnect protocol

Dieter Maurer dieter at handshake.de
Thu Oct 2 02:15:47 EDT 2003


On our fastest servers (and only there)
(Dual 2 GHz AMD, 1 GB RAM, Linux, Python 2.1.3, ZODB 3.1 CVS from "Zope_2.6branch")
the reconnect tests in "ZEO/tests/testConnection.py" fail non-deterministically
but with high probability (in more than 95 % one of the 6 reconnect
tests fails; which one is chosen randomly).

Apparently, there is a race condition.

Most noticable observation: sometimes "asyncore.Dispatcher.handle_connect"
reports an "Unhandled connect event", often (but not always)
followed by "Exception in task 'Collect([(2,...)]) -- Disconnected".

I attach output from different failing tests (instrumented by myself
to get more information of what happens).

The test passes as soon as logging is enabled -- a clear indication
for a sensible race condition.

Dieter
-------------- next part --------------
ERROR: ncheckReconnectFallback (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
ncheckReconnectWritable (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 25148 4
ZEOClientExit: sending close request 4
start_zeo_server: created 25151 9
warning: unhandled connect event: ZEO.zrpc.connection.ManagedConnection
Exception in thread Connect([(2, ('localhost', 28494))]):
Traceback (most recent call last):
  File "/usr/local/lib/python2.1dbg/threading.py", line 378, in __bootstrap
    self.run()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 290, in run
    success = self.try_connecting(attempt_timeout)
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 321, in try_connecting
    r = self._connect_wrappers(wrappers, deadline)
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 382, in _connect_wrappers
    wrap.connect_procedure()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 468, in connect_procedure
    self.test_connection()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 477, in test_connection
    self.client, self.mgr)
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 528, in __init__
    self.__super_init(sock, addr, obj)
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 147, in __init__
    self.handshake()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 186, in handshake
    self.message_output(self.protocol_version)
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/smac.py", line 220, in message_output
    raise Disconnected, (
Disconnected: This action is temporarily unavailable.<p>

FAIL
ZEOClientExit: sending close request 9
....
======================================================================
FAIL: ncheckReconnectReadOnly (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 297, in checkReconnectReadOnly
    self.pollUp()
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 129, in pollUp
    self.fail("timed out waiting for storage to connect")
  File "/usr/local/lib/python2.1dbg/unittest.py", line 241, in fail
    raise self.failureException, msg
AssertionError: timed out waiting for storage to connect
-------------- next part --------------
checkReconnectReadOnly (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 20998 4
ZEOClientExit: sending close request 4
start_zeo_server: created 21001 10
FAIL
ZEOClientExit: sending close request 10
....
======================================================================
FAIL: checkReconnectReadOnly (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 297, in checkReconnectReadOnly
    self.pollUp()
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 129, in pollUp
    self.fail("timed out waiting for storage to connect")
  File "/usr/local/lib/python2.1dbg/unittest.py", line 241, in fail
    raise self.failureException, msg
AssertionError: timed out waiting for storage to connect
-------------- next part --------------
checkReconnectFallback (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 22707 4
ZEOClientExit: sending close request 4
start_zeo_server: created 22710 8
warning: unhandled connect event: ZEO.zrpc.connection.ManagedConnection
ZEOClientExit: sending close request 8
ok
-------------- next part --------------
ERROR: ncheckReconnectFallback (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 322, in ncheckReconnectFallback
    self.pollUp()
  File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 126, in pollUp
    asyncore.poll(0.1)
  File "/usr/local/lib/python2.1dbg/asyncore.py", line 96, in poll
    obj.handle_error()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 260, in handle_error
    self.close()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 561, in close
    self.__super_close()
  File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 160, in close
    self._map.clear()
  File "/usr/local/lib/python2.1dbg/asyncore.py", line 361, in __getattr__
    return getattr (self.socket, attr)
AttributeError: _map


More information about the ZODB-Dev mailing list