[ZODB-Dev] ZODB 3.1: Race condition in ZEO reconnect protocol
Dieter Maurer
dieter at handshake.de
Thu Oct 2 02:15:47 EDT 2003
On our fastest servers (and only there)
(Dual 2 GHz AMD, 1 GB RAM, Linux, Python 2.1.3, ZODB 3.1 CVS from "Zope_2.6branch")
the reconnect tests in "ZEO/tests/testConnection.py" fail non-deterministically
but with high probability (in more than 95 % one of the 6 reconnect
tests fails; which one is chosen randomly).
Apparently, there is a race condition.
Most noticable observation: sometimes "asyncore.Dispatcher.handle_connect"
reports an "Unhandled connect event", often (but not always)
followed by "Exception in task 'Collect([(2,...)]) -- Disconnected".
I attach output from different failing tests (instrumented by myself
to get more information of what happens).
The test passes as soon as logging is enabled -- a clear indication
for a sensible race condition.
Dieter
-------------- next part --------------
ERROR: ncheckReconnectFallback (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
ncheckReconnectWritable (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 25148 4
ZEOClientExit: sending close request 4
start_zeo_server: created 25151 9
warning: unhandled connect event: ZEO.zrpc.connection.ManagedConnection
Exception in thread Connect([(2, ('localhost', 28494))]):
Traceback (most recent call last):
File "/usr/local/lib/python2.1dbg/threading.py", line 378, in __bootstrap
self.run()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 290, in run
success = self.try_connecting(attempt_timeout)
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 321, in try_connecting
r = self._connect_wrappers(wrappers, deadline)
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 382, in _connect_wrappers
wrap.connect_procedure()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 468, in connect_procedure
self.test_connection()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/client.py", line 477, in test_connection
self.client, self.mgr)
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 528, in __init__
self.__super_init(sock, addr, obj)
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 147, in __init__
self.handshake()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 186, in handshake
self.message_output(self.protocol_version)
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/smac.py", line 220, in message_output
raise Disconnected, (
Disconnected: This action is temporarily unavailable.<p>
FAIL
ZEOClientExit: sending close request 9
....
======================================================================
FAIL: ncheckReconnectReadOnly (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 297, in checkReconnectReadOnly
self.pollUp()
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 129, in pollUp
self.fail("timed out waiting for storage to connect")
File "/usr/local/lib/python2.1dbg/unittest.py", line 241, in fail
raise self.failureException, msg
AssertionError: timed out waiting for storage to connect
-------------- next part --------------
checkReconnectReadOnly (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 20998 4
ZEOClientExit: sending close request 4
start_zeo_server: created 21001 10
FAIL
ZEOClientExit: sending close request 10
....
======================================================================
FAIL: checkReconnectReadOnly (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 297, in checkReconnectReadOnly
self.pollUp()
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 129, in pollUp
self.fail("timed out waiting for storage to connect")
File "/usr/local/lib/python2.1dbg/unittest.py", line 241, in fail
raise self.failureException, msg
AssertionError: timed out waiting for storage to connect
-------------- next part --------------
checkReconnectFallback (testConnection.FileStorageConnectionTests) ... start_zeo_server: created 22707 4
ZEOClientExit: sending close request 4
start_zeo_server: created 22710 8
warning: unhandled connect event: ZEO.zrpc.connection.ManagedConnection
ZEOClientExit: sending close request 8
ok
-------------- next part --------------
ERROR: ncheckReconnectFallback (testConnection.FileStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 322, in ncheckReconnectFallback
self.pollUp()
File "/local/zope/Zope/Base/lib/python/ZEO/tests/ConnectionTests.py", line 126, in pollUp
asyncore.poll(0.1)
File "/usr/local/lib/python2.1dbg/asyncore.py", line 96, in poll
obj.handle_error()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 260, in handle_error
self.close()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 561, in close
self.__super_close()
File "/local/zope/Zope/Base/lib/python/ZEO/zrpc/connection.py", line 160, in close
self._map.clear()
File "/usr/local/lib/python2.1dbg/asyncore.py", line 361, in __getattr__
return getattr (self.socket, attr)
AttributeError: _map
More information about the ZODB-Dev
mailing list