[Zodb-checkins] SVN: ZODB/branches/3.4/ Worm around suspected
Windows socket bug in Windows trigger code.
Tim Peters
tim.one at comcast.net
Mon Aug 1 16:02:24 EDT 2005
Log message for revision 37631:
Worm around suspected Windows socket bug in Windows trigger code.
See the thread starting at
http://mail.zope.org/pipermail/zope/2005-July/160433.html
for gory details.
Changed:
U ZODB/branches/3.4/NEWS.txt
U ZODB/branches/3.4/src/ZEO/zrpc/trigger.py
-=-
Modified: ZODB/branches/3.4/NEWS.txt
===================================================================
--- ZODB/branches/3.4/NEWS.txt 2005-08-01 18:44:41 UTC (rev 37630)
+++ ZODB/branches/3.4/NEWS.txt 2005-08-01 20:02:23 UTC (rev 37631)
@@ -5,6 +5,7 @@
Following are dates of internal releases (to support ongoing Zope 2
development) since ZODB 3.4's last public release:
+- 3.4.1b2 DD-MMM-2005
- 3.4.1b1 26-Jul-2005
- 3.4.1a6 19-Jul-2005
- 3.4.1a5 12-Jul-2005
@@ -106,6 +107,17 @@
example, debugging prints added to Python's ``asyncore.loop`` won't be lost
anymore).
+Windows
+-------
+
+- (3.4.1b2) As developed in a long thread starting at
+ http://mail.zope.org/pipermail/zope/2005-July/160433.html
+ there appears to be a race bug in the Microsoft Windows socket
+ implementation, rarely visible in ZEO when multiple processes try to
+ create an "asyncore trigger" simultaneously. Windows-specific code in
+ ``ZEO/zrpc/trigger.py`` changed to work around this bug when it occurs.
+
+
Tools
-----
Modified: ZODB/branches/3.4/src/ZEO/zrpc/trigger.py
===================================================================
--- ZODB/branches/3.4/src/ZEO/zrpc/trigger.py 2005-08-01 18:44:41 UTC (rev 37630)
+++ ZODB/branches/3.4/src/ZEO/zrpc/trigger.py 2005-08-01 20:02:23 UTC (rev 37631)
@@ -1,6 +1,6 @@
##############################################################################
#
-# Copyright (c) 2001, 2002 Zope Corporation and Contributors.
+# Copyright (c) 2001-2005 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
@@ -156,27 +156,61 @@
def __init__(self):
_triggerbase.__init__(self)
+
# Get a pair of connected sockets. The trigger is the 'w'
# end of the pair, which is connected to 'r'. 'r' is put
# in the asyncore socket map. "pulling the trigger" then
# means writing something on w, which will wake up r.
- a = socket.socket() # temporary, to set up the connection
+
w = socket.socket()
- self.trigger = w
- # set TCP_NODELAY to true to avoid buffering
- w.setsockopt(socket.IPPROTO_TCP, 1, 1)
+ # Disable buffering -- pulling the trigger sends 1 byte,
+ # and we want that sent immediately, to wake up asyncore's
+ # select() ASAP.
+ w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
- # Specifying port 0 tells Windows to pick a port for us.
- a.bind(("127.0.0.1", 0))
- connect_address = a.getsockname() # assigned (host, port) pair
- a.listen(1)
- w.connect(connect_address)
+ count = 0
+ while 1:
+ count += 1
+ # Bind to a local port; for efficiency, let the OS pick
+ # a free port for us.
+ # Unfortunately, stress tests showed that we may not
+ # be able to connect to that port ("Address already in
+ # use") despite that the OS picked it. This appears
+ # to be a race bug in the Windows socket implementation.
+ # So we loop until a connect() succeeds (almost always
+ # on the first try). See the long thread at
+ # http://mail.zope.org/pipermail/zope/2005-July/160433.html
+ # for hideous details.
+ a = socket.socket()
+ a.bind(("127.0.0.1", 0))
+ connect_address = a.getsockname() # assigned (host, port) pair
+ a.listen(1)
+ try:
+ w.connect(connect_address)
+ break # success
+ except socket.error, detail:
+ if detail[0] != errno.WSAEADDRINUSE:
+ # "Address already in use" is the only error
+ # I've seen on two WinXP Pro SP2 boxes, under
+ # Pythons 2.3.5 and 2.4.1.
+ raise
+ # (10048, 'Address already in use')
+ # assert count <= 2 # never triggered in Tim's tests
+ if count >= 10: # I've never seen it go above 2
+ a.close()
+ w.close()
+ raise BindError("Cannot bind trigger!")
+ # Close `a` and try again. Note: I originally put a short
+ # sleep() here, but it didn't appear to help or hurt.
+ a.close()
+
r, addr = a.accept() # r becomes asyncore's (self.)socket
a.close()
+ self.trigger = w
asyncore.dispatcher.__init__(self, r)
def _close(self):
- # self.socket is r, self.trigger is w from __init__
+ # self.socket is r, and self.trigger is w, from __init__
self.socket.close()
self.trigger.close()
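
For readers on current Python, the retry loop in the diff above can be sketched roughly as follows. This is an illustration only, not the code that was checked in: the function name `connected_pair`, the `max_tries` parameter, and the `_ADDR_IN_USE` set are hypothetical, and the sketch assumes Python 3, where `socket.error` is an alias of `OSError` and the error code is read from `exc.errno`.

```python
import errno
import socket

# Windows reports "Address already in use" as WSAEADDRINUSE (10048);
# that constant only exists on Windows, so fall back to EADDRINUSE.
_ADDR_IN_USE = {
    errno.EADDRINUSE,
    getattr(errno, "WSAEADDRINUSE", errno.EADDRINUSE),
}


def connected_pair(max_tries=10):
    """Return (r, w), two connected loopback TCP sockets.

    Hypothetical helper mirroring the loop in the diff: bind a temporary
    listener to an OS-chosen port, connect w to it, and retry with a new
    listener if connect() reports "Address already in use".
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    w = socket.socket()
    try:
        # Send the 1-byte wakeup immediately instead of buffering it.
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for attempt in range(1, max_tries + 1):
            a = socket.socket()  # temporary listener for this attempt
            try:
                a.bind(("127.0.0.1", 0))  # port 0: let the OS pick a port
                a.listen(1)
                try:
                    w.connect(a.getsockname())
                except OSError as exc:
                    if exc.errno in _ADDR_IN_USE and attempt < max_tries:
                        continue  # listener is closed below; try again
                    raise  # unexpected error, or out of attempts
                r, _addr = a.accept()
                return r, w
            finally:
                a.close()
    except BaseException:
        w.close()
        raise
```

Writing a byte on `w` then makes `r` readable, which is what wakes up a `select()` that is watching `r`. Unlike the checked-in code, this sketch re-raises the original `OSError` when the attempts run out rather than raising `BindError`.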