btw, the code is a slightly modified version of the "getting started with Winsock" example: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/win...

Sune B. Woeller wrote:
I have made two similar test programs in C++, and the problem also occurs there, with exactly the same pattern as the Python client/server scripts in the mail I am replying to.
But then I stumbled upon this flag in the WinSock documentation: SO_EXCLUSIVEADDRUSE. See the description here: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/win...
It is very interesting reading, especially: "An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If one or more connections originating from (or accepted on) a port bound with SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will fail."
This is just what we want (and I think that is standard behaviour on Linux).
I have tested it with my C++ programs, and it works: when I set that option on the server socket before the bind(), bind() in the second server process fails with WSAEADDRINUSE (bind() failed: 10048.)
There is a python bugfix for this, but only for python 2.4: http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=547...
(It is added to version 1.294 of socketmodule.c)
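For the Python side, the same idea might look roughly like the sketch below. It is written for Python 3 rather than the 2.x used in this thread, and `make_exclusive_listener` is a hypothetical helper name, not something from the bugfix. It assumes the `socket` module exposes `SO_EXCLUSIVEADDRUSE` on Windows builds; on other platforms the constant is missing, so the sketch simply skips the call.

```python
import socket


def make_exclusive_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket, asking for exclusive use of the port.

    Hypothetical helper for illustration only.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The constant is only defined on Windows builds of Python, so look it
    # up defensively instead of referencing it directly.
    opt = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    try:
        if opt is not None:
            # Must be set before bind(), as in the C++ test program.
            srv.setsockopt(socket.SOL_SOCKET, opt, 1)
        srv.bind((host, port))
        srv.listen(1)
    except OSError:
        srv.close()
        raise
    return srv


if __name__ == "__main__":
    first = make_exclusive_listener()
    port = first.getsockname()[1]
    try:
        second = make_exclusive_listener(port=port)
    except OSError as exc:
        print("second bind refused:", exc)
    else:
        print("second bind succeeded, which is the problem case")
        second.close()
    first.close()
```

Passing `port=0` lets the system pick a free port; pass 19990 to mirror the C++ programs.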
I run the two test programs from two cmd terminals, like I described for the python versions.
// link with ws2_32.lib
// sock_server.cpp
#include <cstdlib>
#include <stdio.h>
#include <conio.h>
#include "winsock2.h"

int main()
{
    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
    if ( iResult != NO_ERROR )
        printf("Error at WSAStartup()\n");

    // Create a socket.
    SOCKET m_socket;
    m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );

    if ( m_socket == INVALID_SOCKET ) {
        printf( "Error at socket(): %d\n", WSAGetLastError() );
        WSACleanup();
        return 1;
    }

    // try to use SO_EXCLUSIVEADDRUSE (must be set before bind())
    BOOL bOptVal = TRUE;
    int bOptLen = sizeof(BOOL);
    if (setsockopt(m_socket, SOL_SOCKET, SO_EXCLUSIVEADDRUSE,
                   (char*)&bOptVal, bOptLen) != SOCKET_ERROR) {
        printf("Set SO_EXCLUSIVEADDRUSE: ON\n");
    }

    // Bind the socket.
    sockaddr_in service;

    service.sin_family = AF_INET;
    service.sin_addr.s_addr = inet_addr( "127.0.0.1" );
    service.sin_port = htons( 19990 );

    if ( bind( m_socket, (SOCKADDR*) &service, sizeof(service) ) == SOCKET_ERROR ) {
        printf( "bind() failed: %i.\n", WSAGetLastError() );
        closesocket(m_socket);
        return 1;
    }

    // Listen on the socket.
    if ( listen( m_socket, 1 ) == SOCKET_ERROR )
        printf( "Error listening on socket.\n");

    // Accept connections.
    SOCKET AcceptSocket;

    printf( "Waiting for a client to connect...\n" );
    while (1) {
        AcceptSocket = SOCKET_ERROR;
        while ( AcceptSocket == SOCKET_ERROR ) {
            AcceptSocket = accept( m_socket, NULL, NULL );
        }
        printf( "Client Connected.\n");
        //m_socket = AcceptSocket;
        break;
    }
    // Close the listening socket; the accepted socket stays open.
    closesocket(m_socket);

    // Send and receive data.
    int bytesRecv = SOCKET_ERROR;

    // Read at most 31 bytes so recvbuf stays NUL-terminated.
    char recvbuf[32] = "";
    bytesRecv = recv( AcceptSocket, recvbuf, sizeof(recvbuf) - 1, 0 );
    printf( "Bytes Recv: %d\n", bytesRecv );
    printf("Received: %s\n", recvbuf);
    printf("press a key to terminate\n");
    getch();

    return 0;
}
// sock_client.cpp
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include "winsock2.h"

int main()
{
    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
    if ( iResult != NO_ERROR )
        printf("Error at WSAStartup()\n");

    // Create a socket.
    SOCKET m_socket;
    m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );

    if ( m_socket == INVALID_SOCKET ) {
        printf( "Error at socket(): %d\n", WSAGetLastError() );
        WSACleanup();
        return 1;
    }

    // Connect to a server.
    sockaddr_in clientService;

    clientService.sin_family = AF_INET;
    clientService.sin_addr.s_addr = inet_addr( "127.0.0.1" );
    clientService.sin_port = htons( 19990 );

    if ( connect( m_socket, (SOCKADDR*) &clientService, sizeof(clientService) ) == SOCKET_ERROR) {
        printf( "Failed to connect.\n" );
        WSACleanup();
        return 1;
    }

    // Send and receive data.
    int bytesSent;
    char sendbuf[32] = "";
    printf("Enter string to send (max 30 bytes):\n");
    scanf("%30s", sendbuf );   // field width keeps input inside sendbuf
    printf("Sending: %s\n", sendbuf);

    bytesSent = send( m_socket, sendbuf, (int)strlen(sendbuf), 0 );
    printf( "Bytes Sent: %d\n", bytesSent );

    printf("press a key to terminate\n");
    getch();

    return 0;
}

Sune B. Woeller wrote:
Tim Peters wrote:
It's starting to look a lot like the Windows bind() implementation is unreliable, sometimes (but rarely -- hard to provoke) allowing two sockets to bind to the same (address, port) pair simultaneously, instead of raising 'Address already in use' for one of them. Disaster ensues.
WRT the last version of the code I posted, on another XP Pro SP2 machine (again after playing registry games to boost the number of ephemeral ports) I eventually saw all of: hangs during accept(); the assertion errors I mentioned last time; and mystery "Connection refused" errors during connect().
The variant of the code below _only_ tries to use port 19999. If it can't bind to that on the first try, socktest111() raises an exception instead of trying again (or trying a different port number). Ran two processes. After about 15 minutes, both died with assert errors at about the same time (identical, so far as I could tell by eyeball):
Process A:

Traceback (most recent call last):
  File "socktest.py", line 209, in ?
    assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
AssertionError: ('292739', '821744', ('127.0.0.1', 19999), ('127.0.0.1', 3845))

Process B:

Traceback (most recent call last):
  File "socktest.py", line 209, in ?
    assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
AssertionError: ('821744', '292739', ('127.0.0.1', 19999), ('127.0.0.1', 3846))
So it's again the business where each process is recv'ing the random string intended to be recv'ed by a socket in the other process. Hypothesized timeline:
    process A's `a` binds to 19999
    process B's `a` binds to 19999 -- according to me, this should be impossible in the absence of SO_REUSEADDR (which acts very differently on Windows than it does on Linux, BTW -- on Linux this should be impossible even in the presence of SO_REUSEADDR; regardless, we're not using SO_REUSEADDR here, and the braindead hard-coded

        w.setsockopt(socket.IPPROTO_TCP, 1, 1)

    is actually using the right magic constant for TCP_NODELAY on Windows, as it intends).
    A and B both listen()
    A connect()s, and accidentally gets on B.a's accept queue
    B connect()s, and accidentally gets on A.a's accept queue
    the rest follows inexorably
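If the timeline above is right, the cross-wiring should be visible without sending any data: the accepted socket's peer address would not match the connecting socket's own address. A rough sketch of such a check follows, in Python 3 rather than the 2.x used in this thread. `tcp_pair` and `is_cross_wired` are hypothetical names, and the helper binds to a system-chosen port instead of 19999 so that it can run anywhere. It detects the mix-up after the fact and does nothing to prevent it.

```python
import socket


def tcp_pair(host="127.0.0.1"):
    """Hypothetical helper: one loopback listener, one connect, one accept."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((host, 0))            # port 0: let the system choose
        a.listen(1)
        w.connect(a.getsockname())
        r, _peer = a.accept()
    except OSError:
        w.close()
        raise
    finally:
        a.close()
    return r, w


def is_cross_wired(r, w):
    """True if r and w are not the two ends of the same connection."""
    return (r.getpeername() != w.getsockname()
            or w.getpeername() != r.getsockname())


if __name__ == "__main__":
    r1, w1 = tcp_pair()
    r2, w2 = tcp_pair()
    print(is_cross_wired(r1, w1))    # ends of the same connection
    print(is_cross_wired(r1, w2))    # ends of two different connections
    for s in (r1, w1, r2, w2):
        s.close()
```

In the assertion output quoted earlier, each process's `w` had its own local port (3845 and 3846), so a check of this kind would presumably have flagged both pairs.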
This is what I'm experiencing as well. I can narrow it down a bit: I *always* experience one out of two erroneous behaviours, as described below.
I tried to make an even simpler test situation, without connecting sockets 'r' and 'w' to each other in the same process. I try to reproduce the problem in a 'standard' socket use case, where a client in one process connects to a server in another process.
The following two scripts act as a server and a client.
#***********************
# sock_server_reader.py
#***********************
import socket

a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

a.bind(("127.0.0.1", 19999))
print a.getsockname() # assigned (host, port) pair

a.listen(1)

print "a accepting:"
r, addr = a.accept() # r becomes asyncore's (self.)socket
print "a accepted: "
print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())

a.close()

msg = r.recv(100)
print 'msg recieved:', msg

#***********************
# sock_client_writer.py
#***********************
import socket, random

w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
w.setsockopt(socket.IPPROTO_TCP, 1, 1)

print 'w connecting:'
w.connect(('127.0.0.1', 19999))
print 'w connected:'
print w.getsockname()
print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
msg = str(random.randrange(1000000))
print 'sending msg: ', msg
w.send(msg)

There are two possible outcomes [a) and b)] of running two instances of this client/server pair (that is, 4 processes in total like the following). (Numbers 1 to 4 are steps executed in chronological order.)
1) python -i sock_server_reader.py
The server prints:
    ('127.0.0.1', 19999)
    a accepting:
and waits for a connection.

2) python -i sock_client_writer.py
The client prints:
    w connecting:
    w connected:
    ('127.0.0.1', 3774)
    ('127.0.0.1', 3774), peer=('127.0.0.1', 19999)
    sending msg: 903848
    >>>

and the server now accepts the connection and prints:
    a accepted:
    ('127.0.0.1', 19999), peer=('127.0.0.1', 3774)
    msg recieved: 903848
    >>>
This is as it should be. Then let's try to set up a second client/server pair on the same port (19999). The expected outcome is that the bind() call in sock_server_reader.py fails with socket.error: (10048, 'Address already in use').

3) python -i sock_server_reader.py
The server prints:
    ('127.0.0.1', 19999)
    a accepting:

Already here the problem occurs: bind() is allowed to bind to a port that is in use, in this case by the connected socket 'r' of the first server process. [also on other windows ? Mikkel: yes. Diku:???]
4) python -i sock_client_writer.py
Now one out of two things happens:

a) The client prints:
    w connecting:
    Traceback (most recent call last):
      File "c:\pyscripts\sock_client_writer.py", line 7, in ?
        w.connect(('127.0.0.1', 19999))
      File "<string>", line 1, in connect
    socket.error: (10061, 'Connection refused')
    >>>
The server waits on the call to accept(), still waiting for a connection. (This is the blocking behaviour I reported in my first mail, experienced when running two zope instances. The socket error was swallowed by the unconditional except clause.)

b) The client connects to the server:
    w connecting:
    w connected:
    ('127.0.0.1', 3865)
    ('127.0.0.1', 3865), peer=('127.0.0.1', 19999)
    sending msg: 119105
    >>>

and the server now accepts the connection and prints:
    a accepted:
    ('127.0.0.1', 19999), peer=('127.0.0.1', 3865)
    msg recieved: 119105
    >>>
The second set of client/server processes is now connected on the same port as the first set of client/server processes. In a port scanner the port now belongs to the second server process [3)].
I always get one of these two possibilities (a or b); I never see bind() raising socket.error: (10048, 'Address already in use').
It is important to realize that both these outcomes are an error.
I tried the same process as above on a Linux system, and there 3) always raises socket.error 'Address already in use' (the Linux counterpart of 10048).
If case a) occurred, where w.connect raises socket.error: (10061, 'Connection refused'), then trying to run a third client/server pair makes the bind() call raise (10048, 'Address already in use'). The 'a' socket from the second pair of processes is not closed in this case, but is still trying to accept().
In my case bind() always raises (10048, 'Address already in use') when there is an open server socket like 'a' bound to the same port.
To summarize: closing a server socket bound to a given port allows another server socket to bind to the same port, even when there are open client sockets bound to the port.
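The summary can be turned into a single-process probe, sketched here in Python 3 (the scripts in this thread are Python 2). `rebind_allowed` is a hypothetical name, and the probe uses a system-chosen port rather than 19999. It accepts one connection, optionally closes the listening socket, then tries to bind a fresh socket to the same address while the accepted connection is still open. Whether one process reproduces what the four-process test shows is an assumption I have not verified on Windows.

```python
import socket


def rebind_allowed(close_listener=True, host="127.0.0.1"):
    """Return True if a second bind() to the same address succeeds.

    Hypothetical probe for illustration only.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    r = None
    try:
        a.bind((host, 0))          # port 0: let the system choose
        a.listen(1)
        addr = a.getsockname()
        w.connect(addr)
        r, _peer = a.accept()      # r keeps using the listener's port
        if close_listener:
            a.close()
        try:
            b.bind(addr)           # the bind() under test
        except OSError:
            return False
        return True
    finally:
        for s in (a, w, b, r):
            if s is not None:
                s.close()


if __name__ == "__main__":
    print("listener still open  :", rebind_allowed(close_listener=False))
    print("listener closed first:", rebind_allowed(close_listener=True))
```

Going by the reports in this thread, the second line would be expected to print True on the affected Windows machines and False on Linux; the first line should print False everywhere.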
Note that because this never tries a port number other than 19999, it can't be a bulletproof workaround simply to hold on to the `a` socket. If the hypothesized timeline above is right, bind() can't be trusted on Windows in any situation where two processes may try to bind to the same hostname:port pair at the same time. Holding on to `a`, and cycling through port numbers when bind() failed, would still potentially leave two processes trying to bind to the same port number simultaneously (just a port other than 19999).
It would not be enough to keep a reference to 'a'. It would have to be kept open as well. And maybe that is not a problem, since we only accept() once - only one 'w' client socket would be able to be accepted. Normally the use case for closing the server socket is to disallow more connections than those already accepted. (But I'm not so experienced with sockets, I might be wrong.)
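A rough Python 3 sketch of what "keeping 'a' open" could look like: the listening socket lives exactly as long as the connected pair and is closed together with it. `HeldPair` is a hypothetical name, and the port is chosen by the system rather than fixed. As noted above, this does not help if two processes race on the bind() itself.

```python
import socket


class HeldPair:
    """A connected (r, w) pair that keeps its listening socket open.

    Hypothetical class for illustration only.
    """

    def __init__(self, host="127.0.0.1"):
        self.a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.r = None
        try:
            self.a.bind((host, 0))     # port 0: let the system choose
            self.a.listen(1)
            self.w.connect(self.a.getsockname())
            self.r, _peer = self.a.accept()
            # Deliberately no self.a.close() here: while `a` stays open,
            # a second bind() to its address should be refused.
        except OSError:
            self.close()
            raise

    def close(self):
        """Close both ends and, only now, the listening socket."""
        for s in (self.r, self.w, self.a):
            if s is not None:
                s.close()
```

One caveat: with the listener still open, a stray client can complete a connect() into the backlog. It would never be accepted, but it would not get 'Connection refused' either.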
Ick: this happens under Pythons 2.3.5 (MSVC 6) and 2.4.1 (MSVC 7.1), so if it is -- as is looking more and more likely -- an error in MS's socket implementation, it isn't avoided by switching to a newer MS C library.
Frankly, I don't see a sane way to worm around this -- it's difficult for application code to worm around what smells like a missing critical section in system code.
Using the simpler socket dance from the ZODB 3.4 code, I haven't yet seen an instance of the assert failure, or a hang. However, let two processes run that long enough simultaneously, and it always (so far) eventually fails with
socket.error: (10048, 'Address already in use')
in the w.connect() call, and despite that Windows picks the port numbers here!
That is exactly what I feared could happen. As shown in my example above, the other thing that might happen is that the port is 'taken over' by the other process.
While that also smells to heaven of a missing critical section in the Windows socket implementation, an exception is much easier to live with / worm around. Alas, we don't have the MS source code, and I don't have time to try disassembling / reverse-engineering the opcodes (what EULA <wink>?), so best I can do is run this for many more hours to try to increase confidence that an exception is the worst that can occur under the ZODB 3.4 spelling.
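If an exception really is the worst case, application code can retry the whole dance a bounded number of times. The sketch below is Python 3, not the ZODB 3.4 code itself (which is not shown in this thread). `connected_pair`, `attempts`, and `delay` are hypothetical names, and binding to port 0 is my reading of "Windows picks the port numbers".

```python
import socket
import time


def connected_pair(host="127.0.0.1", attempts=5, delay=0.05):
    """Return a connected (r, w) pair, retrying on socket errors.

    Hypothetical helper for illustration only.
    """
    last_error = None
    for _ in range(attempts):
        a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            a.bind((host, 0))          # port 0: let the system choose
            a.listen(1)
            w.connect(a.getsockname())
            r, _peer = a.accept()
            return r, w
        except OSError as exc:
            # e.g. 'Address already in use' from connect(): start over
            # with fresh sockets after a short pause.
            last_error = exc
            w.close()
            time.sleep(delay)
        finally:
            a.close()                  # runs on success and on failure
    raise RuntimeError("no socket pair after %d attempts: %s"
                       % (attempts, last_error))
```

A caller would use it as `r, w = connected_pair()`. The bound on `attempts` keeps a persistent failure from turning into a silent hang.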
Here's full code for the "only try port 19999" version:
import socket, errno
import time, random

def socktest111():
    """Raise an exception if we can't get 19999.
    """

    a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, 1, 1)

    # tricky: get a pair of connected sockets
    host = '127.0.0.1'
    port = 19999

    try:
        a.bind((host, port))
    except:
        raise RuntimeError
    else:
        print 'b',

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect ((host, port))
    except:
        pass
    print 'c',
    r, addr = a.accept()
    print 'a',
    a.close()
    print 'c',
    w.setblocking (1)

    return (r, w)

sofar = []
try:
    while 1:
        try:
            stuff = socktest111()
        except RuntimeError:
            print 'x',
            time.sleep(random.random()/10)
            continue
        sofar.append(stuff)
        time.sleep(random.random()/10)
        if len(sofar) == 50:
            tup = sofar.pop(0)
            r, w = tup
            msg = str(random.randrange(1000000))
            w.send(msg)
            msg2 = r.recv(100)
            assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
            for s in tup:
                s.close()
except KeyboardInterrupt:
    for tup in sofar:
        for s in tup:
            s.close()

_______________________________________________
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )