[Zope] Re: Running more than one instance on windows often block each other

Sune B. Woeller sune at syntetisk.dk
Thu Jul 28 10:05:23 EDT 2005


I have written two similar test programs in C++, and the problem occurs there too,
with exactly the same pattern as my Python client/server scripts in the mail I am
replying to.

But then I stumbled upon this flag in the WinSock documentation: SO_EXCLUSIVEADDRUSE
See the description here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/using_so_exclusiveaddruse.asp

It is very interesting reading, especially:
"An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If one or 
more connections originating from (or accepted on) a port bound with 
SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will fail."

This is just what we want (and I think that is standard behaviour on Linux).

I have tested it with my C++ programs: when I set that option on the server
socket before bind(), it works, and bind() in the second server process fails
with WSAEADDRINUSE, printing "bind() failed: 10048."

There is a Python bugfix for this, but only for Python 2.4:
http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=5470&atid=305470
(It was added in revision 1.294 of socketmodule.c.)
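A minimal sketch of the same idea from Python, assuming current Python 3 (the scripts in this thread are Python 2). It assumes the socket module defines SO_EXCLUSIVEADDRUSE only on Windows builds (per the tracker item above, from 2.4 on); elsewhere the getattr() lookup yields None and it falls back to a plain bind(). The helper name bind_exclusive is hypothetical.

```python
import socket


def bind_exclusive(host, port):
    """Return a TCP socket bound to (host, port).

    If this Python's socket module defines SO_EXCLUSIVEADDRUSE (Windows
    only), set it before bind(), which is where it has to be set.
    Elsewhere the lookup yields None and a plain bind() is done.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    opt = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if opt is not None:
        s.setsockopt(socket.SOL_SOCKET, opt, 1)
    try:
        s.bind((host, port))
    except OSError:
        s.close()  # don't leak the socket if the port is taken
        raise
    return s


# Usage: a second server on the same port should now be refused.
first = bind_exclusive("127.0.0.1", 0)  # port 0: let the OS pick one
first.listen(1)
port = first.getsockname()[1]
try:
    second = bind_exclusive("127.0.0.1", port)
except OSError as e:
    print("second bind refused: %s" % e)
else:
    second.close()
first.close()
```

The option only helps if every server process sets it; a process that binds without it is not protected.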


I ran the two test programs from two cmd terminals, as I described for the
Python versions.

// link with ws2_32.lib
// sock_server.cpp
#define _WINSOCK_DEPRECATED_NO_WARNINGS  // allow inet_addr() on newer SDKs
#include <stdio.h>
#include <conio.h>
#include <winsock2.h>

int main() {

    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
    if ( iResult != NO_ERROR ) {
        printf("Error at WSAStartup()\n");
        return 1;
    }

    // Create a socket.
    SOCKET m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );
    if ( m_socket == INVALID_SOCKET ) {
        printf( "Error at socket(): %d\n", WSAGetLastError() );
        WSACleanup();
        return 1;
    }

    // Try to use SO_EXCLUSIVEADDRUSE; it must be set before bind().
    BOOL bOptVal = TRUE;
    int bOptLen = sizeof(BOOL);
    if ( setsockopt( m_socket, SOL_SOCKET, SO_EXCLUSIVEADDRUSE,
                     (char*)&bOptVal, bOptLen ) != SOCKET_ERROR ) {
        printf("Set SO_EXCLUSIVEADDRUSE: ON\n");
    } else {
        printf( "setsockopt() failed: %d\n", WSAGetLastError() );
    }

    // Bind the socket.
    sockaddr_in service;
    service.sin_family = AF_INET;
    service.sin_addr.s_addr = inet_addr( "127.0.0.1" );
    service.sin_port = htons( 19990 );

    if ( bind( m_socket, (SOCKADDR*) &service, sizeof(service) ) == SOCKET_ERROR ) {
        // With SO_EXCLUSIVEADDRUSE the second server fails here with 10048.
        printf( "bind() failed: %d.\n", WSAGetLastError() );
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }

    // Listen on the socket.
    if ( listen( m_socket, 1 ) == SOCKET_ERROR )
        printf( "Error listening on socket.\n");

    // Accept a single connection, then close the listening socket.
    printf( "Waiting for a client to connect...\n" );
    SOCKET AcceptSocket = INVALID_SOCKET;
    while ( AcceptSocket == INVALID_SOCKET ) {
        AcceptSocket = accept( m_socket, NULL, NULL );
    }
    printf( "Client Connected.\n");
    closesocket(m_socket);

    // Receive data; leave room for the terminating NUL.
    char recvbuf[32] = "";
    int bytesRecv = recv( AcceptSocket, recvbuf, sizeof(recvbuf) - 1, 0 );
    printf( "Bytes Recv: %d\n", bytesRecv );
    printf( "Received: %s\n", recvbuf );

    // Keep the accepted connection open until a key is pressed.
    printf( "press a key to terminate\n" );
    _getch();

    closesocket(AcceptSocket);
    WSACleanup();
    return 0;
}

// link with ws2_32.lib
// sock_client.cpp
#define _WINSOCK_DEPRECATED_NO_WARNINGS  // allow inet_addr() on newer SDKs
#define _CRT_SECURE_NO_WARNINGS          // allow scanf() on newer CRTs
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <winsock2.h>

int main() {

    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
    if ( iResult != NO_ERROR ) {
        printf("Error at WSAStartup()\n");
        return 1;
    }

    // Create a socket.
    SOCKET m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );
    if ( m_socket == INVALID_SOCKET ) {
        printf( "Error at socket(): %d\n", WSAGetLastError() );
        WSACleanup();
        return 1;
    }

    // Connect to a server.
    sockaddr_in clientService;
    clientService.sin_family = AF_INET;
    clientService.sin_addr.s_addr = inet_addr( "127.0.0.1" );
    clientService.sin_port = htons( 19990 );

    if ( connect( m_socket, (SOCKADDR*) &clientService,
                  sizeof(clientService) ) == SOCKET_ERROR ) {
        printf( "Failed to connect: %d\n", WSAGetLastError() );
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }

    // Send data; %30s keeps scanf() from overflowing sendbuf.
    char sendbuf[32] = "";
    printf("Enter string to send (max 30 bytes):\n");
    scanf("%30s", sendbuf);
    printf("Sending: %s\n", sendbuf);

    int bytesSent = send( m_socket, sendbuf, (int)strlen(sendbuf), 0 );
    printf( "Bytes Sent: %d\n", bytesSent );

    printf("press a key to terminate\n");
    _getch();

    closesocket(m_socket);
    WSACleanup();
    return 0;
}




Sune B. Woeller wrote:
> Tim Peters wrote:
> 
>> It's starting to look a lot like the Windows bind() implementation is
>> unreliable, sometimes (but rarely -- hard to provoke) allowing two
>> sockets to bind to the same (address, port) pair simultaneously,
>> instead of raising 'Address already in use' for one of them.  Disaster
>> ensues.
>>
>> WRT the last version of the code I posted, on another XP Pro SP2
>> machine (again after playing registry games to boost the number of
>> ephemeral ports) I eventually saw all of:  hangs during accept(); the
>> assertion errors I mentioned last time; and mystery "Connection
>> refused" errors during connect().
>>
>> The variant of the code below _only_ tries to use port 19999.  If it
>> can't bind to that on the first try, socktest111() raises an exception
>> instead of trying again (or trying a different port number).  Ran two
>> processes.  After about 15 minutes, both died with assert errors at
>> about the same time (identical, so far as I could tell by eyeball):
>>
>> Process A:
>>
>> Traceback (most recent call last):
>>   File "socktest.py", line 209, in ?
>>     assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>> AssertionError: ('292739', '821744', ('127.0.0.1', 19999), ('127.0.0.1', 3845))
>>
>> Process B:
>>
>> Traceback (most recent call last):
>>   File "socktest.py", line 209, in ?
>>     assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>> AssertionError: ('821744', '292739', ('127.0.0.1', 19999), ('127.0.0.1', 3846))
>>
>> So it's again the business where each process is recv'ing the random
>> string intended to be recv'ed by a socket in the other process. 
>> Hypothesized timeline:
>>
>> process A's `a` binds to 19999
>> process B's `a` binds to 19999 -- according to me, this should be impossible
>>     in the absence of SO_REUSEADDR (which acts very differently on
>>     Windows than it does on Linux, BTW -- on Linux this should be impossible
>>     even in the presence of SO_REUSEADDR; regardless, we're not using
>>     SO_REUSEADDR here, and the braindead hard-coded
>>
>>         w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>
>>     is actually using the right magic constant for TCP_NODELAY on
>>     Windows, as it intends).
>> A and B both listen()
>> A connect()s, and accidentally gets on B.a's accept queue
>> B connect()s, and accidentally gets on A.a's accept queue
>> the rest follows inexorably
>>
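One way to catch the cross-connection in this timeline without exchanging a random string is to compare addresses: the socket returned by accept() should have w's local address as its peer. This is a sketch in current Python 3, not code from the thread; pair_is_consistent is a hypothetical helper, and it assumes getsockname()/getpeername() themselves report correctly, which is not guaranteed if bind() really is racing underneath.

```python
import socket


def pair_is_consistent(r, w):
    """True if r (from accept()) and w (from connect()) are the two ends
    of one TCP connection.

    In the hypothesized timeline, A's w is queued on B's listener, so the
    socket A accepts belongs to B's client and these addresses differ.
    """
    return (r.getpeername() == w.getsockname()
            and w.getpeername() == r.getsockname())


# Usage: a correctly matched pair passes; mixed-up ends fail.
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.1", 0))
a.listen(2)
w1 = socket.create_connection(a.getsockname())
r1, _ = a.accept()
w2 = socket.create_connection(a.getsockname())
r2, _ = a.accept()
print(pair_is_consistent(r1, w1))  # True
print(pair_is_consistent(r1, w2))  # False
for s in (a, r1, w1, r2, w2):
    s.close()
```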
> 
> 
> 
> This is what I'm experiencing as well.
> I can narrow it down a bit: I *always* see one of two
> erroneous behaviours, as described below.
> 
> I tried to make an even simpler test case, without connecting
> sockets 'r' and 'w' to each other in the same process. I try to
> reproduce the problem in a 'standard' socket use case, where a client
> in one process connects to a server in another process.
> 
> The following two scripts act as a server and a client.
> 
> #***********************
> # sock_server_reader.py
> #***********************
> import socket
> 
> a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> 
> a.bind(("127.0.0.1", 19999))
> print a.getsockname()  # assigned (host, port) pair
> 
> a.listen(1)
> 
> print "a accepting:"
> r, addr = a.accept()  # r becomes asyncore's (self.)socket
> print "a accepted: "
> print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
> 
> a.close()
> 
> msg = r.recv(100)
> print 'msg recieved:', msg
> 
> 
> #***********************
> # sock_client_writer.py
> #***********************
> import socket, random
> 
> w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> w.setsockopt(socket.IPPROTO_TCP, 1, 1)
> 
> print 'w connecting:'
> w.connect(('127.0.0.1', 19999))
> print 'w connected:'
> print w.getsockname()
> print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
> msg = str(random.randrange(1000000))
> print 'sending msg: ', msg
> w.send(msg)
> 
> 
> 
> 
> There are two possible outcomes [a) and b)] of running two instances
> of this client/server pair (that is, four processes in total, as
> follows).
> (Steps 1 to 4 are executed in chronological order.)
> 
> 1) python -i sock_server_reader.py
> The server prints:
>     ('127.0.0.1', 19999)
>     a accepting:
> and waits for a connection
> 
> 2) python -i sock_client_writer.py
> The client prints:
>     w connecting:
>     w connected:
>     ('127.0.0.1', 3774)
>      ('127.0.0.1', 3774), peer=('127.0.0.1', 19999)
>     sending msg:  903848
>     >>>
> 
> and the server now accepts the connection and prints:
>     a accepted:
>      ('127.0.0.1', 19999), peer=('127.0.0.1', 3774)
>     msg recieved: 903848
>     >>>
> 
> This is how it should be. Now let's try to set up a second
> client/server pair on the same port (19999). The expected outcome
> is that the bind() call in sock_server_reader.py fails with
> socket.error: (10048, 'Address already in use').
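Note that 10048 is the Winsock code (WSAEADDRINUSE); on Linux the same failure carries errno EADDRINUSE. A small sketch, assuming current Python 3, that treats both as "port taken" and re-raises anything else. try_bind is a hypothetical helper, and errno.WSAEADDRINUSE exists in the errno module only on Windows, hence the hasattr() check.

```python
import errno
import socket

# "Address already in use": EADDRINUSE on POSIX systems, plus
# WSAEADDRINUSE (10048) from Winsock, which errno defines only on Windows.
ADDR_IN_USE = {errno.EADDRINUSE}
if hasattr(errno, "WSAEADDRINUSE"):
    ADDR_IN_USE.add(errno.WSAEADDRINUSE)


def try_bind(sock, address):
    """Bind sock to address; return False (instead of raising) if the
    address is already in use. Any other error is re-raised."""
    try:
        sock.bind(address)
    except OSError as e:
        if e.errno in ADDR_IN_USE:
            return False
        raise
    return True
```

In step 3) below, try_bind() on port 19999 should return False; on the Windows machines described here it returns True instead.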
> 
> 3) python -i sock_server_reader.py
> The server prints:
>     ('127.0.0.1', 19999)
>     a accepting:
> 
> The problem already occurs here: bind() is allowed to bind to a port
> that is in use, in this case by the accepted socket 'r' in the first
> server process.
> [Also on other Windows machines? Mikkel: yes. Diku: ???]
> 
> 4) python -i sock_client_writer.py
> Now one out of two things happen:
> 
> a) The client prints:
>     w connecting:
>     Traceback (most recent call last):
>       File "c:\pyscripts\sock_client_writer.py", line 7, in ?
>         w.connect(('127.0.0.1', 19999))
>       File "<string>", line 1, in connect
>     socket.error: (10061, 'Connection refused')
>     >>>
>    The server still waits in accept() for a connection. (This is
> the blocking behaviour I reported in my first mail, seen when
> running two Zope instances; there the socket error was swallowed by
> an unconditional except clause.)
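For comparison with a bare `except: pass` around a non-blocking connect() (as in socktest111() quoted below), here is a sketch, assuming current Python 3, that only tolerates the "still in progress" codes and then reads the real outcome from SO_ERROR, so a refused connection surfaces as an exception. connect_nonblocking is a hypothetical name. Winsock reports a pending connect as WSAEWOULDBLOCK and a failed one through select()'s exception set, which is why both appear below.

```python
import errno
import select
import socket

# Codes meaning "non-blocking connect() has started but not finished".
# Winsock reports WSAEWOULDBLOCK, which errno defines only on Windows.
IN_PROGRESS = {errno.EINPROGRESS, errno.EWOULDBLOCK}
if hasattr(errno, "WSAEWOULDBLOCK"):
    IN_PROGRESS.add(errno.WSAEWOULDBLOCK)


def _fail(err):
    raise OSError(err, "connect failed: %s" % errno.errorcode.get(err, err))


def connect_nonblocking(sock, address, timeout=5.0):
    """Connect a non-blocking socket, surfacing errors instead of hiding them.

    Success is signalled by writability on all platforms; Winsock signals
    failure through the exception set, so both sets are watched.
    """
    sock.setblocking(False)
    err = sock.connect_ex(address)
    if err == 0:
        return
    if err not in IN_PROGRESS:
        _fail(err)
    _, writable, failed = select.select([], [sock], [sock], timeout)
    if not (writable or failed):
        raise OSError(errno.ETIMEDOUT, "connect timed out")
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        _fail(err)  # e.g. "connection refused", as in case a)
```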
> 
> b) The client connects to the server:
>     w connecting:
>     w connected:
>     ('127.0.0.1', 3865)
>      ('127.0.0.1', 3865), peer=('127.0.0.1', 19999)
>     sending msg:  119105
>     >>>
> 
> and the server now accepts the connection and prints:
>     a accepted:
>      ('127.0.0.1', 19999), peer=('127.0.0.1', 3865)
>     msg recieved: 119105
>     >>>
> 
> The second set of client/server processes is now connected on the
> same port as the first set. In a port scanner the port now belongs
> to the second server process [3)].
> 
> 
> I always get one of these two outcomes (a or b); I never
> see bind() raising socket.error: (10048, 'Address already in use').
> 
> It is important to realize that both these outcomes are an error.
> 
> I tried the same procedure on a Linux system, and 3) always raises
> 'Address already in use' (errno EADDRINUSE; 10048 is the Windows code).
> 
> 
> If case a) occurred, where w.connect raises socket.error: (10061,
> 'Connection refused'), and I then try to run a third client/server
> pair, its bind() call raises (10048, 'Address already in use'). The
> 'a' socket from the second pair of processes is not closed in this
> case, but is still waiting in accept().
> 
> In my case bind() always raises (10048, 'Address already in use') when
> there is an open server socket like 'a' bound to the same port.
> 
> To summarize:
> Closing a server socket bound to a given port allows another server
> socket to bind to the same port, even while sockets with open
> connections are still bound to that port.
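A single-process approximation of steps 1) to 3), assuming current Python 3. The helper name rebind_after_listener_closed is hypothetical, and because it runs in one process it does not reproduce the cross-process race exactly, so a True result is not proof that the Windows bug is absent. On Linux it should return True, matching the Linux result above; the Windows behaviour reported here corresponds to False.

```python
import socket


def rebind_after_listener_closed(host="127.0.0.1"):
    """Return True if bind() to the port is refused while the accepted
    connection is still open, False if the bind succeeds."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind((host, 0))               # let the OS choose a free port
    a.listen(1)
    port = a.getsockname()[1]

    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w.connect((host, port))         # step 2): the client connects
    r, _ = a.accept()
    a.close()                       # listener closed; r and w stay open

    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        b.bind((host, port))        # step 3): a second server binds
        refused = False
    except OSError:
        refused = True
    finally:
        for s in (b, r, w):
            s.close()
    return refused


print(rebind_after_listener_closed())
```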
> 
> 
> 
> 
> 
>> Note that because this never tries a port number other than 19999, it
>> can't be a bulletproof workaround simply to hold on to the `a` socket.
>>  If the hypothesized timeline above is right, bind() can't be trusted
>> on Windows in any situation where two processes may try to bind to the
>> same hostname:port pair at the same time.  Holding on to `a`, and
>> cycling through port numbers when bind() failed, would still
>> potentially leave two processes trying to bind to the same port number
>> simultaneously (just a port other than 19999).
>>
> 
> It would not be enough to keep a reference to 'a'; it would have to be
> kept open as well. And maybe that is not a problem, since we only
> accept() once, so only one 'w' client socket could ever be accepted.
> Normally the reason for closing the server socket is to refuse
> connections beyond those already accepted.
> (But I'm not very experienced with sockets, so I might be wrong.)
> 
> 
>> Ick:  this happens under Pythons 2.3.5 (MSVC 6) and 2.4.1 (MSVC 7.1),
>> so if it is -- as is looking more and more likely -- an error in MS's
>> socket implementation, it isn't avoided by switching to a newer MS C
>> library.
>>
>> Frankly, I don't see a sane way to worm around this -- it's difficult
>> for application code to worm around what smells like a missing
>> critical section in system code.
>>
>> Using the simpler socket dance from the ZODB 3.4 code, I haven't yet
>> seen an instance of the assert failure, or a hang.  However, let two
>> processes run that long enough simultaneously, and it always (so far)
>> eventually fails with
>>
>>     socket.error: (10048, 'Address already in use')
>>
>> in the w.connect() call, and despite that Windows picks the port 
>> numbers here!
>>
> That is exactly what I feared could happen. As shown in my example
> above, the other thing that might happen is that the port is 'taken
> over' by the other process.
> 
> 
>> While that also smells to heaven of a missing critical section in the
>> Windows socket implementation, an exception is much easier to live
>> with / worm around.  Alas, we don't have the MS source code, and I
>> don't have time to try disassembling / reverse-engineering the opcodes
>> (what EULA <wink>?), so best I can do is run this for many more hours
>> to try to increase confidence that an exception is the worst that can
>> occur under the ZODB 3.4 spelling.
>>
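For comparison, a sketch of getting a connected pair without hard-coding a port, assuming current Python 3. It is not the ZODB 3.4 code, whose exact steps are not shown in this thread: the listener binds to port 0 so the OS picks one, exactly one connection is accepted, and it is checked against w's address. connected_pair is a hypothetical helper. Given the 10048 failures Tim reports even with OS-picked ports, a caller would still need to retry on OSError or RuntimeError.

```python
import socket


def connected_pair(host="127.0.0.1"):
    """Return a connected (r, w) pair of TCP sockets on host.

    The listener binds to port 0, so the OS picks a free port, and is
    closed as soon as one connection has been accepted. A connection
    that did not come from w is rejected rather than silently used.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((host, 0))
        a.listen(1)
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            w.connect(a.getsockname())
            r, peer = a.accept()
        except OSError:
            w.close()
            raise
    finally:
        a.close()

    expected = w.getsockname()
    if peer != expected:
        r.close()
        w.close()
        raise RuntimeError("accepted %r, expected %r" % (peer, expected))
    return r, w
```

Current Python also provides socket.socketpair(), which gained Windows support in 3.5, for exactly this purpose.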
>> Here's full code for the "only try port 19999" version:
>>
>> import socket, errno
>> import time, random
>> def socktest111():
>>     """Raise an exception if we can't get 19999.
>>     """
>>
>>     a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>     w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>
>>     # set TCP_NODELAY to true to avoid buffering
>>     w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>
>>     # tricky: get a pair of connected sockets
>>     host = '127.0.0.1'
>>     port = 19999
>>
>>     try:
>>         a.bind((host, port))
>>     except:
>>         raise RuntimeError
>>     else:
>>         print 'b',
>>
>>     a.listen (1)
>>     w.setblocking (0)
>>     try:
>>         w.connect ((host, port))
>>     except:
>>         pass
>>     print 'c',
>>     r, addr = a.accept()
>>     print 'a',
>>     a.close()
>>     print 'c',
>>     w.setblocking (1)
>>
>>     return (r, w)
>>
>> sofar = []
>> try:
>>    while 1:
>>        try:
>>            stuff = socktest111()
>>        except RuntimeError:
>>            print 'x',
>>            time.sleep(random.random()/10)
>>            continue
>>        sofar.append(stuff)
>>        time.sleep(random.random()/10)
>>        if len(sofar) == 50:
>>            tup = sofar.pop(0)
>>            r, w = tup
>>            msg = str(random.randrange(1000000))
>>            w.send(msg)
>>            msg2 = r.recv(100)
>>            assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>>            for s in tup:
>>                s.close()
>> except KeyboardInterrupt:
>>    for tup in sofar:
>>        for s in tup:
>>            s.close()
>> _______________________________________________
>> Zope maillist  -  Zope at zope.org
>> http://mail.zope.org/mailman/listinfo/zope
>> **   No cross posts or HTML encoding!  **
>> (Related lists -  http://mail.zope.org/mailman/listinfo/zope-announce
>>  http://mail.zope.org/mailman/listinfo/zope-dev )
>>
> 


