[Zope] Re: Running more than one instance on windows often block
each other
Sune B. Woeller
sune at syntetisk.dk
Thu Jul 28 10:09:54 EDT 2005
btw, the test programs are slightly modified versions of the
'Getting Started With Winsock' example:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/getting_started_with_winsock.asp
Sune B. Woeller wrote:
> I have made two similar test programs in C++, and the problem also occurs
> there. It follows exactly the same pattern as my Python client/server
> scripts in the mail I am replying to.
>
> But then I stumbled upon this flag in the WinSock documentation:
> SO_EXCLUSIVEADDRUSE
> See the description here:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/using_so_exclusiveaddruse.asp
>
>
> It is very interesting reading, especially:
> "An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If
> one or more connections originating from (or accepted on) a port bound
> with SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will
> fail."
>
> This is just what we want (and I think that is standard behaviour on
> Linux).
>
> I have tested it with my C++ programs, and when I set that option on the
> server socket before the bind(), it works: bind() in the second server
> process fails with WSAEADDRINUSE
> (bind() failed: 10048.)
>
> There is a python bugfix for this, but only for python 2.4:
> http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=5470&atid=305470
>
> (It is added to version 1.294 of socketmodule.c)
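For reference, a minimal Python sketch of the same idea: set SO_EXCLUSIVEADDRUSE on the listening socket before bind(). It assumes a Python whose socket module exposes the constant (Windows builds only), and it uses Python 3 syntax rather than the Python 2 used elsewhere in this thread. The helper name make_exclusive_listener is made up for illustration.

```python
import socket


def make_exclusive_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket, bound exclusively where supported.

    Hypothetical helper for illustration only. Port 0 asks the OS to
    pick a free port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The constant is only defined on Windows builds of Python,
        # so guard the call to keep the sketch portable.
        if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
        s.bind((host, port))
        s.listen(1)
    except OSError:
        s.close()
        raise
    return s
```

Calling make_exclusive_listener(port=19990) in two processes should then make the second call fail in bind(), as in the C++ test below.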
>
>
> I run the two test programs from two cmd terminals, like I described for
> the python versions.
>
> // link with ws2_32.lib
> // sock_server.cpp
> #include <cstdlib>
> #include <stdio.h>
> #include <conio.h>
> #include "winsock2.h"
>
> int main() {
>
>     // Initialize Winsock.
>     WSADATA wsaData;
>     int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
>     if ( iResult != NO_ERROR ) {
>         printf("Error at WSAStartup()\n");
>         return 1;
>     }
>
>     // Create a socket.
>     SOCKET m_socket;
>     m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );
>
>     if ( m_socket == INVALID_SOCKET ) {
>         printf( "Error at socket(): %d\n", WSAGetLastError() );
>         WSACleanup();
>         return 1;
>     }
>
>     // Try to use SO_EXCLUSIVEADDRUSE (must be set before bind()).
>     BOOL bOptVal = TRUE;
>     int bOptLen = sizeof(BOOL);
>     if (setsockopt(m_socket, SOL_SOCKET, SO_EXCLUSIVEADDRUSE,
>                    (char*)&bOptVal, bOptLen) != SOCKET_ERROR) {
>         printf("Set SO_EXCLUSIVEADDRUSE: ON\n");
>     }
>
>     // Bind the socket.
>     sockaddr_in service;
>
>     service.sin_family = AF_INET;
>     service.sin_addr.s_addr = inet_addr( "127.0.0.1" );
>     service.sin_port = htons( 19990 );
>
>     if ( bind( m_socket, (SOCKADDR*) &service, sizeof(service) ) ==
>          SOCKET_ERROR ) {
>         printf( "bind() failed: %i.\n", WSAGetLastError() );
>         closesocket(m_socket);
>         WSACleanup();
>         return 1;
>     }
>
>     // Listen on the socket.
>     if ( listen( m_socket, 1 ) == SOCKET_ERROR )
>         printf( "Error listening on socket.\n");
>
>     // Accept one connection; accept() returns INVALID_SOCKET on failure.
>     SOCKET AcceptSocket = INVALID_SOCKET;
>
>     printf( "Waiting for a client to connect...\n" );
>     while ( AcceptSocket == INVALID_SOCKET ) {
>         AcceptSocket = accept( m_socket, NULL, NULL );
>     }
>     printf( "Client Connected.\n");
>
>     // Close the listening socket; the accepted connection stays open.
>     closesocket(m_socket);
>
>     // Receive data, leaving room for the terminating NUL.
>     char recvbuf[32] = "";
>     int bytesRecv = recv( AcceptSocket, recvbuf, sizeof(recvbuf) - 1, 0 );
>     printf( "Bytes Recv: %d\n", bytesRecv );
>     printf("Received: %s\n", recvbuf);
>     printf("press a key to terminate\n");
>     _getch();
>
>     closesocket(AcceptSocket);
>     WSACleanup();
>     return 0;
> }
>
> // link with ws2_32.lib
> // sock_client.cpp
> #include <stdio.h>
> #include <string.h>
> #include <conio.h>
> #include "winsock2.h"
>
> int main() {
>
>     // Initialize Winsock.
>     WSADATA wsaData;
>     int iResult = WSAStartup( MAKEWORD(2,2), &wsaData );
>     if ( iResult != NO_ERROR ) {
>         printf("Error at WSAStartup()\n");
>         return 1;
>     }
>
>     // Create a socket.
>     SOCKET m_socket;
>     m_socket = socket( AF_INET, SOCK_STREAM, IPPROTO_TCP );
>
>     if ( m_socket == INVALID_SOCKET ) {
>         printf( "Error at socket(): %d\n", WSAGetLastError() );
>         WSACleanup();
>         return 1;
>     }
>
>     // Connect to a server.
>     sockaddr_in clientService;
>
>     clientService.sin_family = AF_INET;
>     clientService.sin_addr.s_addr = inet_addr( "127.0.0.1" );
>     clientService.sin_port = htons( 19990 );
>
>     if ( connect( m_socket, (SOCKADDR*) &clientService,
>                   sizeof(clientService) ) == SOCKET_ERROR) {
>         printf( "Failed to connect.\n" );
>         closesocket(m_socket);
>         WSACleanup();
>         return 1;
>     }
>
>     // Send data.
>     int bytesSent;
>     char sendbuf[32] = "";
>     printf("Enter string to send (max 30 bytes):\n");
>     scanf("%30s", sendbuf );   // width limit keeps input inside sendbuf
>     printf("Sending: %s\n", sendbuf);
>
>     bytesSent = send( m_socket, sendbuf, (int)strlen(sendbuf), 0 );
>     printf( "Bytes Sent: %d\n", bytesSent );
>
>     printf("press a key to terminate\n");
>     _getch();
>
>     closesocket(m_socket);
>     WSACleanup();
>     return 0;
> }
>
>
>
>
> Sune B. Woeller wrote:
>
>> Tim Peters wrote:
>>
>>> It's starting to look a lot like the Windows bind() implementation is
>>> unreliable, sometimes (but rarely -- hard to provoke) allowing two
>>> sockets to bind to the same (address, port) pair simultaneously,
>>> instead of raising 'Address already in use' for one of them. Disaster
>>> ensues.
>>>
>>> WRT the last version of the code I posted, on another XP Pro SP2
>>> machine (again after playing registry games to boost the number of
>>> ephemeral ports) I eventually saw all of: hangs during accept(); the
>>> assertion errors I mentioned last time; and mystery "Connection
>>> refused" errors during connect().
>>>
>>> The variant of the code below _only_ tries to use port 19999. If it
>>> can't bind to that on the first try, socktest111() raises an exception
>>> instead of trying again (or trying a different port number). Ran two
>>> processes. After about 15 minutes, both died with assert errors at
>>> about the same time (identical, so far as I could tell by eyeball):
>>>
>>> Process A:
>>>
>>> Traceback (most recent call last):
>>>   File "socktest.py", line 209, in ?
>>>     assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>>> AssertionError: ('292739', '821744', ('127.0.0.1', 19999),
>>> ('127.0.0.1', 3845))
>>>
>>> Process B:
>>>
>>> Traceback (most recent call last):
>>>   File "socktest.py", line 209, in ?
>>>     assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>>> AssertionError: ('821744', '292739', ('127.0.0.1', 19999),
>>> ('127.0.0.1', 3846))
>>>
>>> So it's again the business where each process is recv'ing the random
>>> string intended to be recv'ed by a socket in the other process.
>>> Hypothesized timeline:
>>>
>>>     process A's `a` binds to 19999
>>>     process B's `a` binds to 19999 -- according to me, this should be
>>>         impossible in the absence of SO_REUSEADDR (which acts very
>>>         differently on Windows than it does on Linux, BTW -- on Linux
>>>         this should be impossible even in the presence of SO_REUSEADDR;
>>>         regardless, we're not using SO_REUSEADDR here, and the braindead
>>>         hard-coded
>>>
>>>             w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>>
>>>         is actually using the right magic constant for TCP_NODELAY on
>>>         Windows, as it intends).
>>>     A and B both listen()
>>>     A connect()s, and accidentally gets on B.a's accept queue
>>>     B connect()s, and accidentally gets on A.a's accept queue
>>>     the rest follows inexorably
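On the hard-coded constant mentioned in the timeline: a small sketch, in Python 3 syntax (an assumption, since the thread uses Python 2), of spelling the option by name and reading it back. It only illustrates that socket.TCP_NODELAY can replace the literal 1; it says nothing about the bind() problem itself.

```python
import socket

# Same call as w.setsockopt(socket.IPPROTO_TCP, 1, 1), but using the
# named constant instead of the literal option number.
w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a non-zero value means Nagle's algorithm is off.
nodelay = w.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
w.close()

print("TCP_NODELAY option number:", socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
```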
>>>
>>
>>
>>
>> This is what I'm experiencing as well.
>> I can narrow it down a bit: I *always* experience one out of two
>> erroneous behaviours, as described below.
>>
>> I tried to make an even simpler test situation, without connecting
>> sockets 'r' and 'w' to each other in the same process. I try to
>> reproduce the problem in a 'standard' socket use case, where a client
>> in one process connects to a server in another process.
>>
>> The following two scripts act as a server and a client.
>>
>> #***********************
>> # sock_server_reader.py
>> #***********************
>> import socket
>>
>> a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>
>> a.bind(("127.0.0.1", 19999))
>> print a.getsockname() # assigned (host, port) pair
>>
>> a.listen(1)
>>
>> print "a accepting:"
>> r, addr = a.accept() # r becomes asyncore's (self.)socket
>> print "a accepted: "
>> print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
>>
>> a.close()
>>
>> msg = r.recv(100)
>> print 'msg recieved:', msg
>>
>>
>> #***********************
>> # sock_client_writer.py
>> #***********************
>> import socket, random
>>
>> w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>> w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>
>> print 'w connecting:'
>> w.connect(('127.0.0.1', 19999))
>> print 'w connected:'
>> print w.getsockname()
>> print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
>> msg = str(random.randrange(1000000))
>> print 'sending msg: ', msg
>> w.send(msg)
>>
>>
>>
>>
>> There are two possible outcomes [a) and b)] of running two instances
>> of this client/server pair (that is, 4 processes in total like the
>> following).
>> (Numbers 1 to 4 are steps executed in chronological order.)
>>
>> 1) python -i sock_server_reader.py
>> The server prints:
>> ('127.0.0.1', 19999)
>> a accepting:
>> and waits for a connection
>>
>> 2) python -i sock_client_writer.py
>> The client prints:
>> w connecting:
>> w connected:
>> ('127.0.0.1', 3774)
>> ('127.0.0.1', 3774), peer=('127.0.0.1', 19999)
>> sending msg: 903848
>> >>>
>>
>> and the server now accepts the connection and prints:
>> a accepted:
>> ('127.0.0.1', 19999), peer=('127.0.0.1', 3774)
>> msg recieved: 903848
>> >>>
>>
>> This is as it should be. Then let's try to set up a second
>> client/server pair on the same port (19999). The expected outcome of
>> this is that the bind() call in sock_server_reader.py should fail with
>> socket.error: (10048, 'Address already in use').
>>
>> 3) python -i sock_server_reader.py
>> The server prints:
>> ('127.0.0.1', 19999)
>> a accepting:
>>
>> The problem occurs already here: bind() is allowed to bind to a port
>> that is in use, in this case by the client socket 'r' (the socket
>> returned by accept() in the first server process).
>> [also on other windows ? Mikkel: yes. Diku:???]
>>
>> 4) python -i sock_client_writer.py
>> Now one of two things happens:
>>
>> a) The client prints:
>> w connecting:
>> Traceback (most recent call last):
>>   File "c:\pyscripts\sock_client_writer.py", line 7, in ?
>>     w.connect(('127.0.0.1', 19999))
>>   File "<string>", line 1, in connect
>> socket.error: (10061, 'Connection refused')
>> >>>
>> The server waits on the call to accept(), still waiting for a
>> connection. (This is the blocking behaviour I reported in my first
>> mail, experienced when running two zope instances. The socket error
>> was swallowed by the unconditional except clause).
>>
>> b) The client connects to the server:
>> w connecting:
>> w connected:
>> ('127.0.0.1', 3865)
>> ('127.0.0.1', 3865), peer=('127.0.0.1', 19999)
>> sending msg: 119105
>> >>>
>>
>> and the server now accepts the connection and prints:
>> a accepted:
>> ('127.0.0.1', 19999), peer=('127.0.0.1', 3865)
>> msg recieved: 119105
>> >>>
>>
>> The second set of client/server processes is now connected on the
>> same port as the first set of client/server processes. In a port
>> scanner the port now belongs to the second server process [3)].
>>
>>
>> I always get one of these two outcomes (a or b); I never
>> see bind() raising socket.error: (10048, 'Address already in use').
>>
>> It is important to realize that both of these outcomes are errors.
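On the swallowed error in outcome a): a sketch of a non-blocking connect that ignores only the expected "connection in progress" condition and lets everything else (for example 'Connection refused') propagate, instead of using an unconditional except clause. The helper name start_connect is hypothetical, and the code uses Python 3 syntax and exception classes, which did not exist in the Python versions discussed here.

```python
import socket


def start_connect(w, address):
    """Begin a non-blocking connect on socket w (hypothetical helper).

    Only BlockingIOError is ignored; any other OSError, such as
    ConnectionRefusedError, is raised to the caller.
    """
    w.setblocking(False)
    try:
        w.connect(address)
    except BlockingIOError:
        # Expected for a non-blocking connect: the handshake is under way.
        pass
    finally:
        # Restore blocking mode whether or not connect() raised.
        w.setblocking(True)
```

With this shape, an unexpected socket error shows up as a traceback instead of leaving the other side waiting in accept().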
>>
>> I tried the same procedure on a Linux system, and there 3) always
>> fails in bind() with 'Address already in use' (on Linux the error
>> number is EADDRINUSE rather than the Winsock code 10048).
>>
>>
>> If case a) occurred, where w.connect raises socket.error: (10061,
>> 'Connection refused'), and I then try to run a third client/server
>> pair, the bind() call raises (10048, 'Address already in use'). The
>> 'a' socket from the second pair of processes is not closed in this
>> case, but is still trying to accept().
>>
>> In my case bind() always raises (10048, 'Address already in use') when
>> there is an open server socket like 'a' bound to the same port.
>>
>> To summarize:
>> Closing a server socket bound to a given port allows another server
>> socket to bind to the same port, even when there are open client
>> sockets bound to the port.
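The summary can be checked from a single process with a sketch like the following. It uses an OS-chosen port instead of 19999 so it does not collide with anything else, the function name is made up, and it is Python 3 syntax (an assumption; the scripts above are Python 2). Going by the reports in this thread, one would expect False on Linux and True on an affected Windows machine.

```python
import socket


def rebind_allowed_after_listener_close(host="127.0.0.1"):
    """Return True if bind() succeeds on a port that still has an open
    accepted connection after its listening socket was closed."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind((host, 0))            # port 0: let the OS pick a free port
    port = a.getsockname()[1]
    a.listen(1)

    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w.connect((host, port))
    r, _ = a.accept()
    a.close()                    # r and w stay open on `port`

    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        b.bind((host, port))     # no SO_REUSEADDR is set on b
    except OSError:
        return False
    else:
        return True
    finally:
        for s in (b, r, w):
            s.close()
```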
>>
>>
>>
>>
>>
>>> Note that because this never tries a port number other than 19999, it
>>> can't be a bulletproof workaround simply to hold on to the `a` socket.
>>> If the hypothesized timeline above is right, bind() can't be trusted
>>> on Windows in any situation where two processes may try to bind to the
>>> same hostname:port pair at the same time. Holding on to `a`, and
>>> cycling through port numbers when bind() failed, would still
>>> potentially leave two processes trying to bind to the same port number
>>> simultaneously (just a port other than 19999).
>>>
>>
>> It would not be enough to keep a reference to 'a'. It would have to be
>> kept open as well. And maybe that is not a problem, since we only
>> accept() once - only one 'w' client socket could ever be
>> accepted. Normally the reason for closing the server socket is to
>> disallow more connections than those already accepted.
>> (But I'm not so experienced with sockets, I might be wrong.)
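A sketch of what keeping 'a' open could look like: the function returns the listening socket together with the connected pair, so the caller decides when (or whether) to close it. The name connected_pair_keeping_listener and the port parameter are assumptions for illustration, the syntax is Python 3, and as noted above this does not protect against two processes racing in bind().

```python
import socket


def connected_pair_keeping_listener(host="127.0.0.1", port=19999):
    """Return (a, r, w): the still-open listener and a connected pair."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((host, port))
        a.listen(1)
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        # getsockname() gives the real port even if port 0 was requested.
        w.connect(a.getsockname())
        r, _ = a.accept()
    except OSError:
        a.close()
        w.close()
        raise
    # `a` is deliberately NOT closed here, so the port stays occupied
    # by an open listening socket for as long as the caller keeps it.
    return a, r, w
```

The caller would hold on to all three sockets and close 'a' only when 'r' and 'w' are closed.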
>>
>>
>>> Ick: this happens under Pythons 2.3.5 (MSVC 6) and 2.4.1 (MSVC 7.1),
>>> so if it is -- as is looking more and more likely --an error in MS's
>>> socket implementation, it isn't avoided by switching to a newer MS C
>>> library.
>>>
>>> Frankly, I don't see a sane way to worm around this -- it's difficult
>>> for application code to worm around what smells like a missing
>>> critical section in system code.
>>>
>>> Using the simpler socket dance from the ZODB 3.4 code, I haven't yet
>>> seen an instance of the assert failure, or a hang. However, let two
>>> processes run that long enough simultaneously, and it always (so far)
>>> eventually fails with
>>>
>>> socket.error: (10048, 'Address already in use')
>>>
>>> in the w.connect() call, and despite that Windows picks the port
>>> numbers here!
>>>
>> That is exactly what I feared could happen. As shown in my example
>> above, the other thing that might happen is that the port is 'taken
>> over' by the other process.
>>
>>
>>> While that also smells to heaven of a missing critical section in the
>>> Windows socket implementation, an exception is much easier to live
>>> with / worm around. Alas, we don't have the MS source code, and I
>>> don't have time to try disassembling / reverse-engineering the opcodes
>>> (what EULA <wink>?), so best I can do is run this for many more hours
>>> to try to increase confidence that an exception is the worst that can
>>> occur under the ZODB 3.4 spelling.
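Living with the exception could look roughly like the sketch below: let the OS pick the port (bind to port 0) and retry the whole dance when 'Address already in use' shows up. This is not the ZODB 3.4 code; the function name, the retry count, and the Python 3 syntax are assumptions for illustration.

```python
import errno
import socket


def connected_pair_with_retry(attempts=10):
    """Sketch only: OS-chosen port, retry on 'Address already in use'."""
    for _ in range(attempts):
        a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            a.bind(("127.0.0.1", 0))      # port 0: the OS picks the port
            a.listen(1)
            w.connect(a.getsockname())
            r, _ = a.accept()
            return r, w
        except OSError as exc:
            w.close()
            # 10048 is WSAEADDRINUSE, the value reported in this thread.
            if exc.errno not in (errno.EADDRINUSE, 10048):
                raise
        finally:
            a.close()                     # the listener is always closed
    raise OSError(errno.EADDRINUSE,
                  "no usable port after %d attempts" % attempts)
```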
>>>
>>> Here's full code for the "only try port 19999" version:
>>>
>>> import socket, errno
>>> import time, random
>>>
>>> def socktest111():
>>>     """Raise an exception if we can't get 19999.
>>>     """
>>>
>>>     a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>>     w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>>
>>>     # set TCP_NODELAY to true to avoid buffering
>>>     w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>>
>>>     # tricky: get a pair of connected sockets
>>>     host = '127.0.0.1'
>>>     port = 19999
>>>
>>>     try:
>>>         a.bind((host, port))
>>>     except:
>>>         raise RuntimeError
>>>     else:
>>>         print 'b',
>>>
>>>     a.listen (1)
>>>     w.setblocking (0)
>>>     try:
>>>         w.connect ((host, port))
>>>     except:
>>>         pass
>>>     print 'c',
>>>     r, addr = a.accept()
>>>     print 'a',
>>>     a.close()
>>>     print 'c',
>>>     w.setblocking (1)
>>>
>>>     return (r, w)
>>>
>>> sofar = []
>>> try:
>>>     while 1:
>>>         try:
>>>             stuff = socktest111()
>>>         except RuntimeError:
>>>             print 'x',
>>>             time.sleep(random.random()/10)
>>>             continue
>>>         sofar.append(stuff)
>>>         time.sleep(random.random()/10)
>>>         if len(sofar) == 50:
>>>             tup = sofar.pop(0)
>>>             r, w = tup
>>>             msg = str(random.randrange(1000000))
>>>             w.send(msg)
>>>             msg2 = r.recv(100)
>>>             assert msg == msg2, (msg, msg2, r.getsockname(),
>>>                                  w.getsockname())
>>>             for s in tup:
>>>                 s.close()
>>> except KeyboardInterrupt:
>>>     for tup in sofar:
>>>         for s in tup:
>>>             s.close()
>>> _______________________________________________
>>> Zope maillist - Zope at zope.org
>>> http://mail.zope.org/mailman/listinfo/zope
>>> ** No cross posts or HTML encoding! **
>>> (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce
>>> http://mail.zope.org/mailman/listinfo/zope-dev )