[Zope] Re: Running more than one instance on windows often block each other
Sune B. Woeller
sune at syntetisk.dk
Thu Jul 28 10:05:23 EDT 2005
I have made two similar test programs in C++, and the problem occurs there too,
with exactly the same pattern as my Python client/server scripts in the mail I
am replying to.
But then I stumbled upon this flag in the WinSock documentation: SO_EXCLUSIVEADDRUSE
See the description here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/using_so_exclusiveaddruse.asp
It is very interesting reading, especially:
"An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If one or
more connections originating from (or accepted on) a port bound with
SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will fail."
This is just what we want (and I think that is standard behaviour on Linux).
I have tested it with my C++ programs: when I set that option on the server
socket before bind(), it works, and bind() in the second server process fails
with WSAEADDRINUSE
(bind() failed: 10048.)
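The same idea in Python 3 can be sketched as follows. This is a minimal sketch, not the fix referenced below: it assumes a Python whose socket module exposes SO_EXCLUSIVEADDRUSE on Windows, and it looks the constant up with getattr so that on Linux (where the constant does not exist) it falls back to a plain bind() without SO_REUSEADDR. The helper names bind_exclusive and second_bind_fails are hypothetical. Note that the check keeps the first listener open, which is the case that already fails correctly on Windows; it does not reproduce the closed-listener scenario.

```python
import errno
import socket

# POSIX reports "address in use" as EADDRINUSE; Winsock uses WSAEADDRINUSE (10048).
ADDR_IN_USE = {errno.EADDRINUSE, 10048}


def bind_exclusive(host, port):
    """Create a TCP socket and bind it, asking for exclusive use where supported.

    SO_EXCLUSIVEADDRUSE is Windows-only; where it is missing, a plain bind()
    without SO_REUSEADDR is used instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        # Like the C++ server: the option must be set before bind().
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_fails(host="127.0.0.1"):
    """Return True if a second bind to a port held by a listener is refused."""
    first = bind_exclusive(host, 0)  # port 0: let the OS pick a free port
    first.listen(1)
    port = first.getsockname()[1]
    try:
        second = bind_exclusive(host, port)
    except OSError as exc:
        return exc.errno in ADDR_IN_USE
    else:
        second.close()
        return False
    finally:
        first.close()


if __name__ == "__main__":
    print("second bind refused:", second_bind_fails())
```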
There is a Python fix for this, but only for Python 2.4:
http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=5470&atid=305470
(It is added to version 1.294 of socketmodule.c)
I run the two test programs from two cmd terminals, as I described for the
Python versions.
// sock_server.cpp -- link with ws2_32.lib
#define _WINSOCK_DEPRECATED_NO_WARNINGS  // allow inet_addr() on newer MSVC
#include <stdio.h>
#include <conio.h>
#include <winsock2.h>

int main() {
    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup(MAKEWORD(2, 2), &wsaData);
    if (iResult != NO_ERROR) {
        printf("Error at WSAStartup(): %d\n", iResult);
        return 1;
    }

    // Create a socket.
    SOCKET m_socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (m_socket == INVALID_SOCKET) {
        printf("Error at socket(): %d\n", WSAGetLastError());
        WSACleanup();
        return 1;
    }

    // Try to use SO_EXCLUSIVEADDRUSE; it must be set before bind().
    BOOL bOptVal = TRUE;
    if (setsockopt(m_socket, SOL_SOCKET, SO_EXCLUSIVEADDRUSE,
                   (char*)&bOptVal, sizeof(bOptVal)) != SOCKET_ERROR) {
        printf("Set SO_EXCLUSIVEADDRUSE: ON\n");
    } else {
        printf("setsockopt(SO_EXCLUSIVEADDRUSE) failed: %d\n", WSAGetLastError());
    }

    // Bind the socket. With the option set, a second server process fails
    // here with WSAEADDRINUSE (10048) while the port is still in use.
    sockaddr_in service;
    service.sin_family = AF_INET;
    service.sin_addr.s_addr = inet_addr("127.0.0.1");
    service.sin_port = htons(19990);
    if (bind(m_socket, (SOCKADDR*)&service, sizeof(service)) == SOCKET_ERROR) {
        printf("bind() failed: %d.\n", WSAGetLastError());
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }

    // Listen on the socket.
    if (listen(m_socket, 1) == SOCKET_ERROR) {
        printf("Error listening on socket: %d\n", WSAGetLastError());
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }

    // Accept a single connection.
    printf("Waiting for a client to connect...\n");
    SOCKET AcceptSocket = accept(m_socket, NULL, NULL);
    if (AcceptSocket == INVALID_SOCKET) {
        printf("accept() failed: %d\n", WSAGetLastError());
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }
    printf("Client Connected.\n");

    // Close the listening socket; the accepted socket still uses port 19990.
    closesocket(m_socket);

    // Receive data, leaving room for the terminating NUL.
    char recvbuf[32] = "";
    int bytesRecv = recv(AcceptSocket, recvbuf, (int)sizeof(recvbuf) - 1, 0);
    printf("Bytes Recv: %d\n", bytesRecv);
    if (bytesRecv > 0) {
        recvbuf[bytesRecv] = '\0';
        printf("Received: %s\n", recvbuf);
    }

    printf("press a key to terminate\n");
    _getch();
    closesocket(AcceptSocket);
    WSACleanup();
    return 0;
}
// sock_client.cpp -- link with ws2_32.lib
#define _WINSOCK_DEPRECATED_NO_WARNINGS  // allow inet_addr() on newer MSVC
#define _CRT_SECURE_NO_WARNINGS          // allow scanf() on newer MSVC
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <winsock2.h>

int main() {
    // Initialize Winsock.
    WSADATA wsaData;
    int iResult = WSAStartup(MAKEWORD(2, 2), &wsaData);
    if (iResult != NO_ERROR) {
        printf("Error at WSAStartup(): %d\n", iResult);
        return 1;
    }

    // Create a socket.
    SOCKET m_socket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (m_socket == INVALID_SOCKET) {
        printf("Error at socket(): %d\n", WSAGetLastError());
        WSACleanup();
        return 1;
    }

    // Connect to the server.
    sockaddr_in clientService;
    clientService.sin_family = AF_INET;
    clientService.sin_addr.s_addr = inet_addr("127.0.0.1");
    clientService.sin_port = htons(19990);
    if (connect(m_socket, (SOCKADDR*)&clientService, sizeof(clientService))
            == SOCKET_ERROR) {
        printf("Failed to connect: %d\n", WSAGetLastError());
        closesocket(m_socket);
        WSACleanup();
        return 1;
    }

    // Send data; "%31s" keeps scanf() inside the 32-byte buffer.
    char sendbuf[32] = "";
    printf("Enter string to send (max 31 bytes):\n");
    if (scanf("%31s", sendbuf) != 1)
        sendbuf[0] = '\0';
    printf("Sending: %s\n", sendbuf);
    int bytesSent = send(m_socket, sendbuf, (int)strlen(sendbuf), 0);
    printf("Bytes Sent: %d\n", bytesSent);

    printf("press a key to terminate\n");
    _getch();
    closesocket(m_socket);
    WSACleanup();
    return 0;
}
Sune B. Woeller wrote:
> Tim Peters wrote:
>
>> It's starting to look a lot like the Windows bind() implementation is
>> unreliable, sometimes (but rarely -- hard to provoke) allowing two
>> sockets to bind to the same (address, port) pair simultaneously,
>> instead of raising 'Address already in use' for one of them. Disaster
>> ensues.
>>
>> WRT the last version of the code I posted, on another XP Pro SP2
>> machine (again after playing registry games to boost the number of
>> ephemeral ports) I eventually saw all of: hangs during accept(); the
>> assertion errors I mentioned last time; and mystery "Connection
>> refused" errors during connect().
>>
>> The variant of the code below _only_ tries to use port 19999. If it
>> can't bind to that on the first try, socktest111() raises an exception
>> instead of trying again (or trying a different port number). Ran two
>> processes. After about 15 minutes, both died with assert errors at
>> about the same time (identical, so far as I could tell by eyeball):
>>
>> Process A:
>>
>> Traceback (most recent call last):
>> File "socktest.py", line 209, in ?
>> assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>> AssertionError: ('292739', '821744', ('127.0.0.1', 19999),
>> ('127.0.0.1', 3845))
>>
>> Process B:
>>
>> Traceback (most recent call last):
>> File "socktest.py", line 209, in ?
>> assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
>> AssertionError: ('821744', '292739', ('127.0.0.1', 19999),
>> ('127.0.0.1', 3846))
>>
>> So it's again the business where each process is recv'ing the random
>> string intended to be recv'ed by a socket in the other process.
>> Hypothesized timeline:
>>
>> process A's `a` binds to 19999
>> process B's `a` binds to 19999 -- according to me, this should be
>>     impossible in the absence of SO_REUSEADDR (which acts very
>>     differently on Windows than it does on Linux, BTW -- on Linux
>>     this should be impossible even in the presence of SO_REUSEADDR;
>>     regardless, we're not using SO_REUSEADDR here, and the
>>     braindead hard-coded
>>
>>         w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>
>>     is actually using the right magic constant for TCP_NODELAY on
>>     Windows, as it intends).
>> A and B both listen()
>> A connect()s, and accidentally gets on B.a's accept queue
>> B connect()s, and accidentally gets on A.a's accept queue
>> the rest follows inexorably
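For comparison, the same TCP_NODELAY option can be set through the named constant instead of the hard-coded 1. This is a small sketch assuming Python 3's socket module, which defines socket.TCP_NODELAY on both Windows and Linux; make_nodelay_socket is a hypothetical helper name.

```python
import socket


def make_nodelay_socket():
    """Return a TCP socket with Nagle's algorithm disabled (TCP_NODELAY on)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # socket.TCP_NODELAY is 1 on Windows and Linux, which is why the
    # hard-coded setsockopt(IPPROTO_TCP, 1, 1) happens to work there.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```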
>>
>
>
>
> This is what I'm experiencing as well.
> I can narrow it down a bit: I *always* experience one out of two
> erroneous behaviours, as described below.
>
> I tried to make an even simpler test situation, without binding
> sockets 'r' and 'w' to each other in the same process. I try to
> reproduce the problem in a 'standard' socket use case, where a client
> in one process connects to a server in another process.
>
> The following two scripts act as a server and a client.
>
> #***********************
> # sock_server_reader.py
> #***********************
> import socket
>
> a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>
> a.bind(("127.0.0.1", 19999))
> print a.getsockname() # assigned (host, port) pair
>
> a.listen(1)
>
> print "a accepting:"
> r, addr = a.accept() # r becomes asyncore's (self.)socket
> print "a accepted: "
> print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
>
> a.close()
>
> msg = r.recv(100)
> print 'msg recieved:', msg
>
>
> #***********************
> # sock_client_writer.py
> #***********************
> import socket, random
>
> w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>
> print 'w connecting:'
> w.connect(('127.0.0.1', 19999))
> print 'w connected:'
> print w.getsockname()
> print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
> msg = str(random.randrange(1000000))
> print 'sending msg: ', msg
> w.send(msg)
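A Python 3 port of the two scripts, combined into one process with the reader running on a thread, can be sketched like this. Assumptions not in the originals: port 0 instead of 19999 so the OS picks a free port, reading until EOF instead of a single recv(100), and the hypothetical names serve_once and run_pair. It shows only the expected single-pair behaviour, not the Windows misbehaviour.

```python
import random
import socket
import threading


def serve_once(listener, result):
    """Reader side: accept one connection, read until EOF, record the text."""
    conn, _peer = listener.accept()
    listener.close()  # as in sock_server_reader.py: stop accepting new clients
    with conn:
        chunks = []
        while True:
            data = conn.recv(100)
            if not data:
                break
            chunks.append(data)
    result.append(b"".join(chunks).decode("ascii"))


def run_pair(host="127.0.0.1", port=0):
    """Run the reader/writer pair in one process; return (sent, received)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(1)
    port = listener.getsockname()[1]

    received = []
    reader = threading.Thread(target=serve_once, args=(listener, received))
    reader.start()

    # Writer side, as in sock_client_writer.py.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as w:
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        w.connect((host, port))
        msg = str(random.randrange(1000000))
        w.sendall(msg.encode("ascii"))

    reader.join(timeout=5)
    return msg, (received[0] if received else None)
```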
>
>
>
>
> There are two possible outcomes [a) and b)] of running two instances
> of this client/server pair (that is, 4 processes in total like the
> following).
> (Numbers 1 to 4 are steps executed in chronological order.)
>
> 1) python -i sock_server_reader.py
> The server prints:
> ('127.0.0.1', 19999)
> a accepting:
> and waits for a connection
>
> 2) python -i sock_client_writer.py
> The client prints:
> w connecting:
> w connected:
> ('127.0.0.1', 3774)
> ('127.0.0.1', 3774), peer=('127.0.0.1', 19999)
> sending msg: 903848
> >>>
>
> and the server now accepts the connection and prints:
> a accepted:
> ('127.0.0.1', 19999), peer=('127.0.0.1', 3774)
> msg recieved: 903848
> >>>
>
> This is how it should be. Now let's try to set up a second
> client/server pair on the same port (19999). The expected outcome is
> that the bind() call in sock_server_reader.py fails with
> socket.error: (10048, 'Address already in use').
>
> 3) python -i sock_server_reader.py
> The server prints:
> ('127.0.0.1', 19999)
> a accepting:
>
> The problem already occurs here: bind() is allowed to bind to a port
> that is in use, in this case by the connected socket 'r'.
> [Also on other Windows machines? Mikkel: yes. Diku: ???]
>
> 4) python -i sock_client_writer.py
> Now one out of two things happen:
>
> a) The client prints:
> w connecting:
> Traceback (most recent call last):
> File "c:\pyscripts\sock_client_writer.py", line 7, in ?
> w.connect(('127.0.0.1', 19999))
> File "<string>", line 1, in connect
> socket.error: (10061, 'Connection refused')
> >>>
> The server waits on the call to accept(), still waiting for a
> connection. (This is the blocking behaviour I reported in my first
> mail, experienced when running two zope instances. The socket error
> was swallowed by the unconditional except clause).
>
> b) The client connects to the server:
> w connecting:
> w connected:
> ('127.0.0.1', 3865)
> ('127.0.0.1', 3865), peer=('127.0.0.1', 19999)
> sending msg: 119105
> >>>
>
> and the server now accepts the connection and prints:
> a accepted:
> ('127.0.0.1', 19999), peer=('127.0.0.1', 3865)
> msg recieved: 119105
> >>>
>
> The second set of client/server processes is now connected on the
> same port as the first set of client/server processes. In a port
> scanner, the port now belongs to the second server process [3)].
>
>
> I always get one out of these two possibilities (a and b), I never
> see bind() raising socket.error: (10048, 'Address already in use').
>
> It is important to realize that both of these outcomes are errors.
>
> I tried the same procedure on a Linux system, and there 3) always
> raises 'Address already in use' (EADDRINUSE).
>
>
> If case a) occurred, where w.connect raises socket.error: (10061,
> 'Connection refused'), and I then start a third client/server pair,
> its bind() call raises (10048, 'Address already in use'). In this case
> the 'a' socket from the second pair of processes has not been closed;
> it is still waiting in accept().
>
> In my case bind() always raises (10048, 'Address already in use') when
> there is an open server socket like 'a' bound to the same port.
>
> To summarize:
> Closing a server socket bound to a given port allows another server
> socket to bind to the same port, even when there are open client
> sockets bound to the port.
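Steps 1) to 3) can be condensed into a single-process check. This is a sketch under stated assumptions: Python 3, the OS picks the port (port 0) instead of 19999, and rebind_refused_after_listener_close is a hypothetical name. It returns True where a new bind is refused while an accepted connection still holds the port (the Linux behaviour described above) and False where the bind succeeds (the reported Windows behaviour without SO_EXCLUSIVEADDRUSE).

```python
import errno
import socket

ADDR_IN_USE = {errno.EADDRINUSE, 10048}  # POSIX errno / Winsock WSAEADDRINUSE


def rebind_refused_after_listener_close(host="127.0.0.1"):
    """Return True if bind() is refused while an accepted socket uses the port."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind((host, 0))
    a.listen(1)
    port = a.getsockname()[1]

    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w.connect((host, port))
    r, _ = a.accept()
    a.close()  # the listener is gone, but 'r' still uses the port

    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # no SO_REUSEADDR
    try:
        b.bind((host, port))
    except OSError as exc:
        if exc.errno not in ADDR_IN_USE:
            raise
        return True
    else:
        return False
    finally:
        for s in (b, r, w):
            s.close()


if __name__ == "__main__":
    print("rebind refused:", rebind_refused_after_listener_close())
```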
>
>
>
>
>
>> Note that because this never tries a port number other than 19999, it
>> can't be a bulletproof workaround simply to hold on to the `a` socket.
>> If the hypothesized timeline above is right, bind() can't be trusted
>> on Windows in any situation where two processes may try to bind to the
>> same hostname:port pair at the same time. Holding on to `a`, and
>> cycling through port numbers when bind() failed, would still
>> potentially leave two processes trying to bind to the same port number
>> simultaneously (just a port other than 19999).
>>
>
> It would not be enough to keep a reference to 'a'; it would have to be
> kept open as well. And maybe that is not a problem: since we only
> accept() once, only one 'w' client socket could be accepted. Normally
> the reason for closing the server socket is to disallow more
> connections than those already accepted.
> (But I'm not very experienced with sockets, so I might be wrong.)
>
>
>> Ick: this happens under Pythons 2.3.5 (MSVC 6) and 2.4.1 (MSVC 7.1),
>> so if it is -- as is looking more and more likely -- an error in MS's
>> socket implementation, it isn't avoided by switching to a newer MS C
>> library.
>>
>> Frankly, I don't see a sane way to worm around this -- it's difficult
>> for application code to worm around what smells like a missing
>> critical section in system code.
>>
>> Using the simpler socket dance from the ZODB 3.4 code, I haven't yet
>> seen an instance of the assert failure, or a hang. However, let two
>> processes run that long enough simultaneously, and it always (so far)
>> eventually fails with
>>
>> socket.error: (10048, 'Address already in use')
>>
>> in the w.connect() call, and despite that Windows picks the port
>> numbers here!
>>
> That is exactly what I feared could happen. As shown in my example
> above, the other thing that might happen is that the port is 'taken
> over' by the other process.
>
>
>> While that also smells to heaven of a missing critical section in the
>> Windows socket implementation, an exception is much easier to live
>> with / worm around. Alas, we don't have the MS source code, and I
>> don't have time to try disassembling / reverse-engineering the opcodes
>> (what EULA <wink>?), so best I can do is run this for many more hours
>> to try to increase confidence that an exception is the worst that can
>> occur under the ZODB 3.4 spelling.
>>
>> Here's full code for the "only try port 19999" version:
>>
>> import socket, errno
>> import time, random
>> def socktest111():
>>     """Raise an exception if we can't get 19999.
>>     """
>>
>>     a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>     w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
>>
>>     # set TCP_NODELAY to true to avoid buffering
>>     w.setsockopt(socket.IPPROTO_TCP, 1, 1)
>>
>>     # tricky: get a pair of connected sockets
>>     host = '127.0.0.1'
>>     port = 19999
>>
>>     try:
>>         a.bind((host, port))
>>     except:
>>         raise RuntimeError
>>     else:
>>         print 'b',
>>
>>     a.listen (1)
>>     w.setblocking (0)
>>     try:
>>         w.connect ((host, port))
>>     except:
>>         pass
>>     print 'c',
>>     r, addr = a.accept()
>>     print 'a',
>>     a.close()
>>     print 'c',
>>     w.setblocking (1)
>>
>>     return (r, w)
>>
>> sofar = []
>> try:
>>     while 1:
>>         try:
>>             stuff = socktest111()
>>         except RuntimeError:
>>             print 'x',
>>             time.sleep(random.random()/10)
>>             continue
>>         sofar.append(stuff)
>>         time.sleep(random.random()/10)
>>         if len(sofar) == 50:
>>             tup = sofar.pop(0)
>>             r, w = tup
>>             msg = str(random.randrange(1000000))
>>             w.send(msg)
>>             msg2 = r.recv(100)
>>             assert msg == msg2, (msg, msg2, r.getsockname(),
>>                                  w.getsockname())
>>             for s in tup:
>>                 s.close()
>> except KeyboardInterrupt:
>>     for tup in sofar:
>>         for s in tup:
>>             s.close()
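A variant of the "socket dance" that lets the OS pick the port and then checks that the accepted connection really is its own writer can be sketched as follows. This is a swapped-in technique, not the ZODB 3.4 code or anything proposed in this thread: connected_pair is a hypothetical name, and the peer check is meant to turn the cross-process mix-up described above into an exception instead of silently pairing sockets from two processes. On Python 3.5 and later, socket.socketpair() is also available on Windows and may be preferable where it can be used.

```python
import socket


def connected_pair(host="127.0.0.1"):
    """Return (r, w), two connected TCP sockets on the loopback interface."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        a.bind((host, 0))  # port 0: the OS picks a free port
        a.listen(1)
        w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            w.connect(a.getsockname())
            r, _ = a.accept()
        except OSError:
            w.close()
            raise
    finally:
        a.close()  # only one connection is wanted

    # Authenticate the pair: each end's peer must be the other end.
    if r.getpeername() != w.getsockname() or w.getpeername() != r.getsockname():
        r.close()
        w.close()
        raise ConnectionError("accepted a connection from an unexpected peer")
    return r, w
```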
>> _______________________________________________
>> Zope maillist - Zope at zope.org
>> http://mail.zope.org/mailman/listinfo/zope
>> ** No cross posts or HTML encoding! **
>> (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce
>> http://mail.zope.org/mailman/listinfo/zope-dev )
>>
>