[Zope] Running more than one instance on windows often block each other

Tim Peters tim.peters at gmail.com
Wed Jul 27 14:31:15 EDT 2005


[Tim Peters]
...
> ....  Ran that loop in two processes.  No hangs, or any
> other oddities, for some minutes.  It did _eventually_ hang-- and both
> processes at the same time --with netstat showing more than 4000
> sockets hanging around in TIME_WAIT state then.  I assume I bashed
> into some internal Windows socket resource limit there, which Windows
> didn't handle gracefully.  Attaching to the processes under the MSVC 6
> debugger, they were hung inside the MS socket libraries.  Repeated
> this several times (everything appeared to work fine until > 4000
> sockets were sitting in TIME_WAIT, and then both processes hung at
> approximately the same time).

More info on that:  since WinXP Pro supplies only about 4000 ephemeral
ports by default, and the program kept hanging after about 4000
ephemeral ports were in use (albeit most in their 4-minute TIME_WAIT
shutdown state), I tried boosting the # of ephemeral ports:

    http://support.microsoft.com/kb/q196271
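
As a rough back-of-the-envelope sketch of why that limit gets hit
(Python 3; the 1025-5000 default port range is my assumption about a
stock WinXP box, and the function name is made up for illustration):
each closed connection pins its ephemeral port for the whole TIME_WAIT
period, so the sustainable connection rate is roughly the number of
ports divided by that period.

```python
def max_sustained_rate(first_port, last_port, time_wait_seconds):
    """Connections/second that can be opened and closed indefinitely
    before every ephemeral port is stuck in TIME_WAIT."""
    ports = last_port - first_port + 1
    return ports / time_wait_seconds

# Assumed defaults: ports 1025..5000, 4-minute (240 second) TIME_WAIT.
rate = max_sustained_rate(1025, 5000, 240)
print("about %.1f connections/second" % rate)
```

Under those assumptions it comes out a bit under 17 connections per
second, summed over every process on the box.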

After boosting the port range, I never saw the processes hang again.  BUT, I saw
something worse:  after about 20 minutes, both processes died with
assert errors, in the code I added to verify that the sockets were
communicating correctly.  The random string created in process A was
actually read by a socket in process B (instead of by its pair in
process A), and vice versa:  the random string created in process B
was read in process A, and at approximately the same time process B
was reading process A's string.

I tried it again, and got a pair of similar assert failures after
about 15 minutes.

That's dreadful, and I don't see how it could be anything except a
race bug in the Windows socket implementation.

The same program on Linux doesn't run long enough to say anything
interesting -- it raises "BindError, 'Cannot bind trigger!'" very
quickly every time, because it apparently keeps server port numbers
(19999, 19998, ...) reserved for "a long time" after the server socket
is closed (where "a long time" just means longer than the few seconds
it takes for the program to die on Linux).
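
For illustration only (this is not what the test program does):  the
usual way to rebind a just-closed server port on Linux is to set
SO_REUSEADDR on the listening socket before bind().  A minimal Python 3
sketch, with a made-up helper name:

```python
import socket

def bind_reusable(host, port):
    """Return a listening socket bound to (host, port) with SO_REUSEADDR set.

    Hypothetical helper, not part of Medusa or ZODB.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set before bind() to have any effect on the bind.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, port))
        s.listen(1)
    except OSError:
        s.close()
        raise
    return s
```

SO_REUSEADDR means something quite different on Windows (it lets
another socket bind a port that's already in use), so I wouldn't use
this there.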

All of the above is wrt using socktest1() below.

socktest2() below contains the Windows code I already changed ZODB 3.4
to use.  I've been running socktest2() in two processes that way on
Windows for more than 2 hours now, with no glitches.  The same code is
running fine on a Linux box too.

So best guess now is that there is a subtle, rare error in the Windows
socket code that could cause the Medusa/ZODB3.2 Windows trigger code
to screw up.

Complete code:

import socket, errno
import time, random

class BindError(Exception):
    pass


def socktest1():
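    """Old-style trigger setup: bind a listener to a hard-coded port
    (counting down from 19999) and connect a socket pair through it.
    """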

    address = ('127.9.9.9', 19999)

    a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # tricky: get a pair of connected sockets
    host='127.0.0.1'
    port=19999

    while 1:
        if port < 19999:
            print port
        try:
            a.bind((host, port))
            break
        except:
            if port <= 19950:
                raise BindError, 'Cannot bind trigger!'
            port -= 1

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect ((host, port))
    except:
        # expected: a non-blocking connect usually can't complete at once
        pass
    r, addr = a.accept()
    a.close()
    w.setblocking (1)

    #return (a, w, r)
    return (r, w)
    #return w

def socktest2():
    a = socket.socket()
    w = socket.socket()

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()

    #return (a, w, r)
    return (r, w)
    #return w

sofar = []
try:
   while 1:
       print '.',
       stuff = socktest1()
       sofar.append(stuff)
       time.sleep(random.random()/10)
       if len(sofar) == 50:
           tup = sofar.pop(0)
           r, w = tup
           msg = str(random.randrange(1000000))
           w.send(msg)
           msg2 = r.recv(100)
           assert msg == msg2, (msg, msg2)
           for s in tup:
               s.close()
except KeyboardInterrupt:
   for tup in sofar:
       for s in tup:
           s.close()
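
If you want to try the socktest2() approach on a current Python, here's
a rough Python 3 rendering with the round-trip check folded in.  The
function names are mine, and it's a sketch rather than the code that
went into ZODB:

```python
import random
import socket

def connected_pair():
    """Return (r, w), two connected loopback sockets.

    Binding to port 0 lets the OS pick a free port, so concurrent
    processes never compete for the same hard-coded port number.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        a.bind(("127.0.0.1", 0))
        a.listen(1)
        w.connect(a.getsockname())   # the (host, port) the OS assigned
        r, addr = a.accept()
    except OSError:
        w.close()
        raise
    finally:
        a.close()                    # the listener is no longer needed
    return r, w

def check_pair(r, w):
    """Send a random string through w and verify r receives it."""
    msg = str(random.randrange(1000000)).encode("ascii")
    w.sendall(msg)
    got = b""
    while len(got) < len(msg):
        chunk = r.recv(100)
        if not chunk:                # peer closed before sending it all
            break
        got += chunk
    assert msg == got, (msg, got)

if __name__ == "__main__":
    r, w = connected_pair()
    check_pair(r, w)
    r.close()
    w.close()
```

Run it in a loop in two or more processes at once to get the same kind
of stress test as above.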

