Running more than one instance on windows often block each other
Hi list,

I run a few Zope instances on a Windows machine (XP). I start them either with runzope.bat or as services; the behaviour below is the same either way.

Sometimes a running instance (A) seems to block another instance (B) from starting. This does not happen every time. If I stop instance A and start B again, B runs fine. The funny thing is that sometimes B starts fine even while A is running (or vice versa).

Of course I set them to run on different port numbers in zope.conf (HTTP, FTP, etc.).

I can see (with the excellent and free 'Process Explorer' from Sysinternals) that the python processes always open port 19999 and connect through that port to themselves on another port (for instance 2550). Maybe my problem has something to do with this?

Has anybody experienced the same behaviour?

Greetings,
Sune
Sune B. Woeller wrote at 2005-7-21 13:16 +0200:
... I can see (with the excellent (and free) 'Process Explorer' from sysinternals) that the python processes always opens port 19999, and connects by that port to themselves on another port (for instance 2550).
You can find the relevant code in "ZServer.medusa.thread.select_trigger.trigger.__init__".

In principle, the code should try all ports from 19999 down to 19950 and fail only when none of them could be bound.

-- Dieter
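The retry behaviour described above can be sketched in isolation. This is a minimal Python 3 sketch, not the Medusa code itself: the helper name `bind_first_free` is hypothetical, and it creates a fresh socket per attempt and catches only `OSError` rather than using a bare `except`.

```python
import socket


class BindError(Exception):
    """Raised when none of the candidate ports could be bound."""


def bind_first_free(host, ports):
    """Return (sock, port) for the first port in `ports` that bind() accepts.

    Hypothetical helper for illustration; not part of Medusa or ZODB.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # this port was refused; try the next candidate
            continue
        # getsockname() reports the real port, also when 0 was requested.
        return sock, sock.getsockname()[1]
    raise BindError("Cannot bind trigger!")
```

Calling `bind_first_free("127.0.0.1", range(19999, 19949, -1))` walks the ports 19999 down to 19950 inclusive, the range mentioned above. The sketch only works if bind() really refuses a port that is already in use.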
[Sune B. Woeller]
... I can see (with the excellent (and free) 'Process Explorer' from sysinternals) that the python processes always opens port 19999, and connects by that port to themselves on another port (for instance 2550).
[Dieter Maurer]
You can find the relevant code in "ZServer.medusa.thread.select_trigger.trigger.__init__"
In principle, the code should try all sockets between "19999" down to "19950" and fail only when none of them could be bound to...
Yup. ZODB has what looks like a copy/paste of this code, in ZEO/zrpc/trigger.py. I didn't realize where it came from originally until you pointed out the Medusa code here.

Anyway, it so happens I rewrote ZEO's copy a few weeks ago, in ZODB 3.4. The Windows part is much simpler there now. I don't know why the original might fail in the way Sune reported, but perhaps the rewritten version would not.

Before:

    # tricky: get a pair of connected sockets
    host='127.0.0.1'
    port=19999
    while 1:
        try:
            self.address=(host, port)
            a.bind(self.address)
            break
        except:
            if port <= 19950:
                raise BindError, 'Cannot bind trigger!'
            port=port - 1

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect (self.address)
    except:
        pass
    r, addr = a.accept()
    a.close()
    w.setblocking (1)
    self.trigger = w

After:

    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()
    self.trigger = w
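For readers who want to try the "After" approach outside ZEO, here is a self-contained Python 3 sketch. The function name `connected_pair` and the error handling are assumptions added for illustration; only the bind-to-port-0 sequence comes from the snippet above.

```python
import socket


def connected_pair():
    """Return (r, w): two connected loopback TCP sockets.

    Hypothetical standalone helper; ZEO does this inside its trigger class.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Disable Nagle so a one-byte wakeup is not held back.
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        a.bind(("127.0.0.1", 0))           # port 0: the OS picks a free port
        connect_address = a.getsockname()  # the (host, port) actually assigned
        a.listen(1)
        w.connect(connect_address)         # queued in the listen backlog
        r, _addr = a.accept()
    except OSError:
        w.close()
        raise
    finally:
        a.close()                          # the listener is no longer needed
    return r, w
```

A byte sent on `w` can then be read from `r`, which is all the trigger needs in order to wake a select() loop.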
Tim Peters wrote at 2005-7-25 10:36 -0400:
Yup. ZODB has what looks like a copy/paste of this code, in ZEO/zrpc/trigger.py. I didn't realize where it came from originally until you pointed out the Medusa code here.
Anyway, it so happens I rewrote ZEO's copy a few weeks ago, in ZODB 3.4. The Windows part is much simpler there now. .... After:
    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()
    self.trigger = w
This may even be portable (not Windows specific). At least, it works for Linux2. In this case, we might get rid of the stupid code duplication... -- Dieter
[Tim Peters]
Yup. ZODB has what looks like a copy/paste of this code, in ZEO/zrpc/trigger.py. I didn't realize where it came from originally until you pointed out the Medusa code here.
Anyway, it so happens I rewrote ZEO's copy a few weeks ago, in ZODB 3.4. The Windows part is much simpler there now. .... After:
    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()
    self.trigger = w
[Dieter Maurer]
This may even be portable (not Windows specific). At least, it works for Linux2.
I believe it is portable, but the Unix version of this code doesn't use sockets at all. It uses a pipe instead. A pipe can't be used on Windows because the Windows select() works only with sockets, and asyncore on Windows uses select(). I don't know if/why a pipe would be better on Unix, but just assume that it is. I do know that the Windows version of this code used to leak sockets madly, for years. The pipe code is simpler still.
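To illustrate the pipe variant mentioned here, the following Python 3 sketch shows a pipe's read end becoming readable in select() after a one-byte write. It is POSIX-only, for the reason given above (select() on Windows accepts only sockets), and the function name is hypothetical rather than taken from ZODB or Medusa.

```python
import os
import select


def wake_select_with_pipe():
    """Show a pipe waking select(); returns (before, ready, drained).

    POSIX-only illustration; not the actual trigger implementation.
    """
    r_fd, w_fd = os.pipe()
    try:
        # Nothing written yet: a zero-timeout select() reports nothing readable.
        before, _, _ = select.select([r_fd], [], [], 0)
        os.write(w_fd, b"x")  # the "trigger pull": one byte
        after, _, _ = select.select([r_fd], [], [], 1.0)
        drained = os.read(r_fd, 8192)  # the loop then drains the wakeup byte
        return before, after == [r_fd], drained
    finally:
        os.close(r_fd)
        os.close(w_fd)
```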
In this case, we might get rid of the stupid code duplication...
Well, there are two kinds:

1. Massive code duplication between the "posix" and "not posix" versions of the `trigger` classes. I already refactored ZODB's copy to eliminate that (most of the ZODB 3.4 trigger code is in a shared base class now, and the "posix" and "not posix" versions override just enough to make the pipe-versus-socket-pair distinction).

2. Massive code duplication between ZODB's copy and Medusa's. Hmm. Since I refactored ZODB's copy, it's hard to tell that they have anything in common anymore ;-)
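The shape of that first refactoring might look roughly like the Python 3 sketch below. All class and method names are hypothetical and are not taken from ZODB's source. As a labelled shortcut, the socket flavour uses the standard library's `socket.socketpair()` in place of the hand-built loopback pair discussed in this thread.

```python
import os
import socket


class BaseTrigger:
    """Shared behaviour; subclasses provide only the byte channel."""

    def pull_trigger(self):
        """Wake the event loop by writing one byte."""
        self._write(b"x")

    def handle_read(self):
        """Drain and return whatever wakeup bytes are pending."""
        return self._read(8192)

    def _write(self, data):
        raise NotImplementedError

    def _read(self, size):
        raise NotImplementedError

    def close(self):
        raise NotImplementedError


class PipeTrigger(BaseTrigger):
    """POSIX flavour: an os.pipe() pair."""

    def __init__(self):
        self._r, self._w = os.pipe()

    def _write(self, data):
        os.write(self._w, data)

    def _read(self, size):
        return os.read(self._r, size)

    def close(self):
        os.close(self._r)
        os.close(self._w)


class SocketPairTrigger(BaseTrigger):
    """Non-POSIX flavour: a connected socket pair."""

    def __init__(self):
        # Stand-in: socketpair() replaces the hand-built loopback pair.
        self._r, self._w = socket.socketpair()

    def _write(self, data):
        self._w.send(data)

    def _read(self, size):
        return self._r.recv(size)

    def close(self):
        self._r.close()
        self._w.close()


# Pipe on POSIX, sockets elsewhere, as described in the thread.
Trigger = PipeTrigger if os.name == "posix" else SocketPairTrigger
```

Only the three channel methods differ between the two flavours; everything the event loop calls lives in the base class.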
Tim Peters wrote:
[Sune B. Woeller]
... I can see (with the excellent (and free) 'Process Explorer' from sysinternals) that the python processes always opens port 19999, and connects by that port to themselves on another port (for instance 2550).
[Dieter Maurer]
You can find the relevant code in "ZServer.medusa.thread.select_trigger.trigger.__init__"
In principle, the code should try all sockets between "19999" down to "19950" and fail only when none of them could be bound to...
Thanks for the pointer. I have been debugging select_trigger.py and have some more info.

The problem is that the call a.accept() sometimes hangs. Apparently a.bind(self.address) allows us to bind to a port that another Zope instance is already bound to.

The code creates the server socket a and the client socket w, and gets the socket r by connecting w to a and accepting the connection. Then it closes a. a goes out of scope when __init__ terminates, and is probably garbage collected at some point.

I moved the code into the standalone script below, and I can reproduce the error with it. In the original code w is kept as an instance variable, and r is passed to asyncore.dispatcher.__init__ and probably kept there. I simulate that by returning them, so the caller of socktest can keep them around.

I call socktest from two different processes A and B (two pythons): w, r = socktest()

The call in A gets port 19999. The second call, in B, either blocks or takes over port 19999 (I see the second process taking over the port in a port scanner).

Once the server socket in A is closed, a.bind in B does not raise socket.error: (10048, 'Address already in use') as expected, even though the port is still used by the socket r in A.

If I remove a.close() and keep a around (by passing it to the caller), a.bind works as expected: it raises socket.error: (10048, 'Address already in use').

But in the literature on sockets, I read that it should be okay to close the server socket and keep using the client sockets. So, is this a possible bug in bind()?

I have tested the new code from Tim Peters. It apparently works; ports are given out by Windows. But could the same problem with bind occur here, since a is closed (and garbage collected)? (There is far less chance of that since we do not specify port numbers, I know.) I tried getting a pair of sockets with Tim's code and then binding a third socket to the same port as a/r, and I got the same problem as above.
Sune

+++++++++++++++++++++
import socket, errno

class BindError(Exception):
    pass

def socktest():
    """blabla
    """
    address = ('127.9.9.9', 19999)
    a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, 1, 1)

    # tricky: get a pair of connected sockets
    host='127.0.0.1'
    port=19999
    while 1:
        print port
        try:
            a.bind((host, port))
            break
        except:
            if port <= 19950:
                raise BindError, 'Cannot bind trigger!'
            port=port - 1

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect ((host, port))
    except:
        pass
    r, addr = a.accept()
    a.close()
    w.setblocking (1)

    #return (a, w, r)
    return (w, r)
    #return w
+++++++++++++++++++++
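The experiment described in this message can also be run inside a single process. The Python 3 sketch below is an illustration with hypothetical function names. It checks whether a fresh socket can bind the listener's port, first while the listener is open and then after it has been closed while the connected pair is kept alive. The second result is the one in question in this thread and may differ between operating systems, so the sketch returns it without assuming a value.

```python
import socket


def can_bind(host, port):
    """Return True if a fresh TCP socket can bind (host, port)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


def probe_after_close():
    """Return (bindable_while_listening, bindable_after_listener_closed)."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind(("127.0.0.1", 0))  # let the OS choose the port
    host, port = a.getsockname()
    a.listen(1)
    w.connect((host, port))
    r, _addr = a.accept()
    try:
        while_listening = can_bind(host, port)
        a.close()  # r and w stay connected on the same port
        after_close = can_bind(host, port)
    finally:
        a.close()  # closing twice is harmless
        r.close()
        w.close()
    return while_listening, after_close
```

If the second value comes back True on a given platform, that matches the behaviour reported here: bind() succeeds on a port that a still-connected socket is using.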
Yup. ZODB has what looks like a copy/paste of this code, in ZEO/zrpc/trigger.py. I didn't realize where it came from originally until you pointed out the Medusa code here.
Anyway, it so happens I rewrote ZEO's copy a few weeks ago, in ZODB 3.4. The Windows part is much simpler there now. I don't know why the original might fail in the way Sune reported, but perhaps the rewritten version would not.
Before:

    # tricky: get a pair of connected sockets
    host='127.0.0.1'
    port=19999
    while 1:
        try:
            self.address=(host, port)
            a.bind(self.address)
            break
        except:
            if port <= 19950:
                raise BindError, 'Cannot bind trigger!'
            port=port - 1

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect (self.address)
    except:
        pass
    r, addr = a.accept()
    a.close()
    w.setblocking (1)
    self.trigger = w

After:

    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()
    self.trigger = w

_______________________________________________
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )