[Warning] Zope/ZEO clients: subprocesses can lead to non-deterministic message loss
ATTENTION: Crosspost -- Reply-To set to 'zope-dev@zope.org'

Today, I hit a nasty error.

The error affects applications under Unix (and maybe Windows) which

  * use an "asyncore" mainloop thread (and maybe other asyncore applications)

    Zope and many ZEO clients belong to this class

and

  * create subprocesses (via "fork" and "system", "popen" or friends if they use "fork" internally (they do under Unix but I think not under Windows)).

The error can cause non-deterministic loss of messages (HTTP requests, ZEO server responses, ...) destined for the parent process. It can also cause the same output to be sent several times over sockets.

The error is explained as follows:

"asyncore" maintains a map from file descriptors to handlers. The "asyncore" main loop waits for any file descriptor to become "active" and then calls the corresponding handler.

When a process forks, the complete state, including file descriptors, threads and memory state, is copied and the new process executes in this copied state. We now have 2 "asyncore" threads waiting for the same events.

File descriptors are shared between parent and child. When the child reads from a file descriptor inherited from its parent, it steals the corresponding message: the message will not reach the parent.

While file descriptors are shared, memory state is separate. Therefore, pending writes can be performed by both parent and child, leading to duplicate writes to the same file descriptor.

A workaround is to deactivate "asyncore" before forking (or calling "system", "popen", ...) and to reactivate it afterwards, as exemplified in the following code:

    from os import fork
    from asyncore import socket_map

    saved_socket_map = socket_map.copy()
    socket_map.clear()  # deactivate "asyncore"
    pid = None
    try:
        pid = fork()
        if (pid == 0):  # child
            # ...
    finally:
        if pid != 0:
            # parent (or failed fork): reactivate "asyncore"
            socket_map.update(saved_socket_map)

-- Dieter
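The "stealing" effect described above can be reproduced without "asyncore" at all. The following is only a minimal sketch under stated assumptions: it runs on Unix, it uses a plain "socket.socketpair()" in place of a real HTTP or ZEO connection, and the function name "child_steals_message" and the reporting pipe are hypothetical helpers invented for the illustration. A message is queued for the parent, the process forks, and the child reads from the inherited descriptor first.

```python
import os
import socket


def child_steals_message():
    """Fork with a message pending; return (read_by_child, left_for_parent)."""
    parent_end, peer = socket.socketpair()
    peer.sendall(b"request-1")        # message destined for the parent
    report_r, report_w = os.pipe()    # lets the child report what it read

    pid = os.fork()
    if pid == 0:
        # Child: it inherited parent_end and consumes the pending message.
        try:
            os.write(report_w, parent_end.recv(1024))
        finally:
            os._exit(0)               # skip the parent's cleanup handlers

    # Parent: wait for the child, then look at what is left on the socket.
    os.waitpid(pid, 0)
    os.close(report_w)
    stolen = os.read(report_r, 1024)
    os.close(report_r)

    parent_end.setblocking(False)
    try:
        left = parent_end.recv(1024)
    except BlockingIOError:
        left = b""                    # nothing left: the child took it
    parent_end.close()
    peer.close()
    return stolen, left


if __name__ == "__main__":
    print(child_steals_message())
```

In a real server the outcome depends on which process wakes up first, which is what makes the loss non-deterministic; the sketch forces the child to win so that the effect is visible every time.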
On Fri, Jun 25, 2004 at 07:23:19PM +0200, Dieter Maurer wrote:
> ATTENTION: Crosspost -- Reply-To set to 'zope-dev@zope.org'
>
> Today, I hit a nasty error.
>
> The error affects applications under Unix (and maybe Windows) which
>
>   * use an "asyncore" mainloop thread (and maybe other asyncore applications)
>
>     Zope and many ZEO clients belong to this class
>
> and
>
>   * create subprocesses (via "fork" and "system", "popen" or friends if they use "fork" internally (they do under Unix but I think not under Windows)).
Hm. Does this apply to external methods and product code that make these calls?

-- Paul Winkler http://www.slinkp.com
ATTENTION: Crosspost -- Reply-To set to 'zope-dev@zope.org'

On Friday, I reported a bug that can cause non-deterministic message loss and duplication of messages in forking applications with an "asyncore" mainloop thread.

Unfortunately, the proposed workaround does not work for various reasons (as you may already have recognized and reported):

  * it modifies global variables without protection, which is a recipe for disaster in a multi-threaded environment

  * it resets state that is already activated. Therefore, it is not effective in preventing the main problem.

  * when applied for "system", it blocks Zope until the call returns, which may be far too long.

I am working on another workaround. The main idea is:

  * Inform "asyncore" about the actor for which its mainloop should execute.

  * When the actor is set, "asyncore" calls handlers only for this actor and does nothing otherwise.

The problem: what is an "actor"? The most natural choice would be the process, identified via its process id. However, under Linux, the process id may not identify the process but the thread. I am still looking for an adequate, platform-independent "actor" definition.

-- Dieter
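The "actor" idea can be sketched roughly as follows. This is not the patch announced above and not part of the "asyncore" API: the class "ActorGuardedLoop" and its methods are hypothetical names, and using "os.getpid()" as the actor is an assumption that runs into exactly the Linux caveat mentioned in the message. The sketch only shows the intended behaviour: once an actor is set, the loop dispatches handlers for that actor and does nothing in any other process.

```python
import os
import select


class ActorGuardedLoop:
    """Toy dispatch loop that only runs handlers for its designated actor."""

    def __init__(self):
        self.handlers = {}   # file descriptor -> callable(fd)
        self.actor = None    # None means: no restriction (original behaviour)

    def set_actor(self, actor=None):
        # Assumption: the actor is a process id; defaults to the current one.
        self.actor = os.getpid() if actor is None else actor

    def is_actor(self):
        return self.actor is None or self.actor == os.getpid()

    def poll_once(self, timeout=0.0):
        """Dispatch readable descriptors once; return how many were handled."""
        if not self.is_actor() or not self.handlers:
            return 0         # e.g. a forked child: leave the descriptors alone
        readable, _, _ = select.select(list(self.handlers), [], [], timeout)
        for fd in readable:
            self.handlers[fd](fd)
        return len(readable)
```

A forked child inherits "actor" with the parent's value, so its copy of the loop stays passive without any global state having to be cleared and restored around the fork.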
participants (2)
- Dieter Maurer
- Paul Winkler