[Zope-dev] [Warning] Zope/ZEO clients: subprocesses can lead to
non-deterministic message loss
Dieter Maurer
dieter at handshake.de
Tue Jun 29 14:48:24 EDT 2004
Tim Peters wrote at 2004-6-29 00:31 -0400:
> ...
[Dieter]
>> Meanwhile, I checked that "fork" under Linux with LinuxThreads
>> behaves with respect to threads as dictated by the POSIX
>> standard: the forked process has a single thread and
>> does not inherit other threads from its parent.
>>
>> I will soon check how our Solaris version of Python behaves.
>> If this, too, has only one thread, I will apologize for
>> the premature warning...
The same is true for Python 2.3.3 built on Solaris
without specifying any particular thread library.
It may be, however, that "gcc" has a preference for the "pthreads"
library.
> ...
[Dieter]
>> The ZEO client has the basic structure:
>>
>>   while 1:
>>     work_to_do = get_work(...)
>>     for work in work_to_do:
>>       pid = fork()
>>       if pid == 0:
>>         do_work(work)
>>         # will not return
>>     sleep(...)
>>
>> "do_work" opens a new ZEO connection.
>> "get_work" and "do_work" use "asyncore.poll" to synchronize with incoming
>> messages from ZEO -- no "asyncore.mainloop" around.
>>
>> The "poll" in "do_work" has stolen ZEO invalidation messages
>> destined for the parent such that "get_work" has read old state
>> and returned work items already completed. That is the problem
>> I saw.
>
>Well, don't do that then <wink>.
>
>> All this is easy to understand, (almost) platform independent
>> and independent of the thread library.
>
>I still wouldn't say it's easy to understand. While the thread that
>calls fork isn't running an asyncore loop, it must still be the case
>that asyncore in the parent has a non-empty map -- yes?
Sure, set up by "ZEO.ClientStorage" to receive invalidation
messages, in anticipation that either a "mainloop" runs or
someone calls "poll".
>If it had an
>empty map, the child processes would start with a clean slate (map),
>and so wouldn't pick up socket traffic meant for the parent.
>
>If that's so, it looks like just clearing asyncore's map in the child
>(before do_work()) would solve the (main) problem.
That indeed was the solution for my immediate problem...
However, I was thinking not only about my own problem but also
about what damage this could cause for others. And I had read
this imprecise "fork" manual page -- which suggested
danger for all forking applications with "mainloop"...
Thanks to your replies, I have now bookmarked these
excellent "opengroup.org" specifications. I will consult
them the next time I have a POSIX semantics question...
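
The fix discussed above (clearing asyncore's map in the child before
do_work()) might look roughly like the sketch below. It is not the
actual ZEO client code. "socket_map" is a plain dict standing in for
"asyncore.socket_map" (asyncore was removed from the standard library
in Python 3.12), and "fork_worker" and "demo" are hypothetical names
used only for illustration.

```python
import os
import socket

# Stand-in for asyncore.socket_map (fd -> channel); an assumption made
# so the sketch runs without the asyncore module.
socket_map = {}


def fork_worker(do_work, work):
    """Fork a child that runs do_work(work) with an empty channel map."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Child: forget the registrations inherited from the parent,
            # so a later poll() cannot read messages meant for the parent.
            socket_map.clear()
            do_work(work)
            status = 0
        finally:
            # Never return into the parent's work loop.
            os._exit(status)
    return pid


def demo():
    """Return (entries seen by the child, entries kept by the parent)."""
    parent_end, peer = socket.socketpair()
    socket_map[parent_end.fileno()] = parent_end
    r, w = os.pipe()

    def do_work(work):
        # Report how many channels the child still has registered.
        os.write(w, str(len(socket_map)).encode())

    pid = fork_worker(do_work, None)
    os.close(w)
    os.waitpid(pid, 0)
    child_count = int(os.read(r, 16))
    os.close(r)

    parent_count = len(socket_map)
    socket_map.clear()
    parent_end.close()
    peer.close()
    return child_count, parent_count
```

Only the child's copy of the map is cleared; the parent's registration
is untouched, so the parent keeps receiving its own messages. With the
real asyncore module, and assuming the channels were registered in the
default map, the corresponding step in the child would be
"asyncore.socket_map.clear()".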
--
Dieter