According to Tim Peters:
I don't know. Dieter asked whether you ran the tests via "zopectl test", but I didn't see an answer to that.
Ok, here are some data points...

bender:~/Zope-2.7.7-final$ cat /proc/version
Linux version 2.6.9-11.ELsmp (bhcompile@decompose.build.redhat.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri May 20 18:26:27 EDT 2005

bender:~/Zope-2.7.7-final$ python2.3
Python 2.3.5 (#1, Apr 19 2005, 14:53:39)
[GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Running one single test:

bender:~/Zope-2.7.7-final$ python2.3 test.py testConnection checkNoVerificationOnServerRestart\$
Running unit tests from /home/wlang/Zope-2.7.7-final/lib/python
======================================================================
ERROR: checkNoVerificationOnServerRestart (ZEO.tests.testConnection.FileStorageReconnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wlang/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes

----------------------------------------------------------------------
Ran 1 test in 0.689s

FAILED (errors=1)

After some retries, the same test passes:

bender:~/Zope-2.7.7-final$ python2.3 test.py testConnection checkNoVerificationOnServerRestart\$
Running unit tests from /home/wlang/Zope-2.7.7-final/lib/python
----------------------------------------------------------------------
Ran 1 test in 0.691s

OK

Interestingly, if I run the test with strace, I never see the test fail (I tried at least 30 times):

bender:~/Zope-2.7.7-final$ strace -e trace=signal -o /var/tmp/zeotest.trc python2.3 test.py testConnection checkNoVerificationOnServerRestart\$
Running unit tests from /home/wlang/Zope-2.7.7-final/lib/python
----------------------------------------------------------------------
Ran 1 test in 0.710s

OK

(Obviously a Heisenberg effect -- the observation influences the behaviour ;-)

If anyone is interested in the trace file -- it can be found at:

http://slime.wu-wien.ac.at/misc/zeotest.trc

(However, it would be way more interesting to see the syscalls while the test is failing...)

Also, I debugged the whole test with the Python debugger. Unfortunately (as with strace), I was not able to reproduce the failure of the test in the debugger.
The ZEO tests spawn processes directly via Python's os.spawnve(), and later wait for them to end, via the waitpid() code shown earlier. They don't muck around with signals, forks, or anything else that should be platform-dependent (the same ZEO-test process code is used on both Linux and Windows, BTW -- for this reason, it can't rely on any fancy signal or process gimmicks; spawnve+waitpid is the entire story here).
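For readers who have not used this pair of calls, the spawnve+waitpid pattern described above can be sketched roughly as follows. This is a minimal illustration, not the ZEO test code: it is written for Python 3 on a Unix-like system (the thread itself uses Python 2.3), and the helper names `spawn_child` and `reap` are hypothetical.

```python
import os
import sys


def spawn_child(code):
    """Start a child Python interpreter without waiting for it; return its pid.

    Hypothetical helper for illustration only.
    """
    args = [sys.executable, "-c", code]
    # P_NOWAIT makes spawnve return immediately with the child's pid.
    return os.spawnve(os.P_NOWAIT, sys.executable, args, dict(os.environ))


def reap(pid):
    """Block until the child exits; return its exit code, or None if it
    did not exit normally (for example, if it was killed by a signal)."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None


pid = spawn_child("import sys; sys.exit(3)")
exit_code = reap(pid)
```

The parent is the only process entitled to collect the child's exit status, and it can do so only once; that is what the tearDown code relies on.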
Yes, it's as simple as that: ZEO is started, ZEO is stopped, and when the parent calls waitpid, we get the "No child processes" error most of the time :-( Any ideas what we can try to narrow this down?
All the failures you showed were in test teardown. If that's all the failures you got, then all the test bodies actually passed. Of course you have to be wary that normal methods of detecting child-process termination aren't working as hoped on this box, because all the test failures you reported were exactly failures to detect child-process termination.
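As background on the error itself: "No child processes" is errno ECHILD, which waitpid() reports when the given pid is not a child that is still waiting to be reaped. One way to provoke it deliberately is to wait for the same child twice. The sketch below (Python 3, Unix-like systems only) does exactly that; it only shows what the error means and is not a claim about what happens on the box in question.

```python
import errno
import os
import sys

# Start a child that exits immediately.
pid = os.spawnv(os.P_NOWAIT, sys.executable, [sys.executable, "-c", "pass"])

# The first wait reaps the child and collects its exit status.
os.waitpid(pid, 0)

# The second wait finds nothing left to reap and fails with ECHILD.
try:
    os.waitpid(pid, 0)
    second_wait_errno = None
except OSError as exc:
    second_wait_errno = exc.errno

print(second_wait_errno == errno.ECHILD)
```

So the open question in this thread is what makes the child's exit status disappear before tearDown asks for it.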
Sure -- we could just make this change:

bender:.../ZEO/tests$ diff ConnectionTests.py.ori ConnectionTests.py
121c121,124
<     os.waitpid(pid, 0)
---
>     try:
>         os.waitpid(pid, 0)
>     except OSError:
>         pass

then all tests will pass. But then we will not know why the ZEO zombie vanishes before waitpid can reap the exit code ;-)

\wlang{}

PS: I'm afraid it turns out to be a Python thread / signals / race problem -- yuck!

--
Willi.Langenberger@wu-wien.ac.at Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria
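A somewhat narrower variant of the workaround in the diff above would swallow only ECHILD and let any other OSError propagate, so unrelated failures in teardown still show up. This is a sketch under assumptions, not a proposed patch to ConnectionTests.py: it targets Python 3 on a Unix-like system, and `wait_for_child` is a hypothetical name.

```python
import errno
import os
import sys


def wait_for_child(pid):
    """Wait for pid to exit.

    Returns True if the child was reaped by this call, and False if there
    was no such child left to reap (ECHILD). Any other OSError is re-raised.
    Hypothetical helper for illustration only.
    """
    try:
        os.waitpid(pid, 0)
        return True
    except OSError as exc:
        if exc.errno != errno.ECHILD:
            raise
        return False


# Usage: the first call reaps the child, the second finds it already gone.
pid = os.spawnv(os.P_NOWAIT, sys.executable, [sys.executable, "-c", "pass"])
first = wait_for_child(pid)
second = wait_for_child(pid)
```

Like the original diff, this hides the symptom rather than explaining it; the return value at least lets a test log how often the child had already vanished.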