I have two zope servers in a cluster running Zope 2.7.0. We moved to 2.7.0 a few weeks ago, and have had this problem ever since.

At least once a day, and sometimes much more frequently, they fall over, sometimes one, sometimes both. I assume it is related to load, because there are no other patterns I can see.

When the Zope servers are running I see something like the following from ps:

UID PID PPID C STIME TTY TIME CMD
zope 28306 1 0 Apr21 ? 00:00:08 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/ZEO/zeoctl.xml -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf /usr/
zope 28307 28306 1 Apr21 ? 05:30:30 /usr/bin/python2.3 /usr/local/zope2/lib/python/ZEO/runzeo.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf
zope 835 28307 0 Apr22 ? 00:00:06 /usr/bin/python2.3 /usr/local/zope2/lib/python/ZEO/runzeo.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf
zope 28621 1 0 10:21 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/Zope/Startup/zopeschema.xml -b 30 -d -f -s /usr/local/be/zopeinstance7.icpeurope.
zope 28622 28621 13 10:21 ? 00:00:03 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28631 28622 0 10:21 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28635 28631 4 10:22 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28636 28631 12 10:22 ? 00:00:02 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28637 28631 15 10:22 ? 00:00:02 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28638 28631 10 10:22 ? 00:00:01 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf

After it has fallen over, I see:

UID PID PPID C STIME TTY TIME CMD
zope 28306 1 0 Apr21 ? 00:00:09 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/ZEO/zeoctl.xml -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf /usr/
zope 28307 28306 1 Apr21 ? 05:33:47 /usr/bin/python2.3 /usr/local/zope2/lib/python/ZEO/runzeo.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf
zope 835 28307 0 Apr22 ? 00:00:06 /usr/bin/python2.3 /usr/local/zope2/lib/python/ZEO/runzeo.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf
zope 28621 1 0 10:21 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/Zope/Startup/zopeschema.xml -b 30 -d -f -s /usr/local/be/zopeinstance7.icpeurope.
zope 28636 1 6 10:22 ? 00:05:26 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28637 1 5 10:22 ? 00:05:12 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28638 1 5 10:22 ? 00:04:54 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf

It looks like, in this case, PID 28631 has died, leaving orphan zope processes. However, zdrun hasn't noticed this and restarted, presumably because these orphans have been left.

Has anyone got any ideas what I can try to debug/fix this problem?

Cheers,

Doug.
-- 020 79610341 / 07879 423002 / dwinter@icpeurope.net 3 Waterhouse Square, Holborn Bars, 142 Holborn, London EC1N 2NX www.businesseurope.com www.icpeurope.net www.venturedome.com 1024D/1AB26B8C C88E DC6D A578 DEFB C493 A44D 0156 0479 1AB2 6B8C
Doug Winter wrote at 2004-5-5 11:57 +0100:
... At least once a day, and sometimes much more frequently, they fall over, sometimes one, sometimes both.
Your Zope died, probably due to a SIGSEGV. Change your "ulimit -c" setting to allow core files to be written. You will then get core files in these situations. GDB may be able to analyse them (unfortunately, either GDB cannot analyse core files for multi-threaded programs or Linux writes them incorrectly; still, you should at least see the cause of death and where it happened).
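For illustration, the same limit that "ulimit -c" controls can also be inspected and raised from inside Python with the standard `resource` module. This is only a sketch: the helper names `enable_core_dumps` and `describe` are made up for this example, it is written for a current Python rather than 2.3, and in practice the limit would more likely be set in the shell or init script that starts zdrun, since child processes inherit it.

```python
import resource


def enable_core_dumps():
    """Raise this process's soft core-file limit to its hard limit.

    Child processes started afterwards inherit the new limit. If the hard
    limit is itself 0, core files stay disabled until the hard limit is
    raised (which normally requires root).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render an rlimit value (in bytes) in a readable form."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % limit


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core file size: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

Once a core file exists, it is typically opened with something like `gdb /usr/bin/python2.3 core`, where the name and location of the core file depend on the system's configuration.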
... After it has fallen over, I see:
UID PID PPID C STIME TTY TIME CMD
...
zope 28621 1 0 10:21 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/Zope/Startup/zopeschema.xml -b 30 -d -f -s /usr/local/be/zopeinstance7.icpeurope.
zope 28636 1 6 10:22 ? 00:05:26 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28637 1 5 10:22 ? 00:05:12 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28638 1 5 10:22 ? 00:04:54 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
It looks like, in this case, PID 28631 has died, leaving orphan zope processes. However, zdrun hasn't noticed this and restarted, presumably because these orphans have been left.
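The orphaned state described here (Zope run.py processes whose PPID has become 1) can be spotted mechanically. The following is a rough sketch only, not part of zdaemon or Zope: the function name `find_orphans` and the idea of matching on the run.py path are assumptions made for this example, and the sample rows are copied from the listing in this thread.

```python
# Rows copied from the "after it has fallen over" listing in this thread.
SAMPLE_PS = """\
UID PID PPID C STIME TTY TIME CMD
zope 28307 28306 1 Apr21 ? 05:33:47 /usr/bin/python2.3 /usr/local/zope2/lib/python/ZEO/runzeo.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zeo.conf
zope 28621 1 0 10:21 ? 00:00:00 /usr/bin/python2.3 /usr/local/zope2/lib/python/zdaemon/zdrun.py -S /usr/local/zope2/lib/python/Zope/Startup/zopeschema.xml -b 30 -d -f -s /usr/local/be/zopeinstance7.icpeurope.
zope 28636 1 6 10:22 ? 00:05:26 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28637 1 5 10:22 ? 00:05:12 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
zope 28638 1 5 10:22 ? 00:04:54 /usr/bin/python2.3 /usr/local/zope2/lib/python/Zope/Startup/run.py -C /usr/local/be/zopeinstance7.icpeurope.net/config/zope.conf
"""


def find_orphans(ps_text, marker="Zope/Startup/run.py"):
    """Return PIDs of processes matching `marker` that were reparented to init.

    Expects `ps -ef` style columns: UID PID PPID C STIME TTY TIME CMD.
    zdrun.py itself legitimately has PPID 1, so only commands containing
    `marker` are considered.
    """
    orphans = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)       # CMD keeps its embedded spaces
        if len(fields) < 8:
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if ppid == 1 and marker in cmd:
            orphans.append(pid)
    return orphans


if __name__ == "__main__":
    print(find_orphans(SAMPLE_PS))
```

A check like this could be run from cron against live `ps -ef` output to flag the condition, although it only detects the symptom and does nothing about the underlying crash.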
Has anyone got any ideas what I can try to debug/fix this problem?
This is a Python bug. It is currently being discussed on "zope-dev@zope.org" (subject: "[Zope-dev] Segfault and Deadlock"). Search its archive for more information. There is some chance that the bug will be fixed in the next Python release (2.3.4).

-- Dieter
Participants (2): Dieter Maurer, Doug Winter