Zope Dies Mysteriously After A Few Days. How Do I Investigate Zope's Death?
My new goodbyejim.com site runs Zope 2.6 and Python 2.1 on Red Hat 7.3. It uses Squishdot. Ever since I fired the site up a few weeks ago it has been dying mysteriously every few days. When I look in the var/Z2.log file there is no evidence that anything went wrong. If I then run the stop script it says that Zope is not running. I can then always restart Zope using the start script without problem. If I run the uptime Unix command I find that the Linux box itself has been running without problem. What could be causing my Zope to die? How do I investigate it?
On Wednesday 10 December 2003 17:46, Jonathan Mark wrote:
My new goodbyejim.com site runs Zope 2.6 and Python 2.1 on Red Hat 7.3. It uses Squishdot. Ever since I fired the site up a few weeks ago it has been dying mysteriously every few days. When I look in the var/Z2.log file there is no evidence that anything went wrong.
For me, it was a memory leak problem.
If I then run the stop script it says that Zope is not running. I can then always restart Zope using the start script without problem. If I run the uptime Unix command I find that the Linux box itself has been running without problem. What could be causing my Zope to die? How do I investigate it?
Try running dmesg. If it displays anything like this:

    Out of Memory: Killed process 14946 (python).

then it is a memory leak. The very inelegant, but effective, solution for me has been this kind of script:

    #!/bin/bash
    while true; do
        # Pick the Zope process with the largest VSZ (column 5 of ps axuww)
        KILLPID=`ps axuww | grep Zope/Startup/run.py | grep -v grep | sort -n -r -k 5 | awk '{print $2}' | head -1`
        echo `date` Killing $KILLPID
        kill $KILLPID
        sleep 10
        # Restart every instance that is not running
        for instance in 1 2 3; do
            RUN=`ps axuww | grep Zope | grep z$instance`
            if [ -z "$RUN" ]; then
                echo `date` Running zope instance $instance
                /home/zope/z$instance/bin/runzope &
            fi
        done
        sleep 3600
    done

I have three instances of Zope load balanced with pound, and every hour I kill the one that is the most swollen and restart all those that are not running. The load balancer just skips non-running instances, so users experience no downtime.

I plan to fix the script so that it checks whether all three Zopes are running and restarts them immediately (sleeping only for a few seconds or so), and stops them only when they reach a threshold of memory usage.

The reason I have three instances is the global interpreter lock issue, which made the Zope server crawl while the CPU was almost idle. I found that running three instances per CPU is quite OK to fill it up, but I haven't had the time to debug these memory leaks. I have a cache of some per-user data that does not expire, but there aren't enough users for it to fill up more than a hundred megabytes (the Zope instances have been growing to well in excess of 350 megabytes).

I'm running Zope 2.7.0-b3.

-- Jure Koren, n.i. jure@aufbix.org
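The threshold-based variant described above as a plan could look roughly like the sketch below. It is not the author's script: the LIMIT_KB value, the function names and the 30-second polling interval in the closing comment are assumptions made for illustration, while the z1..z3 layout and the runzope path are taken from the script in the message. To keep the load balancer supplied with live instances, it stops at most one process per pass.

```shell
#!/bin/bash
# Sketch only: stop a Zope instance once its memory use crosses a limit.
# LIMIT_KB and the function names are hypothetical, not from the thread.

LIMIT_KB=${LIMIT_KB:-300000}   # assumed threshold on VSZ, in kilobytes

# Read `ps axuww`-style lines on stdin and print the PID of the single
# largest process whose VSZ (column 5) exceeds the limit given as $1.
# Prints nothing when no process is over the limit.
largest_over_limit() {
    awk -v limit="$1" '$5 + 0 > limit + 0 { print $5, $2 }' \
        | sort -n -r | head -1 | awk '{ print $2 }'
}

# One pass: stop the most swollen oversized instance (if any),
# then start every instance that is not running.
check_once() {
    pid=$(ps axuww | grep '[Z]ope/Startup/run.py' | largest_over_limit "$LIMIT_KB")
    if [ -n "$pid" ]; then
        echo "$(date) Killing $pid (over $LIMIT_KB kB)"
        kill "$pid"
        sleep 10
    fi
    for instance in 1 2 3; do
        if ! ps axuww | grep '[Z]ope' | grep -q "z$instance"; then
            echo "$(date) Running zope instance $instance"
            /home/zope/z$instance/bin/runzope &
        fi
    done
}

# To run continuously (assumed interval):
#   while true; do check_once; sleep 30; done
```

The `[Z]ope` pattern replaces the `grep -v grep` step: the grep process's own command line contains `[Z]ope`, which the pattern does not match.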
Jonathan Mark wrote at 2003-12-10 08:46 -0800:
... What could be causing my Zope to die? How do I investigate it?
The first step is to activate Zope logging (--> "doc/LOGGING.txt") and look into the log file. Probably, after that, we will see that Zope dies from a fatal signal (often "SIGSEGV"). Then you will need to ensure that a core file can be written (--> "ulimit -c" bash command) and look into the generated core file (with a "C" debugger, e.g. "gdb"). If you got Python from a prebuilt package, it may have lost all symbol information (stripped). In this case, you will need to rebuild Python from source and keep the symbol information. -- Dieter
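The core-file part of this advice might be sketched as follows. The instance directory, the start script location, the Python path and the function names are all assumptions for illustration; only "ulimit -c" and "gdb" come from the message. The logging step is left to doc/LOGGING.txt.

```shell
#!/bin/bash
# Sketch only: allow core files, start Zope, inspect a core after a crash.
# ZOPE_HOME, PYTHON and the function names are hypothetical.

ZOPE_HOME=${ZOPE_HOME:-/home/zope/z1}   # assumed instance directory
PYTHON=${PYTHON:-/usr/bin/python}       # assumed interpreter path

# Succeeds when the core size limit passed as $1 permits core files,
# i.e. it is "unlimited" or a number greater than zero.
core_dumps_allowed() {
    [ "$1" = "unlimited" ] || [ "${1:-0}" -gt 0 ] 2>/dev/null
}

start_with_core_dumps() {
    # Raising the limit can fail if the hard limit is lower.
    ulimit -c unlimited || echo "could not raise the core size limit" >&2
    if ! core_dumps_allowed "$(ulimit -c)"; then
        echo "core files are still disabled (ulimit -c is $(ulimit -c))" >&2
    fi
    # The limit is inherited by processes started from this shell.
    "$ZOPE_HOME/start"
}

# After a crash, load the core file into gdb and ask for a backtrace:
#   gdb "$PYTHON" /path/to/core
#   (gdb) bt
```

The backtrace is only readable if the Python binary still has its symbols, which is why a stripped package build may need to be replaced with one built from source.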