[Zope] Zope Dies Mysteriously After A Few Days. How Do I Investigate Zope's Death?

Wed Dec 10 11:59:55 EST 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 10 December 2003 17:46, Jonathan Mark wrote:
> My new goodbyejim.com site runs Zope 2.6 and Python
> 2.1 on Redhat 7.3 It uses Squishdot. Ever since I
> fired the site
> up a few weeks ago it has been dying mysteriously
> every few days.
> When I look in the var/Z2.log file there is no
> evidence that anything
> went wrong.

For me, it was a memory leak problem.

> If I then run the stop script it says that Zope is not
> running. I can then
> always restart Zope using the start script without
> problem.
> If I run the uptime Unix command I find that the Linux
> box itself has been
> running without problem. What could be causing my Zope
> to die? How do I
> investigate it?

Try running dmesg. If it displays anything like this:
Out of Memory: Killed process 14946 (python).

then the kernel killed Zope for using too much memory, which points
to a memory leak. A very inelegant but effective workaround for me
has been this kind of script:

#!/bin/bash

while true; do
        KILLPID=`ps axuww | grep Zope/Startup/run.py | grep -v grep | sort -n -r -k 5 | awk '{print $2}' | head -1`
        # Only kill when a Zope process was actually found.
        if [ -n "$KILLPID" ]; then
                echo `date` Killing $KILLPID
                kill $KILLPID
        fi
        sleep 10
        for instance in 1 2 3; do
                RUN=`ps axuww | grep Zope | grep z$instance`
                if [ -z "$RUN" ]; then
                        echo `date` Running zope instance $instance
                        /home/zope/z$instance/bin/runzope &
                fi
        done
        sleep 3600
done
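
The dmesg check suggested earlier can be made repeatable with a small
helper. This is only a sketch: the function name oom_killed_python is
made up for illustration, and it assumes the kernel message looks like
the "Out of Memory: Killed process N (python)." line quoted above,
which may differ between kernel versions.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of python processes that the
# kernel's OOM killer has terminated. Reads kernel log text on stdin.
# Assumes the message format "Out of Memory: Killed process N (python)."
oom_killed_python() {
    grep -i 'out of memory: killed process' \
        | grep '(python' \
        | sed -E 's/.*[Kk]illed process ([0-9]+).*/\1/'
}

# Usage:
#   dmesg | oom_killed_python
```

If this prints one or more PIDs, Zope's python process was most
likely killed for running the machine out of memory.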

I have three instances of Zope load balanced with Pound. Every
hour I kill the one that is the most swollen and restart any
that are not running.

The load balancer just skips non-running instances and users
experience no downtime.

I plan to change the script so that it checks every few seconds
whether all three Zopes are running and restarts any that are not,
and kills an instance only when its memory usage reaches a threshold.
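
A rough sketch of the threshold part of that plan, not a finished
script. The function name zope_over_limit and the 300000 KB figure in
the usage comment are made up for illustration; like the script above,
it assumes column 5 of `ps axuww` output is the process size in KB and
column 2 is the PID.

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of Zope processes whose size
# (column 5 of `ps axuww`, in KB) is above the limit given as $1.
# Reads ps output on stdin so it can be tried on saved output too.
zope_over_limit() {
    awk -v limit="$1" \
        '/Zope\/Startup\/run\.py/ && $5 + 0 > limit + 0 { print $2 }'
}

# Usage (300000 KB is an arbitrary example threshold):
#   for pid in $(ps axuww | zope_over_limit 300000); do
#       kill "$pid"
#   done
```

Running this every few seconds, followed by the restart loop from the
script above, would replace the fixed hourly kill.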

The reason I have three instances is Python's global interpreter
lock, which made the Zope server crawl while the CPU sat almost idle.
I found that running three instances per CPU is enough to keep it
busy, but I haven't had the time to debug these memory leaks.
I have a cache of some per-user data that does not expire, but
there aren't enough users that this could possibly fill up more than
a hundred megabytes (the zope instances have been growing to well
in excess of 350 megabytes).

I'm running Zope 2.7.0-b3.

- -- 
Jure Koren, n.i.
jure at aufbix.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/11EO9iFCvmuhrCIRAsxYAKDvjWhquvR7+a4pQwVETZXs8VPtlgCfYCMz
cGykYEOzVVa9xDcV7lhpH9Q=
=a3gy
-----END PGP SIGNATURE-----