[Zope] Ever seen this problem?

Eric W. Sink eric@sourcegear.com
Fri, 10 Dec 1999 15:39:21 -0600


We have been chasing a problem with our Zope server.  Comments from
others on this mailing list have given me the impression that we are
not the only ones facing this quandary.

Basically, the problem is that Zope intermittently "hangs".

When I say "hangs", what I mean is:

1.  It stops serving web requests.  When attempting to access Zope via
HTTP, the TCP connection succeeds, but nothing is ever returned.

2.  We cannot get in through the monitor_client either.

3.  Zope appears to be using no CPU time.  'ps' reports that there are
several Python processes associated with Zope, but none of them are
doing anything.

4.  Since HTTP does not work at all, access to /manage or
/manage_debug doesn't get anywhere.

5.  The only remedy which has worked is to kill the Zope server and
restart it.

6.  netstat does *not* report any unusual number of sockets owned by
python Zope processes.

7.  The size of the Python Zope processes does not seem unusual.  Last
time I checked, they were all around 13M each.
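Symptom 1 can be told apart from an ordinary crash with a small probe: a dead server refuses the connection, while the hang described above accepts it and then sends nothing back. The following is a minimal sketch in Python 3, not a tool we actually use; the function name probe() and its return labels are hypothetical, and the 10-second default timeout is an arbitrary assumption.

```python
import socket

def probe(host, port=80, path="/", timeout=10.0):
    """Classify how an HTTP server handles one request.

    Returns one of:
      "refused"         - nothing is listening on the port
      "connect-timeout" - the TCP handshake itself never completed
      "error"           - some other network error (unreachable, reset, ...)
      "hung"            - TCP connect succeeded, but no reply arrived in time
      "closed"          - the server closed the connection without replying
      "ok"              - at least one byte of a reply arrived
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "error"
    try:
        # A minimal HTTP/1.0 request; the server should close when done.
        request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        sock.sendall(request.encode("ascii"))
        first_byte = sock.recv(1)  # blocks for at most `timeout` seconds
    except socket.timeout:
        return "hung"
    except OSError:
        return "error"
    finally:
        sock.close()
    return "ok" if first_byte else "closed"
```

Run against the ZServer port while the site is down, a result of "hung" would suggest that the operating system is still completing TCP handshakes on Zope's listening socket even though nothing inside Zope is reading the requests.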

We run Zope 2.1.0 final, the version compiled by Digital Creations,
downloaded from their site.  The same problem existed on 2.1.0b2 and
the 2.0 series.

The problem is not reliably reproducible.  There is no sequence
of steps which causes Zope to hang in the manner described.
Nonetheless, it happens.  Sometimes it happens a lot.  Sometimes it
happens infrequently.

We run ZServer as our web server.  We have never tried running Zope
behind Apache.  (Should we?)

Our machine is a dual-PIII 550 with 512MB of RAM.  The operating
system is Debian 2.1 (potato).  The problem has appeared under other
configurations, but we have not tried Zope under any other version of
Linux.

Virtually nothing else ever runs on this machine, other than Zope.

The machine is sitting on a 10-base-T Ethernet, connected to a T-1
line.  This Zope server has never been exposed to anything which I
would consider to be a really heavy load.

The Zope server in question can be seen at 
	www.sourcegear.com
It is [usually] up and running.  :-)

Suspicions: This seems to happen more often when we are editing things
using /manage.  But then again, there have been several times when it
has hung while nobody was even around.
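To test the /manage suspicion against something firmer than memory, one option is to poll the site from another machine and record the exact time of every transition between responding and not responding, then compare those times with who was editing. A minimal Python 3 sketch, assuming a plain HTTP GET of a single page is a fair health check; watch() and is_responsive() are hypothetical names, and the 60-second interval and 15-second timeout are arbitrary.

```python
import time
import urllib.error
import urllib.request

def is_responsive(url, timeout):
    """True if the server sends back any HTTP response within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)
        return True
    except urllib.error.HTTPError:
        return True   # an error status (404, 500, ...) still means it answered
    except Exception:
        return False  # timeout, refused connection, reset, ...

def watch(url, interval=60.0, timeout=15.0, checks=None,
          log=print, check=is_responsive):
    """Poll `url` and log a timestamped line each time it goes up or down.

    `checks` limits the number of polls (None means run forever); `check`
    can be swapped out, e.g. for testing without a network.
    """
    was_up = None
    done = 0
    while checks is None or done < checks:
        up = check(url, timeout)
        if up != was_up:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log("%s %s %s" % (stamp, url, "UP" if up else "HUNG/DOWN"))
            was_up = up
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)
```

For example, watch("http://www.sourcegear.com/") left running on a second machine would print one line per state change, which can then be lined up against editing sessions and the access log.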

We have tried using wget -r to abuse ZServer in the hopes that we
could get it to fail predictably.  This worked once, causing a hang
after five minutes or so.  The next time we tried it, Zope carried on
for half an hour under heavy load, with no problems.  I don't mind
fighting a bug, but I don't enjoy intermittent ones.  :-(
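Part of what makes wget -r unpredictable is that the set of URLs it requests depends on link discovery. Replaying a fixed list of URLs at a fixed concurrency may be easier to repeat from one run to the next. A minimal Python 3 sketch, not what we ran; hammer() and fetch() are hypothetical names, and the defaults (10 rounds, 8 worker threads, 10-second timeout) are assumptions.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url, timeout):
    """GET one URL; return (url, status code or error text, seconds taken)."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return (url, resp.status, time.time() - start)
    except Exception as exc:  # HTTP errors, timeouts, refused connections
        return (url, repr(exc), time.time() - start)

def hammer(urls, rounds=10, workers=8, timeout=10.0):
    """Request every URL in `urls` `rounds` times using `workers` threads.

    Returns (all_results, failures); a failure is any result whose second
    field is an error description rather than an integer status code.
    """
    jobs = [url for _ in range(rounds) for url in urls]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda url: fetch(url, timeout), jobs))
    failures = [r for r in results if not isinstance(r[1], int)]
    return results, failures
```

Requests that hit the timeout land in the failures list along with how long they took; note that HTTP error statuses are counted as failures too, so a 404 in the URL list will show up there.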

How long Zope stays up varies widely.  We have had situations
where it stayed up for three days.

And we have had situations where it stayed up for three minutes.
This latter situation *seemed* to be associated with the presence of a
robot which was crawling our site.  Every time we brought the site
back up, the robot would resume, and the site would go back down.

Watching the log file caused us to suspect a problem with certain
types of acquisition.  We had some bad relative URLs in our content
which were causing deep recursion.  The crawler was happily crunching
through everything it could find, and our log file revealed some
extremely long URLs.
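Runaway acquisition URLs tend to repeat the same path segments (for example /docs/products/docs/products/...), so they can be picked out of the access log mechanically rather than by eye. A minimal Python 3 sketch, assuming the access log has the request line in double quotes as in the common log format; looks_recursive() and suspicious_urls() are hypothetical names, and the thresholds of 3 repeats and 255 characters are arbitrary.

```python
from collections import Counter
from urllib.parse import urlsplit

def looks_recursive(url, max_repeats=3, max_length=255):
    """Flag a URL whose path is very long or repeats a segment too often."""
    path = urlsplit(url).path
    if len(path) > max_length:
        return True
    segments = [s for s in path.split("/") if s]
    return any(n >= max_repeats for n in Counter(segments).values())

def suspicious_urls(log_lines, **kwargs):
    """Yield request paths from common-log-format lines that look recursive."""
    for line in log_lines:
        try:
            request = line.split('"')[1]  # e.g. 'GET /a/b HTTP/1.0'
            path = request.split()[1]
        except IndexError:
            continue  # not a request line we understand
        if looks_recursive(path, **kwargs):
            yield path
```

The flagged paths are only a starting point: each one leads back to a page whose relative links may still send a crawler into acquisition loops.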

[ We blocked the crawler at our firewall.  :-) ]

However, we fixed every bad relative URL we could find, and the hangs
have continued.  They are less frequent, I suppose, but they have not stopped.

Our site is not terribly large.  We do have quite a few documents, but
nothing out of the ordinary.

Most or all of our images are PNG.

We *do* have a ZCatalog running.

We use the Knowledge product.

One section of our site is a database front end using ZMySQLDA.
However, this part of the site is not currently visible to the rest of
the world (at least there is no link to it).  We have seen no
particular correlation between frequency of access to this section of
the site and frequency of "hangs".

We tried activating the debug_log once, using instructions obtained
from a message posted to this list.  Unfortunately, that was the
attempt wherein Zope ran for half an hour under heavy load with no
problem.
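If a hang does occur while the debug log is active, the most useful information in it is probably the set of requests that started but never finished. A minimal Python 3 sketch; it ASSUMES a line format (a one-letter code, B at the start of a request and E at its end, followed by a request id and a timestamp) that should be checked against real debug_log output before relying on it. unfinished_requests() is a hypothetical name.

```python
def unfinished_requests(lines):
    """Return {request_id: begin_line} for requests that began but never ended.

    ASSUMED line format: '<code> <request-id> <timestamp> ...', where code
    'B' marks the start of a request and 'E' marks its end.  Other codes
    are ignored.
    """
    open_requests = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        code, request_id = fields[0], fields[1]
        if code == "B":
            open_requests[request_id] = line.rstrip("\n")
        elif code == "E":
            open_requests.pop(request_id, None)
    return open_requests
```

Whatever remains in the returned dictionary after a hang is the list of candidates for what the stuck threads were working on.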

Our Data.fs file is 77MB.  We have never compacted it.

Any advice would be *much* appreciated.  We have a new service that we
are *almost* ready to deploy, and we really need to resolve this
problem before it goes out.

Thanks in advance.

-- 
Eric W. Sink, Software Craftsman
SourceGear Corporation
eric@sourcegear.com