[Zope] ZServer stops responding !? Help !?

Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Sat Apr 24 15:53:18 EDT 2004


Chris,

Thanks for the tips.

Here's what I've tried:

When the site looks like it's no longer responding, I connect with
the monitor.

Running Zope.app() is very slow, sometimes so slow I give up and try again.

Once I get that set up, I run:

app.Control_Panel.DebugInfo.dbconnections()

From what I gather, not all connections are taken up.  I have at least 2
that are free:

[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004
(31.63s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24
15:30:36 2004 (534.16s)'}, {'info': "({'HTTP_ACCEPT': ...

I try again a couple of minutes later and I see:

[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004
(181.71s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24
15:30:36 2004 (684.24s)'}, {'info': "({'HTTP_ACCEPT': ...
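
Reading the ages off two snapshots by eye is error-prone, so here is a minimal
sketch that pulls the "(NNN.NNs)" age out of each 'opened' value and lists the
connections that look busy and have been open longer than a threshold.  The
helper names (open_seconds, busy_connections) and the threshold are
hypothetical.  Treating an 'info' value that is only a parenthesised number as
a free connection is an assumption based on the reading above, not something
taken from the Zope documentation.

```python
import re

# Trailing age in an 'opened' value such as
# 'Sat Apr 24 15:38:59 2004 (31.63s)'.
AGE_RE = re.compile(r"\(([\d.]+)s\)\s*$")

# ASSUMPTION: an 'info' value like ' (1391)' (just a number in parentheses)
# marks a connection that is not handling a request.
IDLE_RE = re.compile(r"^\s*\(\d+\)\s*$")


def open_seconds(entry):
    """Return how long a connection has been open, in seconds, or None."""
    match = AGE_RE.search(entry.get('opened', ''))
    if match:
        return float(match.group(1))
    return None


def busy_connections(snapshot, threshold=60.0):
    """Return (age, info) pairs for busy connections older than threshold.

    The result is sorted with the oldest connection first.
    """
    result = []
    for entry in snapshot:
        info = entry.get('info', '')
        age = open_seconds(entry)
        if age is None or IDLE_RE.match(info):
            continue
        if age > threshold:
            result.append((age, info))
    result.sort()
    result.reverse()
    return result
```

Passing the list returned by dbconnections() to busy_connections() at two
different times should show the same 'info' values with growing ages if
requests are stuck.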

The 4 other requests (I have 6 threads) are the same, they haven't changed.
So I think we can exclude running out of threads/db connections as a source
of the problem.  The other 4 requests are for various content types.  The
content types themselves work fine, so I'm going to take note of which ones
they are and see if one keeps recurring or something.  Actually 2 of those
do backend HTTP calls; could there be some socket/timeout issue?  The call
is to a CGI on the very same server though, so I'm confident it's running
fine.
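
One way to rule the backend calls in or out is to give them a timeout, so that
a stalled CGI cannot hold a thread forever.  The sketch below is only an
illustration: call_backend and BACKEND_TIMEOUT are hypothetical names, and how
the real content types make their HTTP calls is not shown in this message.  It
relies on socket.setdefaulttimeout(), which exists from Python 2.3 on.  Note
that the setting is process-wide and affects every socket created afterwards.

```python
import socket

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Hypothetical limit, in seconds, for any backend call.
BACKEND_TIMEOUT = 10.0

# Process-wide: applies to every socket created after this call.
socket.setdefaulttimeout(BACKEND_TIMEOUT)


def call_backend(url):
    """Fetch url and return the body, or None if the call fails or stalls."""
    try:
        response = urlopen(url)
        try:
            return response.read()
        finally:
            response.close()
    except (IOError, socket.error):
        # Covers refused connections, HTTP errors and timeouts.
        return None
```

If requests stop piling up once the calls can time out, the backend is the
likely culprit; if nothing changes, it can be crossed off the list.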

At this point the monitor stops responding, in the middle of outputting that
second list. I hit enter and I get:

error: uncaptured python exception, closing channel <__main__.monitor_client
connected at 0x400d5c0c> (socket.error:(9, 'Bad file descriptor')
[/usr/local/lib/python2.3/asynchat.py|initiate_send|218]
[/usr/local/lib/python2.3/asyncore.py|send|337])

I go look at my trace log, and stuff is still appearing in there ... Though
nothing gets returned.

CPU usage right now is not particularly high ...

[zope at tincup log]$ ps auxw | grep Zope
zope     31498  0.0  0.2  6484 4576 ?        S    15:26   0:00
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/zdaemon/
zope     31499  0.9  8.5 192848 176496 ?     S    15:26   0:13
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31500  0.0  8.5 192848 176496 ?     S    15:26   0:00
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31501  7.0  8.5 192848 176496 ?     S    15:26   1:37
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31502  5.8  8.5 192848 176496 ?     S    15:26   1:21
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31503  4.2  8.5 192848 176496 ?     S    15:26   0:58
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31504  0.5  8.5 192848 176496 ?     S    15:26   0:07
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31505  5.1  8.5 192848 176496 ?     S    15:26   1:11
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope     31506  1.6  8.5 192848 176496 ?     S    15:26   0:22
/usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta

I'm going to go run requestprofiler now and see what comes out ...

Thanks again,
J.F.


-----Original Message-----
From: Chris McDonough [mailto:chrism at plope.com]
Sent: April 24, 2004 2:15 PM
To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Cc: zope at zope.org
Subject: Re: [Zope] ZServer stops responding !? Help !?


On Sat, 2004-04-24 at 13:53, Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:
> G'day,
> 
> I've got a rather bizarre but catastrophic problem.
> 
> ZServer seems to stop responding.  Sometimes it does so after days of
> running, sometimes after a few seconds or minutes of uptime.
> 
> I know it's ZServer because I can talk to the monitoring port without
> problem.

That may be a bit of flawed logic, because ZServer also runs the monitor
port.

> Also, the Apache processes just pile up to the limit allowed,
> suggesting the proxying is not getting replies from the downstream
> server.
> 
> The strange thing is the cause seems to be occasional, or to vary.  For
> hours on end I can sit there and restart it, and within minutes it stops
> responding ... Then suddenly the problem "disappears", I restart it, and
> I wait .... and nothing happens, it just keeps running.  Nothing else
> abnormal is going on the server so far as I can tell, there is very
> little memory swapped, and the CPU usage is not abnormally high.

It sounds as if Zope is doing something which blocks, consuming all
database threads.

> I used to have this problem very very rarely in the past, but since I
> upgraded to Zope 2.7, it seems to have gotten much worse :(
> 
> I tried accessing the DebugPanel from the monitor, but can't seem to get
> it to do anything useful ... I don't know where else to look to find the
> cause of this.
> 
> This causes serious uptime problems on our main, high traffic site, which
> is Very Bad.
> 
> I'm on RedHat 7.3 (fully patched)
> Python 2.3.3 (custom compiled)
> Zope 2.7
> CMF 1.4.x (I forget ... the latest!)
> Psycopg (Latest also)
> And a variety of other products.

I'd suggest using the "big M" or "trace" logging features along with the
requestprofiler script to find out where the problem might be.
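
As a rough illustration of what to look for in such a log, the sketch below
lists requests that began but never finished.  It assumes each trace line has
the layout "CODE ID TIME DATA", with code B for the start of a request and E
for the end of its output.  That layout is an assumption here and should be
checked against the actual log before trusting the result.  The function name
unfinished_requests is hypothetical, and this is not a replacement for
requestprofiler.

```python
def unfinished_requests(lines):
    """Return {request_id: (start_time, data)} for requests with no 'E' line.

    ASSUMPTION: each line looks like "CODE ID TIME DATA", where a 'B' line
    carries the method and path as DATA and an 'E' line has no DATA.
    """
    pending = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank or malformed line
        code = parts[0]
        request_id = parts[1]
        if code == 'B':
            data = ''
            if len(parts) > 3:
                data = parts[3].strip()
            pending[request_id] = (parts[2], data)
        elif code == 'E':
            # Request completed; it is no longer of interest.
            pending.pop(request_id, None)
    return pending
```

Running this over the trace log while the site is hung should leave only the
requests that are still holding a thread.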

- C


