RE: [Zope] ZServer stops responding !? Help !?
Chris,

Thanks for the tips. Here's what I've tried:

When the site looks like it's no longer responding, I connect with the monitor. Running Zope.app() is very slow, sometimes so slow I give up and try again. Once I get that set up, I run:

app.Control_Panel.DebugInfo.dbconnections()

From what I gather, not all connections are taken up. I have at least 2 that are free:

[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004 (31.63s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24 15:30:36 2004 (534.16s)'}, {'info': "({'HTTP_ACCEPT': ...

I try again a couple of minutes later and I see:

[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004 (181.71s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24 15:30:36 2004 (684.24s)'}, {'info': "({'HTTP_ACCEPT': ...

The 4 other requests (I have 6 threads) are the same, they haven't changed. So I think we can exclude running out of threads/db connections as a source of the problem. The other 4 requests are for various content types. The content types themselves work fine, so I'm going to take note of which ones they are and see if one keeps recurring or something. Actually 2 of those do backend HTTP calls, could there be some socket/timeout issue? The call is to a CGI on the very same server though, so I'm confident it's running fine.

At this point the monitor stops responding, in the middle of outputting that second list. I hit enter and I get:

error: uncaptured python exception, closing channel <__main__.monitor_client connected at 0x400d5c0c> (socket.error:(9, 'Bad file descriptor') [/usr/local/lib/python2.3/asynchat.py|initiate_send|218] [/usr/local/lib/python2.3/asyncore.py|send|337])

I go look at my trace log, and entries are still appearing in there, though nothing gets returned.

CPU usage right now is not particularly high:

[zope@tincup log]$ ps auxw | grep Zope
zope 31498 0.0 0.2 6484 4576 ? S 15:26 0:00 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/zdaemon/
zope 31499 0.9 8.5 192848 176496 ? S 15:26 0:13 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31500 0.0 8.5 192848 176496 ? S 15:26 0:00 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31501 7.0 8.5 192848 176496 ? S 15:26 1:37 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31502 5.8 8.5 192848 176496 ? S 15:26 1:21 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31503 4.2 8.5 192848 176496 ? S 15:26 0:58 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31504 0.5 8.5 192848 176496 ? S 15:26 0:07 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31505 5.1 8.5 192848 176496 ? S 15:26 1:11 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31506 1.6 8.5 192848 176496 ? S 15:26 0:22 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta

I'm going to run requestprofiler now and see what comes out ...

Thanks again,
J.F.

-----Original Message-----
From: Chris McDonough [mailto:chrism@plope.com]
Sent: April 24, 2004 2:15 PM
To: Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Cc: zope@zope.org
Subject: Re: [Zope] ZServer stops responding !? Help !?

On Sat, 2004-04-24 at 13:53, Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
G'day,
I've got a rather bizarre but catastrophic problem.
ZServer seems to stop responding. Sometimes it does so after days of running, sometimes after a few seconds or minutes of uptime.
I know it's ZServer because I can talk to the monitoring port without problem.
That may be a bit of flawed logic, because ZServer also runs the monitor port.
Also, the Apache processes just pile up to the limit allowed, suggesting the proxy is not getting replies from the downstream server.
The strange thing is that the cause seems to be occasional, or to vary. For hours on end I can sit there and restart it, and within minutes it stops responding ... Then suddenly the problem "disappears": I restart it, and I wait ... and nothing happens, it just keeps running. Nothing else abnormal is going on on the server so far as I can tell; there is very little memory swapped, and the CPU usage is not abnormally high.
It sounds as if Zope is doing something which blocks, consuming all database threads.
I used to have this problem very very rarely in the past, but since I upgraded to Zope 2.7, it seems to have gotten much worse :(
I tried accessing the DebugPanel from the monitor, but can't seem to get it to do anything useful ... I don't know where else to look to find the cause of this.
This causes serious uptime problems on our main, high-traffic site, which is Very Bad.
I'm on:
RedHat 7.3 (fully patched)
Python 2.3.3 (custom compiled)
Zope 2.7
CMF 1.4.x (I forget ... the latest!)
Psycopg (latest also)
And a variety of other products.
I'd suggest using the "big M" or "trace" logging features along with the requestprofiler script to find out where the problem might be. - C
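The trace-log suggestion above can be sketched roughly as follows. This is a minimal illustration, not the requestprofiler script itself: it assumes each trace-log line starts with a one-letter code and a request id, with "B" marking the beginning of a request and "E" marking its end. Check that assumption against the trace log your own Zope version writes. The function name and the sample lines are made up for the example.

```python
def find_unfinished_requests(lines):
    """Return {request_id: begin_line} for requests that began but never ended.

    Assumed line layout (verify against your own log): CODE ID TIMESTAMP ...
    where CODE "B" means the request began and "E" means it ended.
    """
    pending = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        code, request_id = parts[0], parts[1]
        if code == "B":
            pending[request_id] = line.rstrip("\n")
        elif code == "E":
            pending.pop(request_id, None)
    return pending


# Hypothetical sample lines, invented for illustration only.
sample = [
    "B 1001 2004-04-24T15:30:36 GET /a",
    "B 1002 2004-04-24T15:30:37 GET /b",
    "E 1001 2004-04-24T15:30:38",
]

for request_id, begin_line in find_unfinished_requests(sample).items():
    print(request_id, "->", begin_line)
```

Requests that show up here across several runs, a few minutes apart, are the ones worth looking at first when the server appears hung.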
Jean-Francois,

What processor are you using? How many CPUs?

-d
On Sat, Apr 24, 2004 at 03:53:18PM -0400, Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
of the problem. The other 4 requests are for various content types. The content types themselves work fine, so I'm going to take note of which ones they are and see if one keeps recurring or something. Actually 2 of those do backend HTTP calls, could there be some socket/timeout issue?
Yes! My 2.6.2 production Zope server hung last week. I spent a little time with this recipe: http://www.zopelabs.com/cookbook/1073504990 ... and traced the problem to an external method that was using urllib2.urlopen(). I checked the same URL using wget on the command line, and sure enough, no response.

Apparently, in certain network conditions (e.g. a firewall is blocking the thing you are trying to access), you can sometimes wait literally forever for urllib2.urlopen() to finish. In Python 2.1, urllib2 offers no control over timeouts. 2.3 or 2.2 added timeout handling to the underlying socket library, so maybe there's a way to do it now.

Oddly, in my case, Zope apparently hung after only one or two requests to the external method. That surprises me.
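A minimal sketch of bounding the wait on a backend HTTP call, so that a silent backend cannot tie up a worker thread indefinitely. It is written for Python 3, where urllib.request.urlopen() accepts a timeout argument; the urllib2 of the Python versions discussed in this thread had no such argument. On Python 2.3, socket.setdefaulttimeout() (added in that release) was the process-wide way to get a similar effect. The function name fetch_with_timeout and the 5-second default are assumptions made for the example.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout=5.0):
    """Return the response body, or None if the backend fails or stays silent.

    The timeout applies to each blocking socket operation (connect, read),
    not to the request as a whole.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError):
        # Connection errors are wrapped in URLError; a stalled read
        # surfaces as socket.timeout.
        return None
```

Returning None keeps the sketch short; a real external method would probably log the failure and render a fallback instead.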
The call is to a CGI on the very same server though, so I'm confident it's running fine.
*shrug* that is odd.

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's ! (random hero from isometric.spaceninja.com)
Participants (3): Dennis Allison, Jean-Francois.Doyon@CCRS.NRCan.gc.ca, Paul Winkler