I wonder if you are not seeing a threading problem. Python has a single global interpreter lock ("GIL"), and there are many interactions with schedulers that I do not fully understand. I run dual Athlon machines with 4GB of memory and have occasionally seen unexplained behavior that others have attributed to threading conflicts and the GIL. Most 2.4 kernels do not support CPU affinity, but that's the right solution. CPU affinity is available with the 2.6 kernels and with RH9, but RH9 has its own problems, now mostly fixed. You might want to try running with a single CPU and see if your problem remains.
-d
On Sat, 24 Apr 2004 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
Dennis,
I'm using a 4-CPU machine (PIIIs at 700 MHz each) with 2 GB of RAM (Dell PE6400).
I've got the check interval at 200 (Pystones/50, as I once saw somewhere).
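For reference, the rule of thumb quoted above (pystones divided by 50) can be sketched as follows. This is only an illustration: the helper names and the 10000 pystone figure are assumptions, not measurements from this machine. sys.setcheckinterval() is the Python 2 call; it was removed from later Python 3 releases, so the sketch checks for it before calling it.

```python
import sys

def suggested_check_interval(pystones):
    """Rule of thumb from the thread: check interval = pystones / 50."""
    return max(1, int(pystones // 50))

def apply_check_interval(interval):
    """Apply the interval if this interpreter still supports it.

    Returns True when sys.setcheckinterval() was called, False otherwise.
    """
    setter = getattr(sys, "setcheckinterval", None)
    if setter is None:
        return False  # newer interpreters use sys.setswitchinterval() instead
    setter(interval)
    return True

if __name__ == "__main__":
    interval = suggested_check_interval(10000)  # hypothetical pystone rating
    print(interval, apply_check_interval(interval))
```

A rating of 10000 pystones would give the value of 200 mentioned above.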
It runs very fast the rest of the time though :)
As time goes by I'm compiling some data as to what the server is doing, and I'm focusing on one particular content type.
This type does a variety of things depending on the condition. The "simplest" is that it does nothing particular, just uses its attributes and methods.
In one case, it could run a local system command using os.popen() ...
And in the really worst case, it starts an FTP connection. To a server local to our shop, but that I don't control.
I'm wondering if there are problems with the FTP connection or the system call, maybe in some circumstances or for specific instances. Maybe some "bad data" gets returned ?
Personally I would tend to look more towards the network I/O as a potential source of "blocking" ...
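One way to test the blocking theory is to put explicit timeouts on both the system command and the FTP connection, so a stuck call fails loudly instead of holding a thread forever. The sketch below is an assumption-laden illustration, not the code running on this site: it uses modern Python APIs (subprocess.run() and the timeout argument of ftplib.FTP) that do not exist in Python 2.3, where socket.setdefaulttimeout() would be the closest equivalent for the network side. The function names are hypothetical.

```python
import ftplib
import subprocess
import sys

def run_with_timeout(argv, timeout_s):
    """Run a command and return its stdout, or None if it exceeds timeout_s."""
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE,
                              timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so nothing lingers.
        return None
    return done.stdout.decode("utf-8", "replace")

def open_ftp(host, timeout_s):
    """Connect to an FTP server; blocking socket calls give up after timeout_s."""
    return ftplib.FTP(host, timeout=timeout_s)

if __name__ == "__main__":
    # A quick command returns its output; a slow one is cut off.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
    print(run_with_timeout([sys.executable, "-c",
                            "import time; time.sleep(5)"], 0.5))
```

Logging every timeout that fires would show whether the FTP server or the system command is the one that occasionally hangs.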
Does anybody know the intricacies of how network I/O is handled in Python 2.3 and/or Zope 2.7 ? Any differences from previous versions ?
The hunt continues ...
Much thanks to all who are helping !
J.F.
-----Original Message-----
From: Dennis Allison [mailto:allison@sumeru.stanford.EDU]
Sent: April 24, 2004 4:31 PM
To: Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Cc: chrism@plope.com; zope@zope.org
Subject: RE: [Zope] ZServer stops responding !? Help !?
Jean-Francois,
What processor are you using? How many CPUs?
-d
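The CPU count matters here because all threads in one Python process share the GIL. As a rough sketch (using os.sched_getaffinity() and os.sched_setaffinity(), which exist only in modern Python on Linux and not in Python 2.3), one could report the CPUs available to the process and, as an experiment, restrict it to one of them. The function names are hypothetical.

```python
import os

def cpu_summary():
    """Return (total CPUs, sorted CPUs this process may run on, or None)."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = None  # affinity API not available on this platform
    return total, allowed

def pin_to_one_cpu():
    """Restrict this process to one allowed CPU; return that CPU or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpu = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return cpu

if __name__ == "__main__":
    print(cpu_summary())
    print(pin_to_one_cpu())
```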
On Sat, 24 Apr 2004 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
Chris,
Thanks for the tips.
Here's what I've tried:
When the site looks like it's no longer responding, I connect with the monitor.
Running Zope.app() is very slow, sometimes so slow I give up and try again.
Once I get that setup, I run:
app.Control_Panel.DebugInfo.dbconnections()
From what I gather, not all connections are taken up. I have at least 2 that are free:
[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004 (31.63s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24 15:30:36 2004 (534.16s)'}, {'info': "({'HTTP_ACCEPT': ...
I try again a couple of minutes later and I see:
[{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004 (181.71s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24 15:30:36 2004 (684.24s)'}, {'info': "({'HTTP_ACCEPT': ...
The 4 other requests (I have 6 threads) are the same, they haven't changed. So I think we can exclude running out of threads/db connections as a source of the problem. The other 4 requests are for various content types. The content types themselves work fine, so I'm going to take note of which ones they are and see if one keeps recurring or something. Actually 2 of those do backend HTTP calls, could there be some socket/timeout issue ? The call is to a CGI on the very same server though, so I'm confident it's running fine.
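Comparing snapshots by eye gets tedious, so here is a small sketch that pulls the age in seconds out of each entry's 'opened' field and flags long-lived connections. It assumes the output has the shape shown above; only the two entries quoted in full are used, the elided ones are not reconstructed, and the 300 second threshold and function names are arbitrary illustrations.

```python
import re

# The two entries quoted in full in the first snapshot; the remaining
# entries are elided in the original output and are left out here.
SNAPSHOT = [
    {'info': ' (1391)', 'version': '',
     'opened': 'Sat Apr 24 15:38:59 2004 (31.63s)'},
    {'info': ' (7290)', 'version': '',
     'opened': 'Sat Apr 24 15:30:36 2004 (534.16s)'},
]

# Matches the trailing "(NNN.NNs)" age at the end of the 'opened' string.
_AGE = re.compile(r"\((\d+(?:\.\d+)?)s\)\s*$")

def connection_ages(connections):
    """Return the age in seconds parsed from each entry (None if absent)."""
    ages = []
    for conn in connections:
        match = _AGE.search(conn.get('opened', ''))
        ages.append(float(match.group(1)) if match else None)
    return ages

def older_than(connections, threshold_s):
    """Return the 'info' of every entry open for longer than threshold_s."""
    return [conn['info']
            for conn, age in zip(connections, connection_ages(connections))
            if age is not None and age > threshold_s]

if __name__ == "__main__":
    print(connection_ages(SNAPSHOT))
    print(older_than(SNAPSHOT, 300))
```

Running this against successive snapshots would show which entries never change between runs.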
At this point the monitor stops responding, in the middle of outputting that second list. I hit enter and I get:
error: uncaptured python exception, closing channel <__main__.monitor_client connected at 0x400d5c0c> (socket.error:(9, 'Bad file descriptor') [/usr/local/lib/python2.3/asynchat.py|initiate_send|218] [/usr/local/lib/python2.3/asyncore.py|send|337])
I go look at my trace log, and stuff is still appearing in there ... Though nothing gets returned.
CPU usage right now is not particularly high ...
[zope@tincup log]$ ps auxw | grep Zope
zope 31498 0.0 0.2   6484   4576 ? S 15:26 0:00 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/zdaemon/
zope 31499 0.9 8.5 192848 176496 ? S 15:26 0:13 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31500 0.0 8.5 192848 176496 ? S 15:26 0:00 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31501 7.0 8.5 192848 176496 ? S 15:26 1:37 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31502 5.8 8.5 192848 176496 ? S 15:26 1:21 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31503 4.2 8.5 192848 176496 ? S 15:26 0:58 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31504 0.5 8.5 192848 176496 ? S 15:26 0:07 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31505 5.1 8.5 192848 176496 ? S 15:26 1:11 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
zope 31506 1.6 8.5 192848 176496 ? S 15:26 0:22 /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
I'm going to go run requestprofiler now and see what comes out ...
Thanks again, J.F.
-----Original Message-----
From: Chris McDonough [mailto:chrism@plope.com]
Sent: April 24, 2004 2:15 PM
To: Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Cc: zope@zope.org
Subject: Re: [Zope] ZServer stops responding !? Help !?
On Sat, 2004-04-24 at 13:53, Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
G'day,
I've got a rather bizarre but catastrophic problem.
ZServer seems to stop responding. Sometimes it does so after days of running, sometimes after a few seconds or minutes of uptime.
I know it's ZServer because I can talk to the monitoring port without problem.
That may be a bit of flawed logic, because ZServer also runs the monitor port.
Also, the Apache processes just pile up to the limit allowed, suggesting the proxying is not getting replies from the downstream server.
The strange thing is the cause seems to be occasional, or to vary. For hours on end I can sit there and restart it, and within minutes it stops responding ... Then suddenly the problem "disappears", I restart it, and I wait .... and nothing happens, it just keeps running. Nothing else abnormal is going on on the server so far as I can tell, there is very little memory swapped, and the CPU usage is not abnormally high.
It sounds as if Zope is doing something which blocks, consuming all database threads.
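The failure mode described here, where blocking calls tie up every worker thread so that new requests queue forever, can be illustrated with a small stand-alone sketch. It does not use Zope or ZServer at all: the pool of 2 workers, the Event standing in for a slow backend, and the function name are all illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def demo_pool_starvation(workers=2, wait_s=0.2):
    """Fill every worker with a blocked task and show a quick task stalls.

    Returns (starved, result): starved is True if the quick task could not
    finish while the workers were blocked; result is its eventual value.
    """
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stand-ins for requests stuck waiting on a slow backend call.
        blockers = [pool.submit(release.wait) for _ in range(workers)]
        # A cheap request that would normally return immediately.
        quick = pool.submit(lambda: "served")
        try:
            quick.result(timeout=wait_s)
            starved = False
        except FutureTimeout:
            starved = True  # no free worker picked it up in time
        release.set()  # the "backend" finally answers
        result = quick.result(timeout=5)
        for blocker in blockers:
            blocker.result(timeout=5)
    return starved, result

if __name__ == "__main__":
    print(demo_pool_starvation())
```

From the outside this looks like a server that accepts connections but never replies, even though CPU usage stays low.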
I used to have this problem very very rarely in the past, but since I upgraded to Zope 2.7, it seems to have gotten much worse :(
I tried accessing the DebugPanel from the monitor, but can't seem to get it to do anything useful ... I don't know where else to look to find the cause of this.
This causes a serious uptime problem on our main, high traffic site, which is Very Bad.
I'm on:
RedHat 7.3 (fully patched)
Python 2.3.3 (custom compiled)
Zope 2.7
CMF 1.4.x (I forget ... the latest!)
Psycopg (Latest also)
And a variety of other products.
I'd suggest using the "big M" or "trace" logging features along with the requestprofiler script to find out where the problem might be.
- C
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )