[Zope] ZServer stops responding !? (Success!!)

Jean-Francois.Doyon at CCRS.NRCan.gc.ca Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Sat Apr 24 17:14:22 EDT 2004


Dennis,

Yup, I'm aware of all those issues.  I'm due for an OS upgrade, and
the 2.6 kernel is definitely in the plan precisely because of that
feature.

Hardware upgrades are harder to come by, but single-cpu will also
be part of the requirements!

In the meantime I think I figured it out (time will tell):

I have a content type class that caches some information in attributes,
but when it's out of date it needs to get some of it from an FTP server.

Apparently this feature is broken right now ... At first I noticed
it was using a bad path, so I fixed that, but that still didn't help.

I'm guessing something has changed on the FTP server and I wasn't
notified.

So I've disabled it entirely, because you know, it's Saturday and
gorgeous outside!

Of note, Python 2.3 now supports socket timeouts, something I've been
wanting for a while, precisely because I was worried something like
this could occur. I'll look after it all later.
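
A rough sketch of how such a timeout could guard the FTP refresh. The
function names, the host/path/port arguments and the 10 second default are
made up for illustration; this is not the code from the content type in
question. socket.setdefaulttimeout() has been in the socket module since
Python 2.3, and it makes a stalled connect or read raise an error instead
of blocking the thread indefinitely. The default is process-wide, so in a
threaded server it also applies to sockets other threads create while it
is set.

```python
import ftplib
import socket


def fetch_listing(host, path, timeout=10.0, port=21):
    """Return the names under `path` on an FTP server, or None on failure.

    All arguments are placeholders; the real server and layout are unknown.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)       # process-wide default, in seconds
    try:
        ftp = ftplib.FTP()
        try:
            ftp.connect(host, port)         # raises if the server never answers
            ftp.login()                     # anonymous login, an assumption
            return ftp.nlst(path)
        finally:
            ftp.close()
    except ftplib.all_errors:               # FTP errors, socket errors, timeouts
        return None
    finally:
        socket.setdefaulttimeout(previous)  # restore whatever was set before


def refresh(cached, fresh):
    """Keep the stale cached value when the fetch produced nothing."""
    if fresh is None:
        return cached
    return fresh
```

With these, something like refresh(info, fetch_listing("ftp.example.org",
"/pub")) would keep serving the old cached value when the server stalls,
rather than tying up a thread. Later Python versions also accept a timeout
argument directly on ftplib.FTP, which avoids touching the process-wide
default.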

Many thanks to those who took the time to help on a weekend day, it
was very helpful, and sorry to add traffic to the list, I work best
when I'm thinking out loud :)

Cheers,
J.F.

-----Original Message-----
From: Dennis Allison [mailto:allison at sumeru.stanford.EDU]
Sent: April 24, 2004 5:04 PM
To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
Cc: chrism at plope.com; zope at zope.org
Subject: RE: [Zope] ZServer stops responding !? Help !?



I wonder if you are not seeing a threading problem.  Python has a single
global interpreter lock ("GIL") and there are many interactions with
schedulers that I do not fully understand.  On my system(s) I run dual
Athlon machines with 4GB of memory and have occasionally seen unexplained
behavior others have attributed to threading conflicts and the GIL.

Most 2.4 kernels do not support CPU affinity, but that's the right
solution.  CPU affinity is available with the 2.6 kernels and with RH9,
but RH9 has its own problems, now mostly fixed.

You might want to try running with a single CPU and see if your problem
remains.
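
On a kernel with CPU affinity support, the single-CPU experiment can be
tried without pulling processors. A minimal sketch, assuming Linux and a
Python new enough to have os.sched_setaffinity (3.3 or later, so not the
Python 2.3 discussed in this thread); pin_to_one_cpu is a hypothetical
helper name:

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict `pid` (0 means the caller) to a single CPU.

    Returns the CPU number chosen, or None where the platform lacks
    the sched_* calls (they are not available everywhere).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)   # CPUs we may currently run on
    cpu = min(allowed)                    # arbitrary choice: lowest-numbered
    os.sched_setaffinity(pid, {cpu})
    return cpu
```

This would need to run early at startup, before any worker threads are
created, since threads and child processes started afterwards inherit the
affinity mask.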

	-d

On Sat, 24 Apr 2004 Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:

> Dennis,
> 
> I'm using a 4-CPU machine, PIIIs at 700 MHz each, with 2 GB of RAM (Dell
> PE6400).
> 
> I've got the check interval at 200 (Pystones/50, as I once saw somewhere).
> 
> It runs very fast the rest of the time though :)
> 
> As time goes by I'm compiling some data as to what the server is doing, and
> I'm focusing on one particular content type.
> 
> This type does a variety of things depending on the condition.  The "simplest"
> is that it does nothing particular, just uses its attributes and methods.
> 
> In one case, it could run a local system command using os.popen() ...
> 
> And in the really worst case, it starts an FTP connection, to a server
> local to our shop, but one that I don't control.
> 
> I'm wondering if there are problems with the FTP connection or the system call
> maybe in some circumstances or for specific instances. Maybe some "bad data"
> gets returned ?
> 
> Personally I would tend to look more towards the network I/O as a potential
> source of "blocking" ...
> 
> Anybody know the intricacies of how the network I/O is handled in Python 2.3
> and/or Zope 2.7 ? Any differences with previous versions ?
> 
> The hunt continues ...
> 
> Much thanks to all who are helping !
> 
> J.F.
> 
> -----Original Message-----
> From: Dennis Allison [mailto:allison at sumeru.stanford.EDU]
> Sent: April 24, 2004 4:31 PM
> To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
> Cc: chrism at plope.com; zope at zope.org
> Subject: RE: [Zope] ZServer stops responding !? Help !?
> 
> 
> Jean-Francois,
> 
> What processor are you using?  How many CPUs?  
> 
> 	-d
> 
> On Sat, 24 Apr 2004 Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:
> 
> > Chris,
> > 
> > Thanks for the tips.
> > 
> > Here's what I've tried:
> > 
> > When the site looks like it's no longer responding, I connect with
> > the monitor.
> > 
> > Running Zope.app() is very slow, sometimes so slow I give up and try again.
> > 
> > Once I get that setup, I run:
> > 
> > app.Control_Panel.DebugInfo.dbconnections()
> > 
> > From what I gather, not all connections are taken up.  I have at least 2
> > that are free:
> > 
> > [{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004
> > (31.63s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24
> > 15:30:36 2004 (534.16s)'}, {'info': "({'HTTP_ACCEPT': ...
> > 
> > I try again a couple of minutes later and I see:
> > 
> > [{'info': ' (1391)', 'version': '', 'opened': 'Sat Apr 24 15:38:59 2004
> > (181.71s)'}, {'info': ' (7290)', 'version': '', 'opened': 'Sat Apr 24
> > 15:30:36 2004 (684.24s)'}, {'info': "({'HTTP_ACCEPT': ...
> > 
> > The 4 other requests (I have 6 threads) are the same, they haven't changed.
> > So I think we can exclude running out of threads/db connections as a source
> > of the problem.  The other 4 requests are for various content types. The
> > content types themselves work fine, so I'm going to take note of which ones
> > they are and see if one keeps recurring or something.  Actually 2 of those
> > do backend http calls, could there be some socket/timeout issue ? The call
> > is to a CGI on the very same server though, so I'm confident it's running
> > fine.
> > 
> > At this point the monitor stops responding, in the middle of outputting that
> > second list. I hit enter and I get:
> > 
> > error: uncaptured python exception, closing channel
> > <__main__.monitor_client connected at 0x400d5c0c>
> > (socket.error:(9, 'Bad file descriptor')
> > [/usr/local/lib/python2.3/asynchat.py|initiate_send|218]
> > [/usr/local/lib/python2.3/asyncore.py|send|337])
> > 
> > I go look at my trace log, and stuff is still appearing in there ... Though
> > nothing gets returned.
> > 
> > CPU usage right now is not particularly high ...
> > 
> > [zope at tincup log]$ ps auxw | grep Zope
> > zope     31498  0.0  0.2  6484 4576 ?        S    15:26   0:00
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/zdaemon/
> > zope     31499  0.9  8.5 192848 176496 ?     S    15:26   0:13
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31500  0.0  8.5 192848 176496 ?     S    15:26   0:00
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31501  7.0  8.5 192848 176496 ?     S    15:26   1:37
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31502  5.8  8.5 192848 176496 ?     S    15:26   1:21
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31503  4.2  8.5 192848 176496 ?     S    15:26   0:58
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31504  0.5  8.5 192848 176496 ?     S    15:26   0:07
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31505  5.1  8.5 192848 176496 ?     S    15:26   1:11
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > zope     31506  1.6  8.5 192848 176496 ?     S    15:26   0:22
> > /usr/local/bin/python2.3 /usr/local/Zope-2.7-Core/lib/python/Zope/Sta
> > 
> > I'm going to go run requestprofiler now and see what comes out ...
> > 
> > Thanks again,
> > J.F.
> > 
> > 
> > -----Original Message-----
> > From: Chris McDonough [mailto:chrism at plope.com]
> > Sent: April 24, 2004 2:15 PM
> > To: Jean-Francois.Doyon at CCRS.NRCan.gc.ca
> > Cc: zope at zope.org
> > Subject: Re: [Zope] ZServer stops responding !? Help !?
> > 
> > 
> > On Sat, 2004-04-24 at 13:53, Jean-Francois.Doyon at CCRS.NRCan.gc.ca wrote:
> > > G'day,
> > > 
> > > I've got a rather bizarre but catastrophic problem.
> > > 
> > > ZServer seems to stop responding.  Sometimes it does so after days of
> > > running, sometimes after a few seconds or minutes of uptime.
> > > 
> > > I know it's ZServer because I can talk to the monitoring port without
> > > problem.
> > 
> > That may be a bit of flawed logic, because ZServer also runs the monitor
> > port.
> > 
> > > Also, the apache processes just pile up up to the limit allowed,
> > > suggesting the proxying is not getting replies from the downstream
> > > server.
> > > 
> > > > The strange thing is the cause seems to be occasional, or vary.  For
> > > > hours on end I can sit there and restart it, and within minutes it stops
> > > > responding ... Then suddenly the problem "disappears", I restart it, and
> > > > I wait .... and nothing happens, it just keeps running.  Nothing else
> > > > abnormal is going on on the server so far as I can tell, there is very
> > > > little memory swapped, and the CPU usage is not abnormally high.
> > 
> > It sounds as if Zope is doing something which blocks, consuming all
> > database threads.
> > 
> > > I used to have this problem very very rarely in the past, but since I
> > > upgraded to Zope 2.7, it seems to have gotten much worse :(
> > > 
> > > > I tried accessing the DebugPanel from the monitor, but can't seem to get
> > > > it to do anything useful ... I don't know where else to look to find the
> > > > cause of this.
> > > 
> > > > This causes serious uptime problems on our main, high traffic site, which
> > > > is Very Bad.
> > > 
> > > I'm on RedHat 7.3 (fully patched)
> > > Python 2.3.3 (custom compiled)
> > > Zope 2.7
> > > CMF 1.4.x (I forget ... the latest!)
> > > Psycopg (Latest also)
> > > And a variety of other products.
> > 
> > I'd suggest using the "big M" or "trace" logging features along with the
> > requestprofiler script to find out where the problem might be.
> > 
> > - C
> > 
> > _______________________________________________
> > Zope maillist  -  Zope at zope.org
> > http://mail.zope.org/mailman/listinfo/zope
> > **   No cross posts or HTML encoding!  **
> > (Related lists - 
> >  http://mail.zope.org/mailman/listinfo/zope-announce
> >  http://mail.zope.org/mailman/listinfo/zope-dev )
> > 
> 


