RE: [Zope-dev] RE: [Zope] Re: Zope hanging (poss. threads-related )

14 Apr 2000

      ...
-----Original Message-----
From: Tony Rossignol [mailto:tonyr@ep.newtimes.com]
Sent: 14 April 2000 19:19
To: Marcus Collins; zope-dev@zope.org
Subject: Re: [Zope-dev] RE: [Zope] Re: Zope hanging (poss. threads-related)
...
Thank you for starting this.  I'll try to gather up information 
I've been trying to collect here and post it in the next few days.
Thanks! Maybe you could also look at extending the DebugLogger output
(http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DebugLogger)
and posting the results of any hanging there?
...
RE: could be zombie related -
Where might I find more info on this?  Could this zombie issue be
present in FCGI as well?
Amos Latteier remarked in
http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html that:

  "The ZServer zombie stuff is to get rid of zombie client 
   connections, not zombie publishing threads. These are quite 
   different beasts."

I'm not yet grokking the whole picture, so I can't really answer that.
Note that there is an outstanding issue in the Collector at
http://classic.zope.org:8080/Collector/954/view that might be related. As
you previously noted, there is no zombie_timeout in the FCGI server.
...
We have noticed that once restarts start, they get worse under
load. I suspect that the longer pages take to load, the more
people just stop the page load, and this would/could
create a zombie.
You'll sometimes note on the console or your logs something like the
following:

2000-04-14T14:00:26 ERROR(200) ZServer uncaptured python exception, closing
channel <PCGIChannel at 87567b0> (socket.error:(32, 'Broken pipe')
[/usr/local/Zope-2.1.6-src/ZServer/medusa/asynchat.py|initiate_send|211]
[/usr/local/Zope-2.1.6-src/ZServer/medusa/asyncore.py|send|237])

This occurs (I surmise) when the client closes the channel before ZServer
has sent its response. I presume the fact that it closes the channel would
mean no zombie, but I'd like to know more about this.
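
For what it's worth, the same error is easy to provoke outside Zope. The
sketch below is not ZServer code: it is a minimal, modern-Python illustration
that uses socket.socketpair() to stand in for the client/server channel, and
the function name send_after_peer_close is made up for this example. The
"client" end is closed before the "server" end writes its response. On Linux
I would expect the write to fail with EPIPE (errno 32, the 'Broken pipe'
above), though other platforms may report a connection reset instead.

```python
import errno
import socket


def send_after_peer_close():
    """Return the errno raised when writing to a peer that already closed.

    Hypothetical helper for illustration only; it does not use ZServer
    or medusa.
    """
    # A connected pair of sockets stands in for the browser <-> server channel.
    server_side, client_side = socket.socketpair()
    try:
        # The "browser" gives up before any response has been written.
        client_side.close()
        try:
            server_side.sendall(b"HTTP/1.0 200 OK\r\n\r\nlate response")
        except OSError as exc:
            # The failed write is reported to the sender as an OSError.
            return exc.errno
        return None
    finally:
        server_side.close()


if __name__ == "__main__":
    err = send_after_peer_close()
    name = errno.errorcode.get(err, "unknown") if err is not None else "no error"
    print("send after peer close:", err, name)
```

This only shows that a write to an already-closed peer fails straight away.
It says nothing about whether a publishing thread is left behind.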
...
The problem is I don't know how to identify or research what
is going on under the hood.  Could the point at which someone
terminates a connection be an issue here?  I mean, a request that
is queued waiting for a thread might act differently than a request
that is being processed by Zope and waiting on a DB query, or one
terminated once Zope has already started passing results back
through the pipe.  Our restarts are so hard to pin down that I'm
guessing it's a very subtle issue or a combination of just the
wrong factors.
We have quite a number of the above errors occurring, and from what I've
seen internally they seem to correlate with people terminating the
connection (or with browser timeouts). I also feel at a loss here, focussing
on a very small part of ZServer and possibly missing the big picture.
...
Just some more food for thought.
Thanks.

-- Marcus