[Zope-dev] RE: [Zope] Re: Zope hanging (poss. threads-related )

Fri, 14 Apr 2000 20:21:44 +0200

> -----Original Message-----
> From: Tony Rossignol [mailto:tonyr@ep.newtimes.com]
> Sent: 14 April 2000 19:19
> To: Marcus Collins; zope-dev@zope.org
> Subject: Re: [Zope-dev] RE: [Zope] Re: Zope hanging (poss.
> threads-related)

> Thank you for starting this.  I'll try to gather up information 
> I've been trying to collect here and post it in the next few days.  

Thanks! Maybe you could also look at extending the DebugLogger output
(http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DebugLogger)
and posting the results of any hangs there?

> RE: could be zombie related -
> 
> Where might I find more info on this?  Could this zombie issue be
> present in FCGI as well?  

Amos Latteier remarked in
http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html that:

  "The ZServer zombie stuff is to get rid of zombie client 
   connections, not zombie publishing threads. These are quite 
   different beasts."

I'm not yet grokking the whole picture, so I can't really answer that.
Note that there is an outstanding issue in the Collector at
http://classic.zope.org:8080/Collector/954/view that might be related. As
you previously noted, there is no zombie_timeout in the FCGI server.

> We have noticed once restarts start they get worse when under a 
> load. I've been suspecting that the longer pages take to load the 
> more people are just stopping the page load, and this would/could 
> create a zombie. 

You'll sometimes note on the console or your logs something like the
following:

2000-04-14T14:00:26 ERROR(200) ZServer uncaptured python exception, closing
channel <PCGIChannel at 87567b0> (socket.error:(32, 'Broken pipe')
[/usr/local/Zope-2.1.6-src/ZServer/medusa/asynchat.py|initiate_send|211]
[/usr/local/Zope-2.1.6-src/ZServer/medusa/asyncore.py|send|237])

This occurs (I surmise) when the client closes the connection before ZServer
has sent its response. Since ZServer then closes the channel, I presume no
zombie is left behind, but I'd like to know more about this.
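
To illustrate the surmise above, here is a minimal sketch (modern Python 3,
not ZServer code) of a write to a socket whose peer has already gone away.
The function name is made up for illustration, and socket.socketpair() only
stands in for the ZServer-to-client channel. On Unix the send fails
immediately with EPIPE, which is the errno 32 in the log entry; over real
TCP the first send after the client disconnects may still succeed, with the
error only showing up on a later one.

```python
import errno
import socket


def send_after_peer_close():
    """Return the errno raised when writing to a socket whose peer has gone."""
    # socketpair() gives two connected endpoints; on Unix these are AF_UNIX
    # stream sockets, used here as a stand-in for the server/client channel.
    server_side, client_side = socket.socketpair()
    try:
        client_side.close()  # the "client" gives up before any response
        try:
            server_side.send(b"HTTP/1.0 200 OK\r\n\r\n")
        except OSError as exc:
            return exc.errno  # EPIPE ("Broken pipe") on Unix
        return None  # the send was accepted, no error seen
    finally:
        server_side.close()


if __name__ == "__main__":
    code = send_after_peer_close()
    print(code, errno.errorcode.get(code))
```

This only shows where the error comes from; it says nothing about whether
the publishing thread behind the request is released afterwards.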

> The problem is I don't know how to identify or research what 
> is going on under the hood.  Could when someone terminates a 
> connection be an issue here?  I mean; if the request is queued 
> waiting for a thread might act differently than a request that is 
> being processed by Zope and waiting on a DB query, or even 
> termination once zope has already started passing results back 
> through the pipe.  Our restarts are so hard to tie down I'm
> guessing it's a very subtle issue or a combination of just the 
> wrong factors.

We have quite a number of the above errors occurring, and from what I've
seen internally they seem to correlate with people terminating the
connection (or with browser timeouts). I also feel at a loss here, focussing
on a very small part of ZServer and possibly missing the big picture.
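
One way to check that correlation would be to count the "Broken pipe"
entries per minute and compare the counts with the restart times. The
following is only a rough sketch in modern Python 3: it assumes every log
entry starts with a timestamp in the format of the sample above and may wrap
over several lines, and the function name is made up for illustration.

```python
import re
from collections import Counter

# An entry is assumed to start with a timestamp such as 2000-04-14T14:00:26;
# the captured group is the timestamp truncated to the minute.
ENTRY_START = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\s")


def broken_pipes_per_minute(lines):
    """Count log entries mentioning 'Broken pipe', keyed by minute."""
    counts = Counter()
    current_minute = None
    for line in lines:
        match = ENTRY_START.match(line)
        if match:
            current_minute = match.group(1)  # a new entry begins here
        if current_minute is not None and "Broken pipe" in line:
            counts[current_minute] += 1
            current_minute = None  # count each entry at most once
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as log:
        for minute, n in sorted(broken_pipes_per_minute(log).items()):
            print(minute, n)
```

Run against the log file given as the first argument, this prints one line
per minute in which at least one such error was logged.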

> Just some more food for thought.

Thanks.

-- Marcus