-----Original Message----- From: Tony Rossignol [mailto:tonyr@ep.newtimes.com] Sent: 14 April 2000 19:19 To: Marcus Collins; zope-dev@zope.org Subject: Re: [Zope-dev] RE: [Zope] Re: Zope hanging (poss. threads-related)
Thank you for starting this. I'll try to gather up information I've been trying to collect here and post it in the next few days.
Thanks! Maybe you could also look at extending the DebugLogger output (http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DebugLogger) and posting the results of any hanging there?
RE: could be zombie related -
Where might I find more info on this? Could this zombie issue be present in FCGI as well?
Amos Lattier remarked in http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html that: "The ZServer zombie stuff is to get rid of zombie client connections, not zombie publishing threads. These are quite different beasts." I'm not yet grokking the whole picture, so I can't really answer that. Note that there is an outstanding issue in the Collector at http://classic.zope.org:8080/Collector/954/view that might be related. As you previously noted, there is no zombie_timeout in the FCGI server.
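To make Amos's distinction concrete, here is a toy sketch of reaping zombie *client connections*. It is not medusa's code: `Channel`, `reap_zombies` and the idle-time criterion are assumptions made for illustration (medusa's actual test for a zombie may differ). The point is that closing a stale channel does nothing to a publishing thread that may still be working on that channel's request.

```python
import time


class Channel:
    """Hypothetical stand-in for a client connection (not medusa's class)."""

    def __init__(self, name, now):
        self.name = name
        self.last_activity = now  # timestamp of the last read/write on this channel
        self.closed = False

    def close(self):
        self.closed = True


def reap_zombies(channels, zombie_timeout, now=None):
    """Close channels idle longer than zombie_timeout seconds; return their names.

    Assumption: "zombie" means "idle too long". Only the connection objects
    are closed; any worker thread still computing a response is untouched.
    """
    if now is None:
        now = time.time()
    reaped = []
    for ch in channels:
        if not ch.closed and now - ch.last_activity > zombie_timeout:
            ch.close()
            reaped.append(ch.name)
    return reaped
```

For example, with `zombie_timeout=1800` and `now=2000`, a channel last active at 1000 survives and one last active at 0 is closed.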
We have noticed that once restarts begin, they get worse under load. I suspect that the longer pages take to load, the more people simply stop the page load, and this could create a zombie.
You'll sometimes notice something like the following on the console or in your logs:

    2000-04-14T14:00:26 ERROR(200) ZServer uncaptured python exception, closing channel <PCGIChannel at 87567b0> (socket.error:(32, 'Broken pipe') [/usr/local/Zope-2.1.6-src/ZServer/medusa/asynchat.py|initiate_send|211] [/usr/local/Zope-2.1.6-src/ZServer/medusa/asyncore.py|send|237])

This occurs (I surmise) when the client closes the channel before ZServer has sent its response. I presume that, because the channel is then closed, no zombie is left behind, but I'd like to know more about this.
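Error 32 is EPIPE ("Broken pipe") on Linux and most Unix systems: the server wrote to a connection the peer had already closed. The minimal sketch below reproduces that condition with plain Python sockets on the loopback interface; it is not ZServer/medusa code, `simulate_aborted_request` is a hypothetical name, and the exact error reported (EPIPE or ECONNRESET) is platform-dependent.

```python
import errno
import socket
import time


def simulate_aborted_request(payload=b"x" * 65536, attempts=50):
    """Client hangs up, server keeps writing; return the error name seen by the server."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    server_side.settimeout(5)  # never block forever in this demo

    # The "browser" gives up (Stop button or timeout) before any response is sent.
    client.close()

    try:
        for _ in range(attempts):
            try:
                server_side.sendall(payload)
            except OSError as exc:
                # Map the numeric errno to its symbolic name, e.g. 32 -> "EPIPE".
                return errno.errorcode.get(exc.errno, str(exc.errno))
            time.sleep(0.01)  # let the peer's FIN/RST arrive before writing again
        return None
    finally:
        server_side.close()
```

The first write usually succeeds because the kernel just buffers it; the failure only surfaces on a later write, after the client's side has answered with a reset. That would be consistent with the error appearing in `initiate_send` rather than when the client actually left.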
The problem is that I don't know how to identify or research what is going on under the hood. Could the point at which someone terminates a connection be an issue here? I mean: a request that is queued waiting for a thread might behave differently from one that is being processed by Zope and waiting on a DB query, or from one terminated after Zope has already started passing results back through the pipe. Our restarts are so hard to pin down that I'm guessing it's a very subtle issue, or a combination of just the wrong factors.
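A toy model of the first two cases, under loudly stated assumptions: this is not Zope's request queue, and `Request`, its `aborted` flag and `run_scenario` are hypothetical. It assumes the server can notice a hangup and check for it when a request is dequeued. With one worker thread, a request abandoned while queued is dropped at no cost, while one abandoned mid-"query" still holds the thread until the query returns.

```python
import queue
import threading


class Request:
    """Hypothetical request record; `aborted` is set when the client hangs up."""

    def __init__(self, name):
        self.name = name
        self.aborted = threading.Event()
        self.started = threading.Event()


def worker(q, log, db_done):
    while True:
        req = q.get()
        if req is None:  # shutdown sentinel
            break
        if req.aborted.is_set():
            log.append((req.name, "skipped"))  # abandoned while queued: no work done
            continue
        req.started.set()
        db_done.wait()  # stands in for a slow DB query; the thread is tied up here
        state = "finished, client gone" if req.aborted.is_set() else "finished"
        log.append((req.name, state))


def run_scenario():
    q, log, db_done = queue.Queue(), [], threading.Event()
    t = threading.Thread(target=worker, args=(q, log, db_done))
    t.start()
    a, b = Request("A"), Request("B")
    q.put(a)
    a.started.wait()  # A is now "in the DB query"
    q.put(b)          # B waits in the queue for the only thread
    a.aborted.set()   # both clients hit Stop
    b.aborted.set()
    db_done.set()     # the query for A finally returns
    q.put(None)
    t.join()
    return log
```

`run_scenario()` returns `[("A", "finished, client gone"), ("B", "skipped")]`: A's thread did all of its work for a client that had already left.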
We have quite a number of the above errors occurring, and they seem to correlate to people terminating the connection (or browser timeout), from what I've seen internally. I also feel at a loss here, focussing on a very small part of ZServer and possibly missing the big picture.
Just some more food for thought.
Thanks. -- Marcus