[Zope-dev] Re: Zope hanging (poss. threads-related)
Tres Seaver
tseaver@palladion.com
Sun, 16 Apr 2000 18:19:55 -0500
Michel Pelletier <michel@digicool.com> wrote:
>
> Marcus Collins wrote:
> >
> > > -----Original Message-----
> > > From: Tony Rossignol [mailto:tonyr@ep.newtimes.com]
> > > Sent: 14 April 2000 19:19
> > > To: Marcus Collins; zope-dev@zope.org
> > > Subject: Re: [Zope-dev] RE: [Zope] Re: Zope hanging (poss.
> > > threads-related)
> >
> > > Thank you for starting this. I'll try to gather up information
> > > I've been trying to collect here and post it in the next few days.
> >
> > Thanks! Maybe you could also look at extending the DebugLogger output
> > <URL:http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DebugLogger>
> > and posting the results of any hanging there?
> >
> > > RE: could be zombie related -
> > >
> > > Where might I find more info on this? Could this zombie issue be
> > > present in FCGI as well?
> >
> > Amos Latteier remarked in
> > http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html that:
> >
> > "The ZServer zombie stuff is to get rid of zombie client
> > connections, not zombie publishing threads. These are quite
> > different beasts."
> >
> > I'm not yet grokking the whole picture, so I can't really answer to that.
> > Note that there is an outstanding issue in the Collector at
> > http://classic.zope.org:8080/Collector/954/view that might be related. As
> > you previously noted, there is no zombie_timeout in the FCGI server.
>
> What the zombie timeout means is that after a publishing thread gets
> done answering a request, the socket may not go away. This may happen
> for a number of reasons: the client 'hung' and is not 'putting down the
> phone after the conversation is over' (so to speak), or network troubles
> may prevent the connection from closing properly. This means that there
> is a 'zombie' connection lying around. This zombie will probably end up
> going away on its own, but if not, ZServer will kill it after a period
> of time.
>
> The only resource lying around during the life of a zombie is a tiny
> little unused open socket. The Mack truck of a Zope thread that served
> the request for the zombie socket does not 'hang' for that entire period
> of time, but goes on after it has completed the request to serve other
> requests.
>
> Amos is correct in that these problems are almost always at the
> Application level, and not at the ZServer level. The fact that Pavlos
> can prevent hanging by inserting a print statement in the asyncore loop
> is suspicious, but we do not have enough information yet to point
> fingers anywhere.
Here are a couple of commonalities we need to look at:

  * IE 5 seems to be involved.

  * The management interface seems to trigger the problem (no
    application-specific code involved, just manipulation of "stock"
    Zope objects).

  * Framesets may be involved.

  * Perhaps premature request cancellation (likely to happen often
    in the management interface) is involved.

I would like to see a description of the interaction between the
ZServer-async-select() thread and the Zope threads to which it dispatches
requests, particularly what happens when a Zope thread tries to write to a
connection which has been closed by the client without a "clean" shutdown
(e.g., attempting to write to the socket will fail with errno == EPIPE).
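
To make the EPIPE case concrete, here is a minimal sketch in present-day
Python. It is not ZServer code: socketpair() merely stands in for the
ZServer-to-browser connection, and the function name is made up for
illustration. It assumes a Unix-like platform; elsewhere the errno may
differ (ECONNRESET is a common alternative).

```python
import errno
import socket


def write_to_closed_peer():
    """Return the errno raised when writing to a peer that has gone away."""
    # Assumption: a connected socket pair stands in for the real
    # server/client connection.
    server_side, client_side = socket.socketpair()
    client_side.close()  # the "client" vanishes without reading the reply
    try:
        # Python ignores SIGPIPE by default, so the failed write shows up
        # as an exception rather than killing the process.
        server_side.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
    return None  # the write unexpectedly succeeded


if __name__ == "__main__":
    code = write_to_closed_peer()
    print(errno.errorcode.get(code, code))
```

Whatever thread performs the write needs to catch this error and release
the connection; the open question above is which thread that is and what
it does next.
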
I have a couple of testcases for this problem. (Un)fortunately, I don't use IE5
(and don't go with girls who do! :), and therefore they all work without a
hitch.
First, an ExternalMethod, implemented so::

    from time import sleep

    def stall( secs = 30 ):
        """ Delay to allow testing. """
        sleep( secs )
        return "Slept %d seconds" % secs

Called trivially from a DTML Method named 'test_stall'::

    <html>
    <head>
    <title><dtml-var document_title></title>
    </head>
    <body>
    <h2><dtml-var document_title></h2>
    <dtml-unless seconds>
      <dtml-call "REQUEST.set( 'seconds', 60 )">
    </dtml-unless>
    <p><dtml-var "stall( seconds )"></p>
    </body></html>

Exercised repeatedly through a frameset::

    <html>
    <head>
    <title> Test Stalling </title>
    </head>
    <frameset rows="33%, 33%, 34%">
      <frameset cols="33%, 33%, 34%">
        <frame name="r1c1" src="test_stall?seconds:int=30">
        <frame name="r1c2" src="test_stall?seconds:int=30">
        <frame name="r1c3" src="test_stall?seconds:int=30">
      </frameset>
      <frameset cols="33%, 33%, 34%">
        <frame name="r2c1" src="test_stall?seconds:int=30">
        <frame name="r2c2" src="test_stall?seconds:int=30">
        <frame name="r2c3" src="test_stall?seconds:int=30">
      </frameset>
      <frameset cols="33%, 33%, 34%">
        <frame name="r3c1" src="test_stall?seconds:int=30">
        <frame name="r3c2" src="test_stall?seconds:int=30">
        <frame name="r3c3" src="test_stall?seconds:int=30">
      </frameset>
    </frameset>
    </html>

I can run this repeatedly, stopping it in mid-run, etc., all without causing any
problems that I can diagnose. Anyone care to try it on their setup with IE5 as
the client?
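
For anyone without IE5 to hand, premature cancellation can be roughly
approximated from a script. The following is only a sketch in present-day
Python, not something I have shown to reproduce the hang: abort_request is
a made-up helper, and the zero-timeout SO_LINGER trick (with the struct
layout used here) is an assumption that holds on Unix-like systems. It
sends the request, waits, and then resets the connection instead of
closing it cleanly.

```python
import socket
import struct
import time


def abort_request(host, port, path, wait=1.0):
    """Send a GET for `path`, then drop the connection without reading."""
    sock = socket.create_connection((host, port))
    try:
        request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
        sock.sendall(request.encode("ascii"))
        time.sleep(wait)  # give the server time to start on the request
        # Assumption (Unix layout): l_onoff=1, l_linger=0 makes close()
        # send a TCP reset rather than performing an orderly shutdown.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return len(request)  # number of request bytes sent
```

Something like abort_request("localhost", 8080, "/test_stall?seconds:int=30")
in a loop would then exercise the stalled method; the host, port, and
location of test_stall are placeholders to adjust for your own setup.
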
Tres.
--
=========================================================
Tres Seaver tseaver@digicool.com tseaver@palladion.com