[Zope-dev] Re: Zope hanging (poss. threads-related)

Tres Seaver tseaver@palladion.com
Sun, 16 Apr 2000 18:19:55 -0500


Michel Pelletier <michel@digicool.com> wrote:
> 
> Marcus Collins wrote:
> >
> > > -----Original Message-----
> > > From: Tony Rossignol [mailto:tonyr@ep.newtimes.com]
> > > Sent: 14 April 2000 19:19
> > > To: Marcus Collins; zope-dev@zope.org
> > > Subject: Re: [Zope-dev] RE: [Zope] Re: Zope hanging (poss.
> > > threads-related)
> >
> > > Thank you for starting this.  I'll try to gather up information
> > > I've been trying to collect here and post it in the next few days.
> >
> > Thanks! Maybe you could also look at extending the DebugLogger output
> > <URL:http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DebugLogger>
> > and posting the results of any hanging there?
> >
> > > RE: could be zombie related -
> > >
> > > Where might I find more info on this?  Could this zombie issue be
> > > present in FCGI as well?
> >
> > Amos Latteier remarked in
> > http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html that:
> >
> >   "The ZServer zombie stuff is to get rid of zombie client
> >    connections, not zombie publishing threads. These are quite
> >    different beasts."
> >
> > I'm not yet grokking the whole picture, so I can't really answer to that.
> > Note that there is an outstanding issue in the Collector at
> > http://classic.zope.org:8080/Collector/954/view that might be related. As
> > you previously noted, there is no zombie_timeout in the FCGI server.
> 
> What the Zombie timeout means is that after a publishing thread gets
> done answering a request, the socket may not go away.  This may happen
> for a number of reasons: the client 'hung' and is not 'putting down the
> phone after the conversation is over' (so to speak), or network
> troubles may prevent the connection from closing properly.  This means
> that there is a 'zombie' connection lying around.  This zombie will
> probably end up going away on its own, but if not, ZServer will kill it
> after a period of time.
> 
> The only resource lying around during the life of a Zombie is a tiny
> little unused open socket.  The Mack truck of a Zope thread that served
> the request for the zombie socket does not 'hang' for that entire
> period of time, but goes on after it has completed the request to serve
> other requests.
> 
> Amos is correct in that these problems are almost always at the
> Application level, and not at the ZServer level.  The fact that Pavlos
> can prevent hanging by inserting a print statement in the asyncore loop
> is suspicious, but we do not have enough information yet to point
> fingers anywhere.

Here are a couple of commonalities we need to look at:

  * IE 5 seems to be involved.

  * The management interface seems to trigger the problem (no
    application-specific code involved, just manipulation of "stock"
    Zope objects).

  * Framesets may be involved.

  * Perhaps premature request cancellation (likely to happen often
    in the management interface) is involved.

I would like to see a description of the interaction between the 
ZServer-async-select() thread and the Zope threads to which it dispatches
requests, particularly what happens when a Zope thread tries to write to a
connection which has been closed by the client without a "clean" shutdown
(e.g., attempting to write to the socket will fail with errno == EPIPE).
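
For anyone who wants to see that failure mode in isolation, here is a minimal
sketch of writing to a connection whose peer has already gone away.  It is not
ZServer code: a local socket pair stands in for the client connection, and the
function name and payload are made up for illustration.  It assumes a Python
that provides socket.socketpair() and reports socket errors as OSError.

```python
import errno
import socket


def write_after_client_close(payload=b"HTTP/1.0 200 OK\r\n\r\n", attempts=5):
    """Return the errno raised when writing to a peer that has gone away.

    Hypothetical helper for illustration only.  Returns None if no write
    failed within `attempts` tries.
    """
    # A connected pair of sockets stands in for the browser connection.
    server_side, client_side = socket.socketpair()
    try:
        client_side.close()              # the "client" hangs up early
        for _ in range(attempts):
            try:
                server_side.send(payload)
            except OSError as exc:
                # Typically EPIPE; some platforms report ECONNRESET.
                return exc.errno
        return None
    finally:
        server_side.close()


if __name__ == "__main__":
    code = write_after_client_close()
    print(errno.errorcode.get(code, code))
```

The interesting question for ZServer is which thread sees this error and what
it does with the channel afterwards; the sketch only shows that the write
itself fails rather than blocking.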

I have a couple of testcases for this problem.  (Un)fortunately, I don't use IE5
(and don't go with girls who do! :), and therefore they all work without a
hitch.

First, an ExternalMethod, implemented so::

 from time import sleep

 def stall( secs = 30 ):
     """ Delay to allow testing. """
     sleep( secs )
     return "Slept %d seconds" % secs

Called trivially from a DTML method named 'test_stall'::

 <html>
 <head>
 <title><dtml-var document_title></title>
 </head>
 <body>

 <h2><dtml-var document_title></h2>

 <dtml-unless seconds>
   <dtml-call "REQUEST.set( 'seconds', 60 )">
 </dtml-unless>

 <p><dtml-var "stall( seconds )"></p>

 </body></html>

Exercised repeatedly through a frameset::

 <html>
 <head>
 <title> Test Stalling </title>
 </head>
 <frameset rows="33%, 33%, 34%">
  <frameset cols="33%, 33%, 34%">
    <frame name="r1c1" src="test_stall?seconds:int=30">
    <frame name="r1c2" src="test_stall?seconds:int=30">
    <frame name="r1c3" src="test_stall?seconds:int=30">
  </frameset>
  <frameset cols="33%, 33%, 34%">
    <frame name="r2c1" src="test_stall?seconds:int=30">
    <frame name="r2c2" src="test_stall?seconds:int=30">
    <frame name="r2c3" src="test_stall?seconds:int=30">
  </frameset>
  <frameset cols="33%, 33%, 34%">
    <frame name="r3c1" src="test_stall?seconds:int=30">
    <frame name="r3c2" src="test_stall?seconds:int=30">
    <frame name="r3c3" src="test_stall?seconds:int=30">
  </frameset>
 </frameset>
 </html>

I can run this repeatedly, stopping it in mid-run, etc., all without causing any
problems that I can diagnose.  Anyone care to try it on their setup with IE5 as
the client?
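
For those without IE5 handy, a script can roughly approximate a browser that
abandons its requests part-way through.  The sketch below only approximates
premature cancellation: it opens several connections, asks for test_stall, and
hangs up without reading the reply.  The function name, the default path
(which assumes test_stall lives at the root), and the host and port in the
usage line are my assumptions, not anything Zope provides.

```python
import socket
import time


def abort_requests(host, port, path="/test_stall?seconds:int=30",
                   count=9, wait_secs=1.0):
    """Open `count` connections, request `path`, then hang up unread.

    Hypothetical helper for illustration only.  Returns the number of
    requests that were sent and then abandoned.
    """
    request = ("GET %s HTTP/1.0\r\n\r\n" % path).encode("ascii")
    pending = []
    for _ in range(count):
        conn = socket.create_connection((host, port), timeout=10)
        conn.sendall(request)
        pending.append(conn)

    # Give the publishing threads time to get into stall().
    time.sleep(wait_secs)

    for conn in pending:
        conn.close()                     # never read the response
    return len(pending)


if __name__ == "__main__":
    # Assumed location of the Zope under test; adjust to taste.
    print(abort_requests("localhost", 8080))
```

Nine connections matches the 3x3 frameset above.  A plain close() is not
necessarily what IE5 does on the wire, so a clean run here would not rule the
browser out.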

Tres.
-- 
=========================================================
Tres Seaver  tseaver@digicool.com   tseaver@palladion.com