[Zope] Re: Zope hanging (poss. threads-related)

Wed, 12 Apr 2000 10:50:38 -0700

Marcus Collins wrote:
> 
> Our live server, running Zope 2.1.3 with 4 threads, has just had a
> thread hang. There has been no management activity on this server
> since last it was restarted about four hours ago. At the time of the
> hanging, a frameset was being requested (cf. my original post).

The /manage screens also have the potential for the client to send
multiple requests.

> A moment ago, I thought this might have something to do with the way ZServer
> handles pipelined requests, but the problem exists with PCGI as well as
> HTTP...

And the problem also exists with FastCGI.

> I haven't yet come close to grokking ZServer, but my suspicions arose
> because I'd seen this hanging occur only on frameset pages, for which the
> client would quite reasonably send multiple requests on the same
> channel before receiving a response to the first. As our images are
> served directly by apache, framesets would be the only initiators of
> pipelined requests. So it's not protocol-specific, but this must be some
> kind of deadlock problem, or a thread stomping over another request, but
> beyond the level of the protocol being used. I'll be focussing on
> ZServer/medusa/* now...

> I've hacked z2.py on both the 2.1.3 and 2.1.6 installations so that
> they instantiate a DebugLogger, and in both I've managed to catch
> several requests that did not end using Amos Latteier's analyser
> script. In all cases, the requests got only as far as the received input (I)
> stage.

I also hacked z2.py, as well as FCGIServer.py, to get Amos's analyser
script to work.  I found no specific method, request or mix of requests
that was consistently incomplete.  It almost appeared that the larger the
resulting page, the greater the likelihood of it being incomplete, which
would make sense from a purely statistical perspective.
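
For anyone who wants to repeat the check, here is a minimal sketch of the
idea: pair up begin and end records and report whatever never ended.  This
is not Amos's script.  It assumes each log line has the form
"<code> <request id> <timestamp> <data>" with the codes B (begin),
I (received input), A (application output) and E (end); the function name
find_incomplete and the sample lines are made up for illustration.

```python
def find_incomplete(lines):
    """Return {request_id: last_stage_seen} for requests with no E record.

    Assumed (not verified) line format: "<code> <id> <timestamp> <data>",
    where code is one of B, I, A, E.
    """
    last_stage = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        code, request_id = parts[0], parts[1]
        if code == "E":
            # The request finished, so stop tracking it.
            last_stage.pop(request_id, None)
        else:
            # Remember the most recent stage this request reached.
            last_stage[request_id] = code
    return last_stage


# Made-up sample lines: request 101 completes, request 102 stops at I.
sample = [
    "B 101 2000-04-12T10:00:01 GET /index_html",
    "I 101 2000-04-12T10:00:01 0",
    "A 101 2000-04-12T10:00:02 200 5120",
    "E 101 2000-04-12T10:00:02",
    "B 102 2000-04-12T10:00:03 GET /frameset_html",
    "I 102 2000-04-12T10:00:03 0",
]

for request_id, stage in find_incomplete(sample).items():
    print("request %s never ended; last stage seen: %s" % (request_id, stage))
```

A request stuck at the I stage, as Marcus describes, would show up here
with "I" as its last stage.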

> This brings up another point, which I came across in the Collector as well
> (http://classic.zope.org:8080/Collector/871/view) --
> how does ZServer handle zombie requests? I noticed in HTTPServer.py that
> there is a "zombie_timeout" of 100 minutes (btw, there's also a
> zombie_timeout of 30 minutes in medusa/http_server.py) -- is the thread held
> up with the zombie request for the duration of this timeout and, if so, is
> it not conceivable that a number of zombies could deplete the threadpool to
> the extent that no further threads are available? I guess it depends on
> exactly what "hung" means in this case... Also, is it possible to track
> thread usage somewhere?

I could not find any zombie_timeout in FCGIServer.  But we use a mix of
HTTPServer and FCGIServer, and primarily FCGIServer.
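
To make the depletion scenario concrete, here is a small sketch of a
fixed-size worker pool in which every worker is tied up by a request that
never returns.  It is only an illustration of the failure mode, written in
present-day Python; it is not ZServer code, and whether ZServer really
holds a thread for the length of a zombie_timeout is exactly the open
question.  TinyPool, demo and the busy counter are hypothetical names.  The
busy counter also shows one simple way thread usage could be tracked.

```python
import queue
import threading


class TinyPool:
    """Hypothetical fixed-size worker pool with a 'busy' counter."""

    def __init__(self, size):
        self.tasks = queue.Queue()
        self.busy = 0
        self.lock = threading.Lock()
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self.tasks.get()
            with self.lock:
                self.busy += 1
            try:
                task()
            finally:
                with self.lock:
                    self.busy -= 1

    def submit(self, task):
        self.tasks.put(task)


def demo(pool_size=4):
    pool = TinyPool(pool_size)
    release = threading.Event()   # set this to let the "zombies" finish
    served = threading.Event()    # set by the ordinary request

    for _ in range(pool_size):
        pool.submit(release.wait)  # each "zombie" blocks one worker
    pool.submit(served.set)        # an ordinary request queued behind them

    # With every worker blocked, the ordinary request is not served.
    starved = not served.wait(timeout=0.5)
    busy_while_stuck = pool.busy

    release.set()                  # the zombies finally go away
    recovered = served.wait(timeout=2.0)
    return starved, busy_while_stuck, recovered


if __name__ == "__main__":
    print(demo())
```

If something like this were happening, the server would look hung for new
requests even though no thread had crashed, and it would recover only once
the stuck requests were dropped.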

> I realise that all of this doesn't amount to much to go on... However, the
> problems are reproducible on my machine, and exacerbated as the
> NUMBER_OF_THREADS increases, so if anyone can guide me in where to look,
> I'll hopefully be able to apply that in testing this machine, and together
> we can get to the bottom of this (the wonder of OSS!).

Let me know if there is any way I can assist in this process.  Our boxes
are in production and that limits how much experimenting I can do, but
this week I should have a test box ready to roll.  I know what you mean
by not much to go on...  I've been following every lead I can find, to no
avail.

-- 
-------------------------------
tonyr@ep.newtimes.com
Director of Web Technology
New Times, Inc.
-------------------------------