RE: [Zope] Re: Zope hanging (poss. threads-related)

12 Apr 2000

-----Original Message-----
From: Michel Pelletier [mailto:michel@digicool.com]
Sent: 12 April 2000 02:29
To: Marcus Collins
Cc: 'zope@zope.org'
Subject: Re: [Zope] Re: Zope hanging (poss. threads-related)
<snip>
...
I suspect this problem *might* be unrelated to the threadlock
discussed so far. In the case of the reported lock, 2 or more
threads cause instability.  In your case, you report 4 is stable.

Our live server, running Zope 2.1.3 with 4 threads, has just had a
thread hang. There has been no management activity on this server
since last it was restarted about four hours ago. At the time of the
hanging, a frameset was being requested (cf. my original post).

A moment ago, I thought this might have something to do with the way ZServer
handles pipelined requests, but the problem exists with PCGI as well as
HTTP...

I haven't yet come close to grokking ZServer, but my suspicions arose
because I'd seen this hanging occur only on frameset pages, for which the
client would quite reasonably send multiple requests on the same
channel before receiving a response to the first. As our images are
served directly by Apache, framesets would be the only initiators of
pipelined requests. So the problem isn't protocol-specific; it must be some
kind of deadlock, or one thread stomping over another's request, somewhere
beyond the level of the protocol being used. I'll be focussing on
ZServer/medusa/* now...
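
For anyone wanting to reproduce the pipelining case without a browser, here
is a minimal sketch of a client that writes several requests on one
connection before reading any response. The function names, host, port and
paths are placeholders of my own, not anything from ZServer, and it assumes
the server speaks HTTP/1.1 with keep-alive.

```python
import socket


def build_pipeline(host, paths):
    """Return the bytes for several GET requests sent back to back."""
    requests = []
    for index, path in enumerate(paths):
        # Ask the server to close the connection after the last request
        # so that the reader below knows when to stop.
        is_last = index == len(paths) - 1
        requests.append(
            "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: %s\r\n\r\n"
            % (path, host, "close" if is_last else "keep-alive")
        )
    return "".join(requests).encode("ascii")


def send_pipeline(host, port, paths, timeout=10.0):
    """Send every request before reading, then return the raw responses.

    A socket timeout raised here would suggest the server stopped answering.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_pipeline(host, paths))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling send_pipeline("localhost", 8080, ["/frameset", "/left", "/right"])
in a loop would approximate what a browser does on a frameset page; those
paths are made up and would need replacing with real ones.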

As mentioned, I've been testing Zope 2.1.6 under similar conditions,
but with 20 threads. So far, I haven't been able to cause it to hang
on the frameset pages. However, I have caused a single thread to hang
by repeatedly refreshing a page (the method concerned happens to call
an external method, if that adds relevant info.)

I've hacked z2.py on both the 2.1.3 and 2.1.6 installations so that
they instantiate a DebugLogger, and in both I've managed to catch
several requests that never ended, using Amos Latteier's analyser
script. In all cases, the requests got only as far as the received input (I)
stage.
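
To make the check concrete, here is a rough sketch of that kind of analysis.
It is not Amos's script. It assumes each log line has the form
"<code> <request id> <timestamp> <data>", with E marking the end of a
request; the sample lines are invented for illustration.

```python
def unfinished_requests(lines):
    """Map request id -> last stage code seen, for requests with no E line."""
    last_stage = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        code, request_id = parts[0], parts[1]
        if code == "E":
            # The request completed; forget it so a reused id starts fresh.
            last_stage.pop(request_id, None)
        else:
            last_stage[request_id] = code
    return last_stage


# Invented sample data: request 101 completes, request 102 stops at I.
sample_log = [
    "B 101 2000-04-12T10:00:00 GET /index_html",
    "I 101 2000-04-12T10:00:00 0",
    "A 101 2000-04-12T10:00:01 200 5120",
    "E 101 2000-04-12T10:00:01",
    "B 102 2000-04-12T10:00:02 GET /frameset",
    "I 102 2000-04-12T10:00:02 0",
]

for request_id, stage in unfinished_requests(sample_log).items():
    print("request %s never ended; last stage seen: %s" % (request_id, stage))
```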

This brings up another point, which I came across in the Collector as well
(http://classic.zope.org:8080/Collector/871/view): how does ZServer handle
zombie requests? I noticed in HTTPServer.py that there is a "zombie_timeout"
of 100 minutes (btw, there's also a zombie_timeout of 30 minutes in
medusa/http_server.py). Is the thread held up with the zombie request for
the duration of this timeout? If so, is it not conceivable that a number of
zombies could deplete the threadpool to the extent that no further threads
are available? I guess it depends on exactly what "hung" means in this
case... Also, is it possible to track thread usage somewhere?
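
To illustrate the scenario I'm worried about, here is a generic model of a
fixed-size pool with a busy counter. It is not ZServer's code, and whether
ZServer actually behaves like this is exactly what I'm asking; TrackedPool
and simulate_depletion are names I made up for the sketch.

```python
import queue
import threading


class TrackedPool:
    """Fixed-size worker pool that counts how many workers are busy."""

    def __init__(self, size):
        self.size = size
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self._busy = 0
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            job = self._jobs.get()
            with self._lock:
                self._busy += 1
            try:
                job()
            finally:
                with self._lock:
                    self._busy -= 1

    def submit(self, job):
        self._jobs.put(job)

    def busy(self):
        with self._lock:
            return self._busy


def simulate_depletion(pool_size):
    """Fill the pool with stuck jobs, then see if a new job gets served."""
    pool = TrackedPool(pool_size)
    started = threading.Semaphore(0)
    release = threading.Event()
    served = threading.Event()

    def stuck_request():
        started.release()  # signal that a worker has picked this job up
        release.wait()     # then block, like a request that never ends

    for _ in range(pool_size):
        pool.submit(stuck_request)
    for _ in range(pool_size):
        started.acquire()  # wait until every worker is occupied

    busy_when_hung = pool.busy()
    pool.submit(served.set)                  # an ordinary request
    served_while_hung = served.wait(0.2)     # no free worker to run it
    release.set()                            # un-stick the zombies
    served_after_release = served.wait(5.0)
    return busy_when_hung, served_while_hung, served_after_release
```

In this model the new request is only served once the stuck jobs are
released, and a counter like busy() is the sort of thread-usage figure I'd
like to be able to see on a live server.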

I realise that all of this doesn't amount to much to go on... However, the
problems are reproducible on my machine, and exacerbated as the
NUMBER_OF_THREADS increases, so if anyone can guide me in where to look,
I'll hopefully be able to apply that in testing on this machine, and together
we can get to the bottom of this (the wonder of OSS!).

Regards,

-- Marcus