-----Original Message-----
From: Michel Pelletier [mailto:michel@digicool.com]
Sent: 12 April 2000 02:29
To: Marcus Collins
Cc: 'zope@zope.org'
Subject: Re: [Zope] Re: Zope hanging (poss. threads-related)
<snip>
I suspect this problem *might* be unrelated to the threadlock discussed so far. In the case of the reported lock, 2 or more threads cause instability; in your case, you report 4 is stable.
Our live server, running Zope 2.1.3 with 4 threads, has just had a thread hang. There has been no management activity on this server since it was last restarted about four hours ago. At the time of the hang, a frameset was being requested (cf. my original post).

A moment ago, I thought this might have something to do with the way ZServer handles pipelined requests, but the problem exists with PCGI as well as HTTP... I haven't yet come close to grokking ZServer, but my suspicions arose because I'd seen this hanging occur only on frameset pages, for which the client would quite reasonably send multiple requests on the same channel before receiving a response to the first. As our images are served directly by Apache, framesets would be the only initiators of pipelined requests. So it's not protocol-specific; it must be some kind of deadlock problem, or a thread stomping over another request, at a level beyond the protocol being used. I'll be focussing on ZServer/medusa/* now...

As mentioned, I've been testing Zope 2.1.6 under similar conditions, but with 20 threads. So far, I haven't been able to cause it to hang on the frameset pages. However, I have caused a single thread to hang by repeatedly refreshing a page (the method concerned happens to call an external method, if that adds relevant info).

I've hacked z2.py on both the 2.1.3 and 2.1.6 installations so that they instantiate a DebugLogger, and in both, using Amos Latteier's analyser script, I've managed to catch several requests that did not end. In all cases, the requests got only as far as the received input (I) stage.

This brings up another point, which I came across in the Collector as well (http://classic.zope.org:8080/Collector/871/view) -- how does ZServer handle zombie requests?
I noticed in HTTPServer.py that there is a "zombie_timeout" of 100 minutes (btw, there's also a zombie_timeout of 30 minutes in medusa/http_server.py) -- is the thread held up with the zombie request for the duration of this timeout and, if so, is it not conceivable that a number of zombies could deplete the threadpool to the extent that no further threads are available? I guess it depends on exactly what "hung" means in this case... Also, is it possible to track thread usage somewhere?

I realise that all of this doesn't amount to much to go on... However, the problems are reproducible on my machine, and are exacerbated as NUMBER_OF_THREADS increases, so if anyone can guide me in where to look, I'll hopefully be able to apply that in testing this machine, and together we can get to the bottom of this (the wonder of OSS!).

Regards,

-- Marcus
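
To make the threadpool worry above concrete, here is a minimal sketch in plain Python. It is not ZServer or medusa code, and it assumes (without confirming) that a zombie request keeps its worker thread occupied until some timeout expires. The names TrackedPool, zombie and demo are hypothetical, invented for this illustration. The pool also keeps a simple busy counter, which is one possible way of tracking thread usage.

```python
import queue
import threading


class TrackedPool:
    """Fixed-size worker pool that counts how many workers are busy.

    Hypothetical illustration only; not how ZServer manages its threads.
    """

    def __init__(self, size):
        self.size = size
        self.busy = 0
        self._lock = threading.Lock()
        self._jobs = queue.Queue()
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            job = self._jobs.get()
            with self._lock:
                self.busy += 1          # worker is now tied up
            try:
                job()
            finally:
                with self._lock:
                    self.busy -= 1      # worker is free again

    def submit(self, job):
        self._jobs.put(job)


def demo(pool_size=4):
    pool = TrackedPool(pool_size)
    release = threading.Event()         # stands in for the timeout expiring
    started = threading.Semaphore(0)
    served = threading.Event()

    def zombie():
        started.release()               # signal that a worker picked this up
        release.wait()                  # worker is stuck until "timeout"

    for _ in range(pool_size):
        pool.submit(zombie)
    for _ in range(pool_size):
        started.acquire()               # wait until every worker is stuck

    pool.submit(served.set)             # an ordinary request arrives
    starved = not served.wait(timeout=0.2)
    busy_while_stuck = pool.busy

    release.set()                       # the zombies finally go away
    recovered = served.wait(timeout=2.0)
    return busy_while_stuck, starved, recovered


if __name__ == "__main__":
    print(demo(4))
```

Under that assumption, once as many zombies as there are threads have arrived, the ordinary request just sits in the queue until a zombie is released. Whether ZServer actually behaves this way depends on whether the zombie holds a worker thread or only a channel in the main loop, which is exactly the open question.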