I'm convinced there's some deep, dark timing problem, and that it's thread-related...

This morning I had a thread hang, again on Zope 2.1.3 running with four threads. I was watching for unfinished requests at the time, and caught this one twenty minutes after it occurred. I tried viewing a frameset page on the site: no problem. Then I tried launching /manage, and suddenly *all* the threads hung.

This is the first time all the threads have hung. It's also the first time I've managed to catch an unfinished request and try viewing framesets within half an hour. That leads me to believe that, previously, ZServer was doing a good job of cleaning up the hung zombie threads (recall the zombie_timeout of 30 minutes). I'm actually wondering whether reducing that zombie_timeout (and maintenance_interval in medusa/http_server.py) would go some way towards alleviating this problem as a temporary measure. Would there be any reasons not to try this?

I've added quite a lot of DebugLogger stuff to ZServer/PCGIServer.py, and modified the log analyser accordingly, in the hope of nailing this sucker:

    def send_response(self):
        # create an output pipe by passing request to ZPublisher,
        # and requesting a callback of self.log with the module
        # name and PATH_INFO as an argument.
        self.done = 1

        # MC 2000-04-13 additional logging
        DebugLogger.log('X', id(self), 'send_response: before PCGIResponse')
        response = PCGIResponse(stdout=PCGIPipe(self), stderr=StringIO())

        # MC 2000-04-13 additional logging
        DebugLogger.log('X', id(self), 'send_response: before HTTPRequest')
        request = HTTPRequest(self.data, self.env, response)

        # MC 2000-04-13 additional logging
        DebugLogger.log('X', id(self), 'send_response: before handle')
        handle(self.server.module, request, response)

Every checkpoint up to the call to handle() shows up 100% of the time. Sometimes, however, we never get from handle() to the next stage. This is on Zope 2.1.6, which I've been running with up to 100 threads, although unfortunately I can't exercise that many!
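For anyone weighing the zombie_timeout idea, here is a simplified model of periodic zombie reaping. It assumes that a sweep runs only when a new channel is opened whose sequence number is a multiple of maintenance_interval, and that the sweep closes any channel older than zombie_timeout seconds. The names Channel, kill_zombies and open_channel are hypothetical; this is a sketch under those assumptions, not medusa's actual code, so check medusa/http_server.py for the real behaviour.

```python
class Channel:
    """Hypothetical stand-in for a server channel (not medusa's real class)."""

    def __init__(self, number, created):
        self.number = number      # sequential channel number
        self.created = created    # creation time, in seconds
        self.closed = False

    def close(self):
        self.closed = True


def kill_zombies(channels, now, zombie_timeout):
    """Close every still-open channel older than zombie_timeout seconds.

    Age is measured from creation, so a legitimately slow request that
    outlives the timeout is closed just like a hung one.
    """
    killed = []
    for ch in channels:
        if not ch.closed and now - ch.created > zombie_timeout:
            ch.close()
            killed.append(ch.number)
    return killed


def open_channel(channels, number, now, zombie_timeout, maintenance_interval):
    """Register a new channel; every maintenance_interval-th one triggers a sweep.

    Returns the numbers of the channels closed by that sweep (empty if none).
    """
    channels.append(Channel(number, now))
    if number % maintenance_interval == 0:
        return kill_zombies(channels, now, zombie_timeout)
    return []


if __name__ == "__main__":
    channels = []
    # Channel 1 hangs; every later request finishes promptly.
    for n in range(1, 11):
        killed = open_channel(channels, n, now=n * 120,
                              zombie_timeout=600, maintenance_interval=5)
        if n != 1:
            channels[-1].close()
        if killed:
            print("sweep at channel", n, "closed", killed)
```

Under these assumptions, two things follow. A hung channel is only reaped when enough new connections arrive to trigger a sweep, so a shorter zombie_timeout on its own may not help much on a quiet server. And because age is counted from creation, a long but healthy request (a big upload, say) gets closed just like a genuinely hung one, which is one reason not to shrink the timeout too far.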
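The log-analysis side can be sketched as follows. For each request id, remember the most recent record seen, and forget the id when its end record arrives. Whatever is left over is unfinished, together with the last checkpoint it reached. This assumes a hypothetical space-separated layout of `CODE ID TIMESTAMP MESSAGE` (timestamp without spaces), with `E` marking the end of a request and `X` the custom checkpoints in send_response above. The real DebugLogger format may differ, so the parsing would need adjusting to match.

```python
def unfinished_requests(lines):
    """Return {request_id: (code, message)} for requests with no end record.

    Assumes the hypothetical layout 'CODE ID TIMESTAMP MESSAGE...', where
    'E' marks the end of a request and 'X' marks a custom checkpoint.
    """
    last_seen = {}  # request id -> most recent (code, message)
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        code, req_id = parts[0], parts[1]
        message = parts[3] if len(parts) == 4 else ""
        if code == "E":
            last_seen.pop(req_id, None)  # request finished: forget this id
        else:
            last_seen[req_id] = (code, message)
    return last_seen


if __name__ == "__main__":
    sample = [
        "B 1001 09:00:00 GET /index_html",
        "X 1001 09:00:00 send_response: before handle",
        "E 1001 09:00:01",
        "B 1002 09:00:05 GET /manage",
        "X 1002 09:00:05 send_response: before handle",
    ]
    for req_id, (code, message) in unfinished_requests(sample).items():
        print(req_id, code, message)  # prints: 1002 X send_response: before handle
```

One caveat: id(self) is only unique among objects that are alive at the same time, so a later request can reuse the id of one that has already been freed. Forgetting the id on its end record keeps such a reused id from being mistaken for a hung request.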
In fact, it has proven quite a mission to get Zope to hang, maybe because the additional logging increases the latency of serving requests. I've added the same logging to the Zope 2.1.3 instance serving the live site, and will report my findings as soon as something untoward occurs. Perhaps others who are experiencing hangs could also add some extra logging and report the results [now there, I can see where a Wiki would be really useful!]. In the meantime, any suggestions as to where to go next will be keenly acted on!

Thanks all!
-- Marcus