I'm actually wondering if reducing that zombie_timeout (and maintenance_interval in medusa/http_server.py) would go anywhere towards alleviating this problem as a temporary measure. Would there be any reasons not to try this?
The ZServer zombie stuff is to get rid of zombie client connections, not zombie publishing threads. These are quite different beasts.
Everything before the call to handle() works 100% of the time. Sometimes, however, we don't get from handle() to the next stage. This is on Zope 2.1.6, which I've been running with up to 100 threads, although unfortunately I can't exercise that many!
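To make that concrete, here is a minimal sketch of the kind of logging I mean. The wrapper name and request argument are hypothetical, not actual ZPublisher code; the point is just to log on both sides of handle() so a thread that enters but never leaves shows up as a "entering" line with no matching "left" line:

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(threadName)s %(message)s")

def logged_handle(handle, request):
    # Hypothetical wrapper: log before and after the real handle()
    # call so a hung thread leaves an unmatched "entering" line.
    logging.debug("entering handle() for %s", request)
    try:
        return handle(request)
    finally:
        logging.debug("left handle() for %s", request)
```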
In general there is little reason to have so many publishing threads. You almost never need that many unless you have a bunch of requests that can take a *long* time. If, on the other hand, you are trying to provoke some kind of thread contention issue, I advise you to publish resources that take a long time to return. That way you can easily pile up as many busy publishing threads as you want.
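For example, a deliberately slow resource is enough to tie up threads on demand. This is only a sketch (say, the body of a hypothetical External Method); each concurrent request into it occupies one publishing thread for the given number of seconds:

```python
import time

def slow_resource(seconds=30):
    # Each request into this resource holds one publishing thread
    # for `seconds` seconds, so N concurrent requests pile up N threads.
    time.sleep(seconds)
    return "done after %s seconds" % seconds
```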
I've added this logging to the Zope 2.1.3 serving the live site, and will report my findings as soon as something untoward occurs. Maybe others who are experiencing hangs could also do some extra logging and report their results [now there, I can see a Wiki would be really useful!].
In the meantime, any suggestions as to where to go next will be keenly acted on!
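Once the log exists, picking out the hung requests is just a matter of pairing up entries. A small sketch, assuming a hypothetical log format of "start <id>" / "end <id>" lines keyed by request id: any id that starts but never ends is a candidate hang.

```python
def find_hung_requests(log_lines):
    # Pair "start <id>" / "end <id>" lines (hypothetical format) and
    # return the ids that started but never finished.
    open_requests = set()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that aren't in the expected format
        event, req_id = parts
        if event == "start":
            open_requests.add(req_id)
        elif event == "end":
            open_requests.discard(req_id)
    return open_requests
```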
In my experience, when a Zope publishing thread hangs it's almost always a problem with the published resource. Maybe there's something that puts Zope in a loop that never exits, or maybe there's some DA weirdness that hangs the thread. My advice is to try to identify which requests hang using debug logging, and examine the resources that those requests use. -Amos -- Amos Latteier mailto:amos@digicool.com Digital Creations http://www.digicool.com