New subject: [Zope-dev] Re: [Zope] highly available Zope thread; our hanging problem

6 Jun 2000

      Hi,

I'd like to comment on this, and summarise some references below. Much of
this discussion took place on the zope-dev list (see references), so I'm
cc'ing the zope-dev list. You might also wish to add to the Wiki:
http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/.
...
-----Original Message-----
From: Brian Takashi Hooper [mailto:brian@garage.co.jp]
Sent: 06 June 2000 12:11
To: zope@zope.org
Subject: [Zope] highly available Zope thread; our hanging problem
Hi all -
I was looking at the discussion from April that was posted on the
HighlyAvailableZope Wiki about problems with Zope hanging; we had a
similar situation here at Digital Garage which seemed to be alleviated
by changing the zombie_timeout to be really short (like, 1 minute). 
Before changing the zombie_timeout, the server would periodically hang
and not give any responses to requests, sometimes recovering after a
short time.
Some questions at this point:
1. Were you running with multiple threads, and if so, how many?

2. If you were using multiple threads, would *all* the threads periodically
hang, or was the hanging isolated to a single thread at a time?

3. Could you possibly comment on the operating system used?

4. Which zombie_timeout did you twiddle -- the one in the zhttp_channel in
ZServer.py, or that in http_channel in medusa/http_server.py?
...
At this point, I don't have anything more than just an empirical
observation - changing this parameter seemed to help our server.  Has
anyone else noticed anything similar, or can explain this observation?
Concerning the zombie_timeout suggestion, here are some references when I
posed the question of whether reducing the value would be beneficial:

Amos Lattier wrote in
http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html:
...
The ZServer zombie stuff is to get rid of zombie client 
connections, not zombie publishing threads. These are quite 
different beasts.
Michel Pelletier wrote in 
http://lists.zope.org/pipermail/zope-dev/2000-April/004229.html:
...
What the Zombie timeout means is that after a publishing thread gets
done answering a request, the socket may not go away.  This many for a a
number of reasons, the client 'hung' and is not 'putting down the phone
after the converstation is over' (so to speak) or network troubles may
prevent the connection from closing properly.  This means that there is
a 'zombie' connection laying around.  This zombie will probably end up
going away on its own, but if not, ZServer will kill it after a period
of time.
The only reasorce laying around during the life of a Zombie is an tiny
little unused open socket, the Mack truck of a Zope thread that served
the request for the zombie socket does not 'hang' for that entire period
of time, but goes on after it has completed the request to serve other
requests.
Amos is correct in that these problems are almost always at the
Application level, and not at the ZServer level.  The fact that Pavlos
can prevent hanging by inserting a print statement in the asyncore loop[*]
is suspicious, but we do not have enough information yet to point
fingers anywhere.
[* references http://lists.zope.org/pipermail/zope/2000-April/023697.html]

I'd be _very_ interested in hearing more on this! Our Zope installation has
been pretty stable of late (isn't it strange that, when you want to find out
what's causing things to break, they play nice?), with uptime of
thirty-something days, but I'm still very keen to get to the bottom of this,
since I don't believe it was some ephemeral problem.

hth, and thanks again!

-- Marcus

RE: [Zope] highly available Zope thread; our hanging problem

Marcus Collins

Brian Takashi Hooper

tags

participants (2)