[Zope-dev] RE: [Zope] highly available Zope thread; our hanging problem

Marcus Collins mcollins@sunesi.com
Tue, 6 Jun 2000 15:19:29 +0200


Hi,

I'd like to comment on this, and summarise some references below. Much of
this discussion took place on the zope-dev list (see references), so I'm
cc'ing the zope-dev list. You might also wish to add to the Wiki:
http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/.

> -----Original Message-----
> From: Brian Takashi Hooper [mailto:brian@garage.co.jp]
> Sent: 06 June 2000 12:11
> To: zope@zope.org
> Subject: [Zope] highly available Zope thread; our hanging problem
> 
> Hi all -
> 
> I was looking at the discussion from April that was posted on the
> HighlyAvailableZope Wiki about problems with Zope hanging; we had a
> similar situation here at Digital Garage which seemed to be alleviated
> by changing the zombie_timeout to be really short (like, 1 minute). 
> Before changing the zombie_timeout, the server would periodically hang
> and not give any responses to requests, sometimes recovering after a
> short time.

Some questions at this point:
1. Were you running with multiple threads, and if so, how many?

2. If you were using multiple threads, would *all* the threads periodically
hang, or was the hanging isolated to a single thread at a time?

3. Could you possibly comment on the operating system used?

4. Which zombie_timeout did you twiddle -- the one in the zhttp_channel in
ZServer.py, or that in http_channel in medusa/http_server.py?

> At this point, I don't have anything more than just an empirical
> observation - changing this parameter seemed to help our server.  Has
> anyone else noticed anything similar, or can explain this observation?

Concerning the zombie_timeout suggestion, here are some of the replies I
received when I asked whether reducing the value would be beneficial:

Amos Latteier wrote in
http://lists.zope.org/pipermail/zope-dev/2000-April/004194.html:
> The ZServer zombie stuff is to get rid of zombie client 
> connections, not zombie publishing threads. These are quite 
> different beasts.

Michel Pelletier wrote in 
http://lists.zope.org/pipermail/zope-dev/2000-April/004229.html:
> What the Zombie timeout means is that after a publishing thread gets
> done answering a request, the socket may not go away.  This may be for a
> number of reasons: the client 'hung' and is not 'putting down the phone
> after the conversation is over' (so to speak), or network troubles may
> prevent the connection from closing properly.  This means that there is
> a 'zombie' connection lying around.  This zombie will probably end up
> going away on its own, but if not, ZServer will kill it after a period
> of time.
> 
> The only resource lying around during the life of a Zombie is a tiny
> little unused open socket; the Mack truck of a Zope thread that served
> the request for the zombie socket does not 'hang' for that entire period
> of time, but goes on after it has completed the request to serve other
> requests.
> 
> Amos is correct in that these problems are almost always at the
> Application level, and not at the ZServer level.  The fact that Pavlos
> can prevent hanging by inserting a print statement in the asyncore loop[*]
> is suspicious, but we do not have enough information yet to point
> fingers anywhere.

[* references http://lists.zope.org/pipermail/zope/2000-April/023697.html]
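
To make Michel's description concrete, here is a minimal Python sketch of
what reaping idle client connections might look like. It is not the ZServer
or medusa code: the names Channel, last_active and reap_zombies are
hypothetical, and the timestamps are passed in by hand so the behaviour is
easy to follow. The point it illustrates is that only the idle connection is
closed; no publishing thread is involved.

```python
import time


class Channel:
    """Hypothetical stand-in for a client connection (not the ZServer class)."""

    def __init__(self, name, now=None):
        self.name = name
        # Time of the last activity seen on this connection.
        self.last_active = time.time() if now is None else now
        self.closed = False

    def close(self):
        # A real channel would close its socket here.
        self.closed = True


def reap_zombies(channels, timeout, now=None):
    """Close every open channel that has been idle longer than `timeout` seconds.

    Returns the names of the channels that were closed.
    """
    now = time.time() if now is None else now
    reaped = []
    for channel in channels:
        if not channel.closed and (now - channel.last_active) > timeout:
            channel.close()
            reaped.append(channel.name)
    return reaped


# One connection active 30 seconds ago, one idle for over 15 minutes.
channels = [Channel("fresh", now=1000.0), Channel("stale", now=100.0)]
reaped = reap_zombies(channels, timeout=60, now=1030.0)
print(reaped)
```

With a 60-second timeout, only the long-idle connection is closed. If a
sketch like this matches what the real timeout does, shortening it should
free sockets sooner but should not by itself unblock a hung publishing
thread, which is why Brian's observation is puzzling.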

I'd be _very_ interested in hearing more on this! Our Zope installation has
been pretty stable of late (isn't it strange that, when you want to find out
what's causing things to break, they play nice?), with uptime of
thirty-something days, but I'm still very keen to get to the bottom of this,
since I don't believe it was some ephemeral problem.

hth, and thanks again!

-- Marcus