[Zope] Re: zope unresponsive

Paul Williams pwilliams at diamonddata.com
Tue Feb 27 13:29:28 EST 2007


No, we haven't done that yet.  That is something else we may try.



Marco Bizzarri wrote:
> On 2/27/07, Paul Williams <pwilliams at diamonddata.com> wrote:
>>
>>
>> Tres Seaver wrote:
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA1
>> >
>> > Paul Williams wrote:
>> >> Ok, here is what we have.  I did a netstat on both machines, client and
>> >> server.  The client sees an established connection and the server does
>> >> not.  In the server log there is a disconnect.  As far as hardware
>> >> between them, there is a switch (Dell PowerConnect 6024).  Web Server
>> >> Directors might get hold of it but there are no hops on traceroute.
>> >> Traceroute only shows the client machine and the server machine.
>> >>
>> >> So the client is just continuously polling the connection but getting
>> >> nothing back.
>> >
>> > That sounds like some weird kernel / networking problem to me:  I don't
>> > see how Zope could be able to keep calling 'select' on a socket after
>> > the other side has closed it.
>>
>> We agree.  This is a strange situation that none of us have seen before.
>>
>> However, we have until tomorrow to do something and replacing hardware
>> is not feasible.
>>
>>
>> >
>> > Is there any possibility that some kind of failover / IP takeover has
>> > happened, such that the storage server now running is not the same host
>> > / instance as the one to which the clients originally connected?  Are
>> > you using LVS + heartbeat, or some kind of hardware load balancer to
>> > manage such redundancy?
>>
>> We do have Web Services Directors that do load balancing, but in this
>> particular case, the storage server is not set up for load balancing.  I
>> am not aware of any features that make the ZODB capable of clustering
>> except for replication services offered through Zope.
>>
>> We are not sure whether the traffic is going to the Web Services
>> Directors or not.  Even if it is, there are thousands of settings and
>> there is no one available who knows what to change.
>>
>>
>> The storage server is a simple NAS server with a static IP address.
>>
>> >
>> >> What we are thinking about doing is changing the code in
>> >> zrpc/connection.py to close the connection in wait (line 638, Zope
>> >> version 2.9.5) if the wait time gets too large or the poll has happened
>> >> too many times.
>> >>
>> >> We are great at Plone development, but have very little backend Zope
>> >> development experience.  Would someone please advise me as to whether
>> >> this is going to cause more problems?
>> >
>> > According to the log message you posted earlier in the thread, your
>> > appservers are spewing thousands of log messages from the connection's
>> > 'pending' method, although your deadlock debugger output shows the one
>> > thread blocked on 'select' inside of the connection's 'wait' method.
>> > There should be lots of log messages at TRACE level for the wait call,
>> > including a doubling / backoff of the delay value from 1 ms to 1 sec.
>> > Do you see those log messages, as well?
>>
>> These messages are there.  You can see the time doubling.  This is where
>> we were thinking of breaking the connection once it gets to a certain
>> point and making Zope reconnect.
>>
>> This solves our hung connection problem, we think.  However, I am hoping
>> someone can let me know if I am breaking something else by doing this.
>>
>>
> 
> I don't remember if you already mentioned it. However: did you try
> to monitor the outgoing and incoming traffic? I mean, setting some
> iptables rules and/or using something like tcpdump to monitor what is
> going on here?
> 
> Regards
> Marco
> 
> 
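
For reference, the workaround discussed in the quoted thread (give up on
the connection once the doubling wait has gone on too long) could be
sketched roughly as below.  This is only an illustration on a plain
socket, not the actual code in zrpc/connection.py: the names
wait_for_reply and StalledConnection, and the max_total_wait limit, are
made up for the example.  The 1 ms to 1 sec doubling follows the
description above.

```python
import select
import socket


class StalledConnection(Exception):
    """Raised when no data arrives within the allowed total wait."""


def wait_for_reply(sock, max_total_wait=30.0, first_delay=0.001, max_delay=1.0):
    """Poll sock with a doubling delay; close it if nothing ever arrives.

    Hypothetical helper for illustration only, not part of ZEO.
    """
    delay = first_delay
    waited = 0.0
    while waited < max_total_wait:
        # select returns as soon as the socket is readable, or after
        # 'delay' seconds if nothing has arrived.
        readable, _, _ = select.select([sock], [], [], delay)
        if readable:
            return True
        waited += delay
        # Back off: 1 ms, 2 ms, 4 ms, ... capped at max_delay.
        delay = min(delay * 2, max_delay)
    # Nothing arrived in time: drop the connection so the caller can
    # reconnect instead of polling a dead peer forever.
    sock.close()
    raise StalledConnection("no reply after %.3f seconds" % waited)
```

The caller would catch StalledConnection and open a fresh connection.
What happens to requests that were still in flight when the socket is
closed is exactly the open question raised above, so the sketch does not
try to answer it.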


