hi

since upgrading zope from 2.5.1 + zeo1, i've been having frequent zope/zeo client hangs (not responding).

the setup: we have zope-2.6.1-src + zeo-2.0.2 on linux with python 2.1.3

frequently, the zeo client hangs and stops responding. output of ps axf is below[1], along with the stupid.log[2]. Also, nothing much in ZEO_EVENTS.log[3]

telnet localhost 8080
GET / HTTP/1.0\n\n
resulted in nothing. no time out, nothing

running ./stop will only kill the parent process, i.e. 17160 in the below example. the other processes can only be killed with a kill -9 17161. restarting the zeoclient will make it respond again.

i've tried zeo 2.1a1 (iirc) and the behaviour is the same

i used to run with persistent zeo cache (ZEO_CLIENT=12000) in the ENV, but running without is still the same. i've also tried running with -Z1, and -Z-, also the same behaviour

thanks for any help/pointers

------ ps axwf
[1] # ps axf|grep py
17560 pts/0    D      0:00          \_ grep py
17160 ?        SW     0:00 /usr/bin/python2.1 /usr/local/Zope-2.6.1-src/z2.py -X
17161 ?        S     16:39  \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-src/z2.p
17162 ?        S      0:00      \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-src/
17165 ?        R     26:09          \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-
17166 ?        S     28:26          \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-
17167 ?        S     26:50          \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-
17168 ?        S     27:11          \_ /usr/bin/python2.1 /usr/local/Zope-2.6.1-

[2]# tail var/stupid.log
------
2003-03-21T10:24:53 BLATHER(-100) zrpc-conn:192.168.1.2:12000 calling Invalidate([('\x00\x00\x00\x00\x01\x0bQ\xec', ''), ('\x00\x00\x00\x00\...
------
2003-03-21T10:25:20 BLATHER(-100) zrpc-conn:192.168.1.2:12000 calling Invalidate([('\x00\x00\x00\x00\x00\x00$\xae', ''), ('\x00\x00\x00\x00\...
------
2003-03-21T10:25:28 BLATHER(-100) zrpc-conn:192.168.1.2:12000 calling Invalidate([('\x00\x00\x00\x00\x00\x1c\x9a\xd9', ''), ('\x00\x00\x00\x...
------
2003-03-21T10:25:50 BLATHER(-100) zrpc-conn:192.168.1.2:12000 calling Invalidate([('\x00\x00\x00\x00\x01\x0bQ\xea', ''), ('\x00\x00\x00\x00\...
------
2003-03-21T10:25:53 BLATHER(-100) zrpc-conn:192.168.1.2:12000 calling Invalidate([('\x00\x00\x00\x00\x01\x0bQ\xea', ''), ('\x00\x00\x00\x00\...

[3]# tail -f var/ZEO_EVENTS.log
------
2003-03-21T08:43:29 INFO(0) ZSS:3458/192.168.1.3:41449 Blocked transaction restarted.
------
2003-03-21T08:47:38 INFO(0) ZSS:3458/192.168.1.10:52841 disconnected
------
2003-03-21T08:49:19 INFO(0) ZSS:3458 new connection ('192.168.1.10', 50195): <ManagedServerConnection ('192.168.1.10', 50195)>
------
2003-03-21T09:12:22 INFO(0) ZSS:3458/192.168.1.3:41449 Transaction blocked waiting for storage. Clients waiting: 1.
------
2003-03-21T09:12:23 INFO(0) ZSS:3458/192.168.1.2:43083 Blocked transaction restarted.
------
2003-03-21T10:34:18 INFO(0) ZSS:3458/192.168.1.10:50195 Transaction blocked waiting for storage. Clients waiting: 1.
------
2003-03-21T10:34:19 INFO(0) ZSS:3458/192.168.1.2:43083 Blocked transaction restarted
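The telnet test above waits forever because nothing bounds the read. A rough way to tell a hung client from an unreachable one is to send the same request with a socket timeout. This is only a sketch: the function name `probe`, the 5-second default and the returned labels are made up for illustration, and it is written for a current Python rather than the 2.1.3 used in this setup.

```python
import socket

def probe(host, port, timeout=5.0):
    """Send a bare HTTP/1.0 GET and classify what comes back.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = s.recv(1024)  # waits at most `timeout` seconds
    except socket.timeout:
        # Connected (or tried to) but got no answer in time:
        # the symptom described in the post above.
        return "hung"
    except OSError:
        # Refused, no route, and similar connection-level failures.
        return "unreachable"
    return "responding" if data else "closed"

if __name__ == "__main__":
    print(probe("localhost", 8080))
```

Run from cron or a shell loop, something like this could record when the client stops answering, which might help line the hang up with entries in the logs.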
Bakhtiar A Hamid wrote at 2003-3-21 11:08 +0800:
.... running ./stop will only kill the parent process, i.e 17160 in the below example. the other processes can only be killed with a kill -9 17161.
You find a corresponding problem report and patch in the collector:

Zope's shutdown implementation (new in 2.6.1) is broken. It tries to close the database connections inside the signal handler. As it acquires a lock for this, a deadlock results when another thread holds the lock.

That patch is very crude.

Toby has a more general solution in a CVS branch.

Search the archives for details.

All the above is only about the 'cannot be killed other than by "kill -9"'. It does not address the hanging.

We, too, see this behaviour -- but only under Solaris and not under Linux. We use ZEO 1 with Zope 2.6.1.

Dieter
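To illustrate the shutdown problem described above, here is a minimal sketch of the general pattern, not Zope's code and not the collector patch. All names (`db_lock`, `unsafe_handler`, `safe_handler`, `close_connections`) are hypothetical, the handlers are called directly instead of through a real signal so the demo is deterministic, and a timeout stands in for what would otherwise be an endless wait.

```python
import signal
import threading

# Hypothetical stand-ins; none of these names come from Zope or ZEO.
db_lock = threading.Lock()              # guards the "database connections"
shutdown_requested = threading.Event()  # set by the safe handler

def unsafe_handler(signum, frame):
    # Cleanup inside the handler needs the lock. With a plain blocking
    # acquire() this would never return while the lock is held elsewhere;
    # the timeout exists only so the demo terminates.
    got_it = db_lock.acquire(timeout=0.1)
    if got_it:
        db_lock.release()
    return got_it

def safe_handler(signum, frame):
    # Only record the request; take no locks inside the handler.
    shutdown_requested.set()

def close_connections():
    # Runs later from normal code, once the lock is available.
    with db_lock:
        return "closed"

def demo():
    db_lock.acquire()                       # simulate the lock being held
    blocked = not unsafe_handler(signal.SIGTERM, None)
    safe_handler(signal.SIGTERM, None)      # returns at once
    db_lock.release()                       # the holder finishes its work
    result = close_connections() if shutdown_requested.is_set() else None
    return blocked, result

if __name__ == "__main__":
    print(demo())
```

The point of the second handler is that it cannot block: the actual close happens outside signal context, after whoever holds the lock has released it.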
On Saturday 22 March 2003 04:30, Dieter Maurer wrote:
Bakhtiar A Hamid wrote at 2003-3-21 11:08 +0800:
.... running ./stop will only kill the parent process, i.e 17160 in the below example. the other processes can only be killed with a kill -9 17161.
You find a corresponding problem report and patch in the collector:
Zope's shutdown implementation (new in 2.6.1) is broken. It tries to close the database connections inside the signal handler. As it acquires a lock for this, a deadlock results when another thread holds the lock.
That patch is very crude.
thanks dieter. but the patch in the collector doesn't apply cleanly for the ClientStorage.py. i haven't looked at it yet, and am not sure whether i can patch it by hand.
Toby has a more general solution in a CVS branch.
i have checked out zope2_6 branch, and hope that toby's solution is merged already. if things still go wrong, i'll try that. if things still go wrong, i'd have to revert to 2.5.1, unfortunately
Search the archives for details.
All the above is only about the 'cannot be killed other than by "kill -9"'. It does not address the hanging.
We, too, see this behaviour -- but only under Solaris and not under Linux. We use ZEO 1 with Zope 2.6.1.
hope to hear from you when you solve this.
Dieter
thanks
Bakhtiar A Hamid wrote at 2003-3-25 09:08 +0800:
....
It does not address the hanging.
We, too, see this behaviour -- but only under Solaris and not under Linux. We use ZEO 1 with Zope 2.6.1.
hope to hear from you when you solve this.
We moved to Linux... Unless the problem occurs there, too, I will not investigate it further. Up to now, there are no signs of this.

Dieter
participants (2): Bakhtiar A Hamid, Dieter Maurer