On Thu, Mar 30, 2006 at 02:32:58AM +1000, Alan Milligan wrote:
> I managed to get a DeadlockDebugger trace on this thing, it made very
> interesting reading:
(snip)
>   File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 781, in loadEx
>     return data, tid, ver
>
> *every* thread was block-waiting on zeo (from a wide range of different
> Zope/Plone types)! It looks to me like Apache has timed out, clearing
> down its end, Zope however is still having to wait for zeo which is
> completely borked.
>
> I've consequently ditched zeo and everything is again well-behaved.

Is your zeo server on a separate box? Is there a firewall between them?

The *only* time I've ever had problems like that was in the following
scenario:

* firewall between zope and zeo

* minimal traffic at times (it was a secondary system, most of its
  usage was when our primary data center was down for maintenance)

* firewall was of an evil type that tears down "unused" connections
  without either end being able to know it happened

In this scenario, after a suitably long period of no traffic between
Zope and Zeo, the firewall would disconnect them but they would still
think they were connected, and we would get a problem like yours.

Dieter Maurer observed the same thing and gave me the hint that this
might be the problem. Implementing his suggested "keepalive" product
was less trouble than arguing with the firewall administrators.

http://aspn.activestate.com/ASPN/Mail/Message/zope-list/2918584

I used something very close to that; I believe I just saved it as
Products/ZeoKeepalive/__init__.py. (I've changed jobs so I'm going by
memory.)

-- 
Paul Winkler
http://www.slinkp.com
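
For illustration only, the general shape of a "keepalive" like the one described above can be sketched as a background thread that generates a little traffic at a fixed interval, so that an idle connection never looks "unused" to the firewall. This is not the code from the linked message. The class name `Keepalive`, the `ping` callable, and the 300-second default interval are all assumptions made for the example; in a real setup `ping` would have to be something that causes an actual round trip to the zeo server.

```python
import threading


class Keepalive:
    """Call `ping` every `interval` seconds from a daemon thread.

    Hypothetical helper for illustration: `ping` is any callable that
    produces traffic on the connection that must stay open.
    """

    def __init__(self, ping, interval=300.0):
        self._ping = ping
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Wake the worker immediately and wait for it to finish.
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop()
        # has been called, so the loop ends promptly on shutdown.
        while not self._stop.wait(self._interval):
            try:
                self._ping()
            except Exception:
                # One failed ping should not kill the keepalive thread.
                pass
```

The interval has to be shorter than the firewall's idle timeout, which is something only the firewall administrators can tell you.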