zope unresponsive
I have posted this several times, but have not until now been able to get DeadlockDebugger installed. I see several people have had this problem, but no-one has posted a solution.

zope 2.9.5 + zeo
python 2.4.3
Red Hat RHEL 4
Plone 2.5.1

Our zeo clients hang intermittently. We have no way of reproducing the problem, but it occurs daily. The client hangs and a restart seems to fix the problem.

In the event log with tracing on we get

Trace zeo.zrpc.Connection(C) wait(16697) {server:8100} pending, async=0

There are hundreds to thousands of these until the server is restarted.

In the zeo log we get

Error caught in asyncor asyncore.py
error:(110,'Connection timed out')

We have been trying to track this down and have had no luck. Does anyone have any suggestions? Below is our deadlock debugger output

Threads traceback dump at 2007-02-23 15:26:50

Thread -1269564496 (GET /VirtualHostBase/https/soawds:443/VirtualHostRoot/Content///training):
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 395, in publish_module
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 196, in publish_module_standard
  File "/apps1/zope2.9.5/navo_instance/Products/PlacelessTranslationService/PatchStringIO.py", line 34, in new_publish
    x = Publish.old_publish(request, module_name, after_list, debug)
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 115, in publish
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/mapply.py", line 88, in mapply
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 41, in call_object
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Shared/DC/Scripts/Bindings.py", line 311, in __call__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Shared/DC/Scripts/Bindings.py", line 348, in _bindAndExec
  File "/apps1/zope2.9.5/navo_instance/Products/CMFCore/FSPageTemplate.py", line 195, in _exec
    result = self.pt_render(extra_context=bound_names)
  File "/apps1/zope2.9.5/navo_instance/Products/CacheSetup/patch_cmf.py", line 38, in FSPT_pt_render
    result = FSPageTemplate.inheritedAttribute('pt_render')(
  File "/apps1/zope2.9.5/navo_instance/Products/CacheSetup/patch_cmf.py", line 92, in PT_pt_render
    tal=not source, strictinsert=0)()
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 238, in __call__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 281, in interpret
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 749, in do_useMacro
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 281, in interpret
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 457, in do_optTag_tal
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 442, in do_optTag
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 437, in no_tag
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 281, in interpret
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 749, in do_useMacro
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 281, in interpret
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/TAL/TALInterpreter.py", line 507, in do_setLocal_tal
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Products/PageTemplates/TALES.py", line 221, in evaluate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Products/PageTemplates/Expressions.py", line 185, in __call__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Products/PageTemplates/Expressions.py", line 180, in _eval
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/Products/PageTemplates/Expressions.py", line 85, in render
  File "/apps1/zope2.9.5/navo_instance/Products/CMFPlone/browser/plone.py", line 66, in globalize
    self._initializeData(options=options)
  File "/apps1/zope2.9.5/navo_instance/Products/CMFPlone/browser/plone.py", line 147, in _initializeData
    self._data['language'] = self.request.get('language', None) or \
  File "/apps1/zope2.9.5/navo_instance/Products/Archetypes/ClassGen.py", line 58, in generatedAccessor
    return schema[name].get(self, **kw)
  File "/apps1/zope2.9.5/navo_instance/Products/Archetypes/Field.py", line 802, in get
    value = ObjectField.get(self, instance, **kwargs)
  File "/apps1/zope2.9.5/navo_instance/Products/Archetypes/Field.py", line 671, in get
    return self.getStorage(instance).get(self.getName(), instance, **kwargs)
  File "/apps1/zope2.9.5/navo_instance/Products/Archetypes/Storage/__init__.py", line 175, in get
    value = base._md[name]
  File "/var/tmp/python2.4-2.4.3-root/apps1/python/lib/python2.4/UserDict.py", line 17, in __getitem__
    def __getitem__(self, key): return self.data[key]
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 732, in setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 786, in _setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 604, in setGhostState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 597, in getState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 471, in _persistent_load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 537, in load_oid
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 201, in get
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 746, in load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 760, in loadEx

Thread -1290544208 (GET /VirtualHostBase/https/soawds:443/VirtualHostRoot/Content//nav):
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 395, in publish_module
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 196, in publish_module_standard
  File "/apps1/zope2.9.5/navo_instance/Products/PlacelessTranslationService/PatchStringIO.py", line 34, in new_publish
    x = Publish.old_publish(request, module_name, after_list, debug)
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 106, in publish
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/BaseRequest.py", line 366, in traverse
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 732, in setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 786, in _setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 604, in setGhostState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 597, in getState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 471, in _persistent_load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 537, in load_oid
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 201, in get
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 746, in load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 760, in loadEx

Thread -1246884944 (GET /VirtualHostBase/https/soawds:443/VirtualHostRoot/Content/carrier.jpg):
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 395, in publish_module
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 196, in publish_module_standard
  File "/apps1/zope2.9.5/navo_instance/Products/PlacelessTranslationService/PatchStringIO.py", line 34, in new_publish
    x = Publish.old_publish(request, module_name, after_list, debug)
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 115, in publish
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/mapply.py", line 88, in mapply
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 41, in call_object
  File "/apps1/zope2.9.5/navo_instance/Products/ATContentTypes/content/base.py", line 414, in index_html
    if data:
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 732, in setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 786, in _setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 604, in setGhostState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 597, in getState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 471, in _persistent_load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 537, in load_oid
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 201, in get
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 746, in load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 769, in loadEx
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ServerStub.py", line 192, in loadEx
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/zrpc/connection.py", line 531, in call
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/zrpc/connection.py", line 638, in wait
  File "/var/tmp/python2.4-2.4.3-root/apps1/python/lib/python2.4/asyncore.py", line 122, in poll
    r, w, e = select.select(r, w, e, timeout)

Thread -1280054352 (GET /VirtualHostBase/https/soawds:443/VirtualHostRoot/Content/):
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 395, in publish_module
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 196, in publish_module_standard
  File "/apps1/zope2.9.5/navo_instance/Products/PlacelessTranslationService/PatchStringIO.py", line 34, in new_publish
    x = Publish.old_publish(request, module_name, after_list, debug)
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/Publish.py", line 106, in publish
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZPublisher/BaseRequest.py", line 366, in traverse
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 732, in setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 786, in _setstate
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 604, in setGhostState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 597, in getState
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 471, in _persistent_load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/serialize.py", line 537, in load_oid
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZODB/Connection.py", line 201, in get
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 746, in load
  File "/var/tmp/Zope-2.9.5-1-buildroot/apps1/zope2.9.5/lib/python/ZEO/ClientStorage.py", line 760, in loadEx

End of dump

Thank you,
Paul Williams
On 2/24/07, Paul Williams <PWilliams@diamonddata.com> wrote:
I have posted this several times, but have not until now been able to get DeadlockDebugger installed. I see several people have had this problem, but no-one has posted a solution.
zope 2.9.5 + zeo, python 2.4.3, Red Hat RHEL 4, Plone 2.5.1
Our zeo clients hang intermittently. We have no way of reproducing the problem, but it occurs daily. The client hangs and a restart seems to fix the problem.
I don't know if this is even relevant, but I have also had Zope go into la-la land and struggled for a while to figure out what was going on. As in your situation, it happened intermittently, and a reboot recovered it (sometimes it seemed to recover on its own). And like you, I tried various methods to spot the problem, including installing DeadlockDebugger. This was with various versions of Zope running on Win2003. In the end I found that a page/browser was at the root of the problem. It turns out it is very easy to mount an inadvertent DoS attack on Zope that kills it. In my case I had JavaScript in a periodic timer event handler (with a fairly short period) that did a page reload and, under certain circumstances, didn't clear the timer, causing an endless loop driven by timer events.
On 2/24/07, Paul Williams <PWilliams@diamonddata.com> wrote:
I have posted this several times, but have not until now been able to get DeadlockDebugger installed. I see several people have had this problem, but no-one has posted a solution.
I don't know if that can be the case, but is there a firewall between your Zope and your ZEO? I remember some discussions about firewalls between Zope and ZEO which could cause problems...

Regards
Marco
--
Marco Bizzarri
http://iliveinpisa.blogspot.com/
Ok, here is what we have. I did a netstat on both machines, client and server. The client sees an established connection and the server does not. In the server log there is a disconnect.

As far as hardware between them, there is a switch (Dell PowerConnect 6024). Web Server Directors might get hold of it, but there are no hops on traceroute: traceroute only shows the client machine and the server machine.

So the client is just continuously polling the connection but getting nothing back.

What we are thinking about doing is changing the code in zrpc/connection.py to close the connection in wait (line 638, Zope version 2.9.5) if the wait time gets too large or the poll has happened too many times.

We are great at Plone development, but have very little experience with backend Zope development. Would someone please advise me as to whether this is going to cause more problems?

Thanks,
Paul Williams
------------------------------------------------------------------------
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Paul Williams wrote:
Ok, here is what we have. I did a netstat on both machines, client and server. The client sees an established connection and the server does not. In the server log there is a disconnect. As far as hardware between them, there is a switch (dell powerconnect 6024). Web Server Directors might get hold of it but there are no hops on traceroute. Traceroute only shows the client machine and the server machine.
So the client is just continuously polling the connection but getting nothing back.
That sounds like some weird kernel / networking problem to me: I don't see how Zope could be able to keep calling 'select' on a socket after the other side has closed it. Is there any possibility that some kind of failover / IP takeover has happened, such that the storage server now running is not the same host / instance as the one to which the clients originally connected? Are you using LVS + heartbeat, or some kind of hardware load balancer to manage such redundancy?
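A connection that the client reports as established while the server has already dropped it is usually described as a half-open TCP connection; it can arise when the closing side's FIN or RST never reaches the peer. One general way to let the kernel notice such a dead peer is TCP keepalive. The following is only a minimal Python sketch under stated assumptions: enable_keepalive and its idle/interval/count values are hypothetical, it works on a plain socket, and nothing in this thread establishes whether or how ZEO 2.9.5 would let you apply it to its own client socket.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an already created socket.

    idle, interval and count are illustrative values chosen for this
    sketch; they are not ZEO or Zope defaults.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform specific (present on
    # Linux), so each one is applied only if this platform defines it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With keepalive enabled, a peer that stops answering probes should eventually cause reads or select on the socket to report an error rather than wait forever; how long that takes depends on the three values and on the operating system.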
What we are thinking about doing is changing the code in zrpc/connection.py to close the connection in wait (line 638 zope version 2.9.5) if the wait time gets too large or the poll has happened too many times.
We are great at plone development, but have very little backend zope development. Would someone please advise me as to whether this is going to cause more problems?
According to the log message you posted earlier in the thread, your appservers are spewing thousands of log messages from the connection's 'pending' method, although your deadlock debugger output shows the one thread blocked on 'select' inside of the connection's 'wait' method. There should be lots of log messages at TRACE level for the wait call, including a doubling / backoff of the delay value from 1 ms to 1 sec. Do you see those log messages, as well?

Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF5Dvr+gerLs4ltQ4RAm/HAKCUN5WboOxVGeB11GhEfgYQ3wos3QCdH0TW
DbcpXiMPlcQYyx0gewPFMLI=
=9A/a
-----END PGP SIGNATURE-----
Tres Seaver wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Paul Williams wrote:
Ok, here is what we have. I did a netstat on both machines, client and server. The client sees an established connection and the server does not. In the server log there is a disconnect. As far as hardware between them, there is a switch (dell powerconnect 6024). Web Server Directors might get hold of it but there are no hops on traceroute. Traceroute only shows the client machine and the server machine.
So the client is just continuously polling the connection but getting nothing back.
That sounds like some weird kernel / networking problem to me: I don't see how Zope could be able to keep calling 'select' on a socket after the other side has closed it.
We agree. This is a strange situation that none of us has seen before. However, we have until tomorrow to do something, and replacing hardware is not feasible.
Is there any possibility that some kind of failover / IP takeover has happened, such that the storage server now running is not the same host / instance as the one to which the clients originally connected? Are you using LVS + heartbeat, or some kind of hardware load balancer to manage such redundancy?
We do have Web Services Directors that do load balancing, but in this particular case the storage server is not set up for load balancing. I am not aware of any features that make the ZODB capable of clustering, except for replication services offered through Zope. We are not sure whether the traffic is going to the Web Services Directors or not. Even if it is, there are thousands of settings and there is no one available who knows what to change. The storage server is a simple NAS server with a static IP address.
What we are thinking about doing is changing the code in zrpc/connection.py to close the connection in wait (line 638 in Zope 2.9.5) if the wait time gets too large or the poll has happened too many times.
We are great at Plone development, but have very little experience with backend Zope development. Would someone please advise me as to whether this is going to cause more problems?
According to the log message you posted earlier in the thread, your appservers are spewing thousands of log messages from the connection's 'pending' method, although your deadlock debugger output shows the one thread blocked on 'select' inside of the connection's 'wait' method. There should be lots of log messages at TRACE level for the wait call, including a doubling / backoff of the delay value from 1 ms to 1 sec. Do you see those log messages, as well?
These messages are there. You can see the time doubling. This is where we were thinking of breaking the connection once it gets to a certain point and making Zope reconnect. This solves our hung connection problem, we think. However, I am hoping someone can let me know if I am breaking something else by doing this.
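The idea of giving up once the polling has gone on too long can be sketched as follows. This is not the ZEO code in zrpc/connection.py: `wait_for_reply`, `poll`, `close`, `WaitTimedOut` and `max_total_wait` are hypothetical names, and the sketch only models the delay doubling from 1 ms to 1 sec plus a cap on the total wait. Whether closing a real ZEO connection at that point is safe for calls in flight is exactly the open question, and the sketch does not answer it.

```python
class WaitTimedOut(Exception):
    """Raised when no reply arrives within the allowed total wait."""


def wait_for_reply(poll, close, max_total_wait=30.0):
    """Poll with a delay that doubles from 1 ms up to 1 s.

    poll(delay) -> reply or None  (hypothetical stand-in for the poll step)
    close()                       (hypothetical stand-in for dropping the connection)
    """
    delay = 0.001
    waited = 0.0
    while True:
        reply = poll(delay)
        if reply is not None:
            return reply
        waited += delay
        if waited >= max_total_wait:
            # Give up instead of polling forever on a dead connection.
            close()
            raise WaitTimedOut("no reply after %.1f s" % waited)
        if delay < 1.0:
            delay = min(delay * 2, 1.0)


# Usage with a peer that never answers (illustration only).
delays = []
closed = []


def never_replies(delay):
    delays.append(delay)
    return None


try:
    wait_for_reply(never_replies, lambda: closed.append(True), max_total_wait=5.0)
except WaitTimedOut:
    pass
```

A cap on total elapsed time is used here rather than a poll count, since the number of polls needed to reach a given wait depends on the backoff schedule; either limit would match the proposal.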
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
On 2/27/07, Paul Williams <pwilliams@diamonddata.com> wrote:
These messages are there. You can see the time doubling. This is where we were thinking of breaking the connection once it gets to a certain point and making Zope reconnect.
This solves our hung connection problem, we think. However, I am hoping someone can let me know if I am breaking something else by doing this.
I don't remember if you already mentioned it. However: did you try to monitor the outgoing and incoming traffic? I mean, setting some iptables rules and/or using something like tcpdump to monitor what is going on here?
Regards
Marco
--
Marco Bizzarri http://iliveinpisa.blogspot.com/
No, we haven't done that yet. That is something else we may try.
Marco Bizzarri wrote:
I don't remember if you already mentioned it. However: did you try to monitor the outgoing and incoming traffic? I mean, setting some iptables rules and/or using something like tcpdump to monitor what is going on here?
Regards Marco
participants (4)
- Brian Sullivan
- Marco Bizzarri
- Paul Williams
- Tres Seaver