Hey All,

I'm experiencing hanging issues with my Zope-2.8.6+zeo setup on RHEL 4. The hang isn't characterized by 100% CPU usage. I had the same issue with 2.8.5 and have upgraded since then. Here's the situation:

I have one zeo client connected to a zeo server on the same box. Apache sits in front, using RewriteRules to request data from zope.

After some time (could be 2 minutes or an hour), the zeo client stops responding. Apparently this is called a deadlock or a "spinning zope".

I've tried attaching gdb to the zeo client pid and using the recipe at http://zopelabs.com/cookbook/1073504990 to print a traceback, but the call always aborted with SIGABRT.

I've captured all of the requests sent to zope during an uptime window (via Z2.log) and used wget to "replay" them. I've also pulled from apache's rewrite log all requests proxied to zope, on the assumption that Z2.log only records finished requests. I set up another zeo client (on the same box, different port) and used wget to replay these captures as well. Replaying the captures does not cause zope to hang; in fact, I have not been able to trigger a hang by replaying at all. There doesn't seem to be any one url or sequence of urls that causes zope to hang.

I've tried reinstalling the zope instance, but that didn't help.

I've tried using requestprofiler.py to inspect the trace.log. It shows a high number of "hangs", but not on a url that actually triggers a spinning zope.

Basically, that's where I'm stuck. Is there anything else I can try? Am I missing something?

Thanks for the help,
Andy
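The replay step described above can be sketched in Python rather than wget. This is only an illustration under assumptions: it assumes Z2.log lines follow the usual combined log format with a quoted request field, and the function names `extract_get_paths` and `replay` are hypothetical, not part of Zope or any tool mentioned in this thread. It replays GET requests only.

```python
import re
import urllib.request

# Assumption: each log line contains a quoted request field such as
# "GET /some/path HTTP/1.1", as in the combined log format.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def extract_get_paths(log_lines):
    """Return the paths of all GET requests found in the log lines, in order."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == "GET":
            paths.append(match.group("path"))
    return paths


def replay(base_url, paths, timeout=30):
    """Request each path in order and return (path, outcome) pairs.

    outcome is the HTTP status code, or the exception text if the request
    failed or timed out. A timeout is the interesting case when hunting
    for a hang.
    """
    results = []
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                results.append((path, resp.status))
        except Exception as exc:
            results.append((path, str(exc)))
    return results
```

Usage would look like `replay("http://localhost:8081", extract_get_paths(open("Z2.log")))`, where the port is a placeholder for the second zeo client. Note that a sequential replay like this one does not reproduce the concurrency or timing of the live traffic.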
Try DeadlockDebugger.

Florent

Andy Altepeter wrote:
Hey All,
I'm experiencing hanging issues with my Zope-2.8.6+zeo setup on RHEL 4. The hang isn't characterized by 100% CPU usage. I had the same issue with 2.8.5 and have upgraded since then. Here's the situation:
I have one zeo client connected to a zeo server on the same box. Apache sits in front, using RewriteRules to request data from zope.
After some time (could be 2 minutes or an hour), the zeo client stops responding. Apparently this is called a deadlock or a "spinning zope".
I've tried attaching gdb to the zeo client pid and using the recipe at http://zopelabs.com/cookbook/1073504990 to print a traceback, but the call always aborted with SIGABRT.
I've captured all of the requests sent to zope during an uptime window (via Z2.log) and used wget to "replay" them. I've also pulled from apache's rewrite log all requests proxied to zope, on the assumption that Z2.log only records finished requests. I set up another zeo client (on the same box, different port) and used wget to replay these captures as well. Replaying the captures does not cause zope to hang; in fact, I have not been able to trigger a hang by replaying at all. There doesn't seem to be any one url or sequence of urls that causes zope to hang.
I've tried reinstalling the zope instance, but that didn't help.
I've tried using requestprofiler.py to inspect the trace.log. This shows a high number of "hangs", but not on a url that actually triggers a spinning zope.
Basically, that's where I'm stuck. Is there anything else I can try? Am I missing something?
Thanks for the help,
Andy
_______________________________________________
Zope maillist - Zope-CWUwpEBWKX0@public.gmane.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
-- Florent Guillaume, Nuxeo (Paris, France) Director of R&D +33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
I'll chime in with a "me too" (see my thread from the last week on this list). I haven't looked into it as deeply as you have, but I did try DeadlockDebugger, which was itself inaccessible while zope was spinning. Nothing in the logs. My install is Zope 2.8.5 on RHEL 4 without ZEO.

Florent Guillaume wrote:
Try DeadlockDebugger.
Florent
Andy Altepeter wrote at 2006-4-24 14:26 -0500:
... I'm experiencing hanging issues with my Zope-2.8.6+zeo setup on RHEL 4. The hang isn't characterized by 100% CPU usage. I had the same issue with 2.8.5 and have upgraded since then. Here's the situation:
I have one zeo client connected to a zeo server on the same box. Apache sits in front, using RewriteRules to request data from zope.
After some time (could be 2 minutes or an hour), the zeo client stops responding. Apparently this is called a deadlock or a "spinning zope".
I know this behaviour from a Python bug triggered by a Linux threading peculiarity. With this bug, the main thread is killed by a deadly signal but all other threads remain alive. Therefore, neither zdaemon nor the clients recognize Zope's death (zdaemon may recognize it, but it cannot restart Zope because the sockets are still in use).

A recognizing feature of this bug is that the remaining threads need to be killed with "kill -9".

The bug is fixed in the newest releases of the Python 2.3 and Python 2.4 series.

-- Dieter
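One way to look for the condition described above (main thread gone, other threads still around) is to read per-thread states from /proc. The following is a rough sketch, not a confirmed diagnostic for this bug: it assumes a Linux kernel that exposes /proc/<pid>/task, it assumes the main thread's ID equals the process ID, and it assumes a dead main thread would show up as missing or in state Z or X. The function names are hypothetical.

```python
import os


def parse_state(status_text):
    """Extract the one-letter state code from the text of a /proc status file."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]
    return None


def thread_states(pid):
    """Map each thread ID of a process to its state code (Linux only)."""
    states = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                states[int(tid)] = parse_state(f.read())
        except OSError:
            # The thread exited between listdir() and open().
            continue
    return states


def main_thread_gone(pid):
    """True if the main thread looks dead while at least one other thread is not.

    Assumption: a dead main thread is either absent from the task list or
    reported in state Z (zombie) or X (dead).
    """
    states = thread_states(pid)
    dead = (None, "Z", "X")
    others_alive = any(
        state not in dead for tid, state in states.items() if tid != pid
    )
    return states.get(pid) in dead and others_alive
```

Running `main_thread_gone(pid)` against the pid of the unresponsive zeo client would give a quick hint as to whether this scenario applies before resorting to "kill -9".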
participants (4)
- Andy Altepeter
- Dieter Maurer
- Erik Myllymaki
- Florent Guillaume