I'm getting many of these in my event.log:

2005-03-15T10:17:56 INFO(0) ZEC:1-None-1 flipping cache files. new current = 1

The goes down, every few days with no errors, other than the one above. In one instance the site went down after an error in a page template. This code shows in the event log just prior to the site going down.

<p tal:repeat="photo container/objectValues">
  <img tal:replace="structure photo">
  <span tal:replace="photo/title">title</span>
  <span tal:replace="photo/getDescription">Description</span>
</p>

It seems improbable that the site would go down because of an error in a page template.

Zope 2.7.3-0
CMFPlone 1.1
Python 2.3.4
Red Hat

--
Darian Schramm
darian at abstractedge dot com
Darian V Schramm wrote:
I'm getting many of these in my event.log
2005-03-15T10:17:56 INFO(0) ZEC:1-None-1 flipping cache files. new current = 1
This is nothing to worry about from a crashing point of view. Worry about it when you want to improve performance, once you have a happily running server.
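The message comes from the ZEO client cache. As far as can be told for the ZODB releases of this era, it keeps cached objects in two files and switches to the other file when the current one fills its share of the cache size. The toy model below is a simplified Python 3 sketch under that assumption. `TwoFileCache` and its half-and-half split are illustrative and are not ZEO's actual code. Frequent flips would point to a cache that is small for the workload. That is a performance concern, not a stability one.

```python
class TwoFileCache:
    """Toy two-file cache: a simplified model, not ZEO's ClientCache code."""

    def __init__(self, cache_size):
        # Assumption: each file may hold about half of the total budget.
        self.limit = cache_size // 2
        self.files = [{}, {}]   # oid -> data, one dict per "file"
        self.sizes = [0, 0]     # bytes stored in each file
        self.current = 0        # index of the file receiving new data
        self.flips = 0

    def store(self, oid, data):
        # Switch files when the current one would exceed its share.
        if self.sizes[self.current] + len(data) > self.limit:
            self._flip()
        self.files[self.current][oid] = data
        self.sizes[self.current] += len(data)

    def _flip(self):
        # The other file becomes current and its old contents are dropped.
        self.current = 1 - self.current
        self.files[self.current] = {}
        self.sizes[self.current] = 0
        self.flips += 1
        print(f"flipping cache files. new current = {self.current}")

    def load(self, oid):
        # Check the current file first so the newest copy of an oid wins.
        for index in (self.current, 1 - self.current):
            if oid in self.files[index]:
                return self.files[index][oid]
        return None  # cache miss: a real client would ask the ZEO server
```

For example, `TwoFileCache(100)` fed ten 20-byte objects flips four times, and only the last four objects stay cached.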
The goes down,
What goes down?
every few days with no errors, other than the one above. In one instance the site went down after an error in a page template.
This code shows in the event log just prior to the site going down.
<p tal:repeat="photo container/objectValues">
  <img tal:replace="structure photo">
  <span tal:replace="photo/title">title</span>
  <span tal:replace="photo/getDescription">Description</span>
</p>
That doesn't tally with your "no errors" statement above. What is the full entry from the event log?
It seems improbable that the site would go down because of an error in a page template.
Looks like you're playing with photos. If that includes external apps and your Zope is running as a daemon process, then it's certainly possible.

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
The zope instance stops responding to incoming requests. I have a zeo client running that stays up with no errors in the zeo.log. The full entry from the event log is:

2005-03-15T15:22:55 INFO(0) ZEC:1-None-1 flipping cache files. new current = 1
------
2005-03-15T15:23:06 INFO(0) Plone Debug contentbar --- submit
------
2005-03-15T15:23:06 INFO(0) Plone Debug <html> <p tal:repeat="photo container/objectValues"> <img tal:replace="structure photo"> <span tal:replace="photo/title">title</span> <span tal:replace="photo/getDescription">Description</span> </p> </html>
------
2005-03-15T15:23:06 INFO(0) Plone Debug <FSControllerPythonScript at content_status_modify>
------
2005-03-15T15:23:06 INFO(0) Plone Debug content_status
------
2005-03-15T15:23:15 INFO(0) Plone Debug contentbar --- publish
------
2005-03-15T15:23:15 INFO(0) Plone Debug <html> <p tal:repeat="photo container/objectValues"> <img tal:replace="structure photo"> <span tal:replace="photo/title">title</span> <span tal:replace="photo/getDescription">Description</span> </p> </html>
------
2005-03-15T15:23:15 INFO(0) Plone Debug <FSControllerPythonScript at content_status_modify>
------
2005-03-15T15:23:15 INFO(0) Plone Debug content_status

At this point the zope instance doesn't respond and causes 502 errors from the apache proxy. The zope process is still running, but it doesn't respond to any requests and times out.

The only image manipulation that happens is within these page templates. This is an index_html in a folder containing images. This code throws an Attribute error because of the call to getDescription, but that is anticipated.

Darian Schramm
darian@abstractedge.com
t: 212.352.9311 x 102
f: 212.352.9498
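A watchdog can tell this symptom apart from a dead process. Here the process is alive, but it never answers requests. The watchdog sends a request and gives up after a short timeout.

The sketch below uses Python 3's `http.client`. These parts are assumptions for illustration and are not part of Zope or Apache:

- the function name `instance_responds`
- its host/port arguments
- the choice to treat any status outside 2xx/3xx as unhealthy

```python
import http.client


def instance_responds(host, port, path="/", timeout=5.0):
    """Return True if host:port answers `GET path` with a 2xx/3xx status in time.

    A process that accepts the connection but never sends a reply (alive but
    hung) makes getresponse() time out, which is reported as False.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and resets are OSError subclasses;
        # malformed replies raise HTTPException.
        return False
    finally:
        conn.close()
```

Talking to the Zope port directly, rather than through the proxy, separates a hung Zope from a proxy problem.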
Darian V Schramm <darian@abstractedge.com> wrote:
The zope instance stops responding to incoming requests. I have a zeo client running that stays up with no errors in the zeo.log.
You can use DeadlockDebugger to debug a stuck Zope.

Florent

--
Florent Guillaume, Nuxeo (Paris, France)
CTO, Director of R&D
+33 1 40 33 71 59
http://nuxeo.com
fg@nuxeo.com
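DeadlockDebugger's value is that it shows where every thread of a stuck process is currently waiting. The sketch below is a Python 3 illustration of that idea using `sys._current_frames()`, and it is not DeadlockDebugger's own code. The helper name `dump_all_threads` is hypothetical.

```python
import sys
import threading
import traceback


def dump_all_threads():
    """Return the current Python stack of every thread as one string."""
    names = {thread.ident: thread.name for thread in threading.enumerate()}
    parts = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for ident, frame in sys._current_frames().items():
        parts.append(f"Thread {names.get(ident, '?')} (id {ident}):")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)
```

In a hung server, the interesting part of the dump is usually the lock or socket call that every worker thread is blocked in.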
Some mornings, and once in a while, our site gets backend errors from squid, meaning our zeo clients are down. But we have them restart every hour on the half hour.

Sometimes the parent process has died but the children processes are still there, so even if the client restarts it cannot complete because the children are still there. We use Webmin and can type in the path to the files that launched the processes and then kill off the detached children. Then we start the zeo client again and all is fine.

First: how can the parent die off and the children remain?

Second: how can I make sure (using a command or shell script) that the children are cleaned up if the parent process dies, or when we stop the service by hand or with the zopectl stop command? Similar to the way Webmin does it with its Process Search tool, by finding all the parts and 'killing' them off.

Sort of a general Red Hat Linux question with Zope wrappings.

Thanks
Allen
We use Webmin and can type in the path to the files that launched the processes and then kill off the detached children. Then start the zeo client again and all is fine.
First...How can the parent die off and the children remain?
Parents in Unix are responsible (this is actually the technical term for it, I'm not sick) for "reaping their dead children". This doesn't happen automatically; someone needs to code it in. If the parent process dies before calling "waitpid" on its children (or never calls waitpid on them), the children are not killed. They live on, re-parented to init. A child that exits while its parent is still alive but not reaping it lingers as a "zombie" process.
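The reaping step can be shown with a minimal Python 3 sketch for Unix, using `os.fork` and `os.waitpid`. The helper name `spawn_and_reap` is hypothetical. Between the child's exit and the parent's `waitpid` call, the child is a zombie.

```python
import os


def spawn_and_reap(exit_code):
    """Fork a child that exits with `exit_code`, reap it, and return its status.

    Between the child's exit and the parent's waitpid() call, the child is a
    zombie: finished, but still listed in the process table.
    """
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)
    # Parent: block until this specific child exits and collect its status,
    # which removes the child's entry from the process table.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was killed by a signal instead
```

Note that `waitpid` only collects a child that has already finished; it does nothing to stop a child that is still running.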
Second...how can I make sure (using a command or shell script) that the children are cleaned up if the parent process dies or we stop the service by hand or through use of the zopectl stop command? Similar to the way webmin does it with its Process Search tool by finding all the parts and 'killing' them off.
Sounds like zopectl needs to be fixed in some way to reap its children if it dies. FWIW, I use "supervisor" to do this instead of zopectl (see http://www.plope.com/software/supervisor/). It seems to work most of the time, although I have seen cases where the supervisord parent dies and the children live on, so there's still a bug lurking there somewhere.

- C
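One common way to make "stop" take the children with it is to run the service in its own process group. You then signal the whole group rather than a single pid. This works regardless of whether the parent reaps its children.

This is a generic Python 3 sketch of that technique. It is not how zopectl, supervisor or Webmin actually work, and `start_in_own_group` and `stop_group` are hypothetical helper names. From a shell, `kill -TERM -- -PGID` sends a signal to a whole group in the same way.

```python
import os
import signal
import subprocess


def start_in_own_group(argv):
    """Start `argv` in a new session, so it leads its own process group.

    start_new_session=True makes the child call setsid(), so the child and
    everything it forks share one process group id (equal to the child's pid).
    """
    return subprocess.Popen(argv, start_new_session=True)


def _signal_group(pgid, sig):
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group has already exited


def stop_group(proc, timeout=10.0):
    """SIGTERM the whole group of `proc`, escalate to SIGKILL, then reap `proc`."""
    pgid = proc.pid  # session leader, so its pid is also the group id
    _signal_group(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        _signal_group(pgid, signal.SIGKILL)
        return proc.wait()
```

A child that daemonizes itself into a new session leaves the group, so this only helps for children that stay in it.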
On Wed, Mar 16, 2005 at 01:43:48PM -0500, Chris McDonough wrote:
We use Webmin and can type in the path to the files that launched the processes and then kill off the detached children. Then start the zeo client again and all is fine.
the time, although I have seen cases where the supervisord parent dies and the children live on, so there's still a bug lurking there somewhere.
I use daemontools rather than supervisor, but I've seen the same thing: sometimes not everything dies, and zope can't restart because the ports are still bound.

For the original poster: I have to wonder why you need to restart zope every hour. Really bad memory leak??

--
Paul Winkler
http://www.slinkp.com
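Since a restart fails while a leftover process still holds the ports, a start script could check for that first. The sketch below is a Python 3 illustration of such a check, and the helper name `port_is_free` is hypothetical. It simply tries to bind the port.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if no process currently holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # SO_REUSEADDR lets the bind succeed past old connections in
        # TIME_WAIT, so only a socket still bound to the port counts.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. permission denied for ports below 1024
        return True
```

For example, a wrapper could refuse to start, or go looking for leftover children, when `port_is_free(8080)` is False. Here 8080 stands in for whatever port zope.conf configures.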
Allen Schmidt wrote at 2005-3-16 13:08 -0500:
Some mornings, and once in a while, our site gets backend errors from squid, meaning our zeo clients are down. But we have them restart every hour on the half hour. Sometimes the parent process has died out but the children processes are still there. So even if the client restarts it cannot complete because there are children still there.
A bug in Python, only fixed in Python 2.4... Search Python's bug tracker at SourceForge for "signal handling, linuxthread". You should find the bug report and a patch, in case you want to fix your Python 2.3.x yourself.

--
Dieter
Oh great, thanks for the link! This looks really useful.

Darian Schramm

Florent Guillaume wrote:
Darian V Schramm <darian@abstractedge.com> wrote:
The zope instance stops responding to incoming requests. I have a zeo client running that stays up with no errors in the zeo.log.
You can use DeadlockDebugger to debug a stuck Zope.
Florent
Darian V Schramm wrote:
The zope instance stops responding to incoming requests. I have a zeo client running that stays up with no errors in the zeo.log.
You're using Plone, so all bets are off ;-)
2005-03-15T15:23:15 INFO(0) Plone Debug <FSControllerPythonScript at content_status_modify> ------ 2005-03-15T15:23:15 INFO(0) Plone Debug content_status
At this point the zope instance doesn't respond, and causes 502 errors from the apache proxy. The zope process is still running, but it doesn't respond to any requests and times out.
I'd definitely suggest DeadlockDebugger...
The only image manipulation that happens is within these page templates.
Can you show us the full code?

cheers,

Chris
Here's the full code from the page template. Again it throws an Attribute error for the getDescription call. It calls no external Python scripts; it just iterates over the images in the current directory (container). I have DeadlockDebugger installed, but I can't reproduce the error that causes the site to stop responding, so I'm just waiting for the issue to come up again. In the zope.conf can you set the verbosity of the debugging level or is it just on/off?

Thanks again.

----
<html>
<table>
  <tr>
    <td>Item #</td>
    <td>ID</td>
    <td>Photo</td>
    <td>Title</td>
    <td>Size (bytes)</td>
    <td>Type</td>
    <td>Description</td>
  </tr>
  <tr tal:repeat="photo container/objectValues">
    <td tal:content="repeat/photo/number">#</td>
    <td tal:content="photo/getId"></td>
    <td>
      <a class="photo" href="null" tal:attributes="href photo/getId">
        <img tal:attributes="src photo/absolute_url">
      </a>
    </td>
    <td tal:content="photo/title">Title</td>
    <td tal:content="photo/meta_type">Meta type</td>
    <td tal:content="photo/getDescription">Desc</td>
  </tr>
</table>
</html>
---

Darian Schramm
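The AttributeError itself can be avoided by looking the method up defensively. Nothing in the thread shows that this error explains the hang.

In a page template, the usual TALES idiom is an alternative path, e.g. `tal:content="photo/getDescription | nothing"`, which falls back when the first path fails. The Python 3 sketch below applies the same guard in plain code. The names `describe` and `photo_rows` are hypothetical, and the objects used with it are stand-ins rather than real Zope objects.

```python
def describe(obj, default=""):
    """Return obj.getDescription() when the object has it, else `default`."""
    method = getattr(obj, "getDescription", None)
    return method() if callable(method) else default


def photo_rows(objects):
    """Build (number, id, description) rows like the template's table."""
    rows = []
    for number, obj in enumerate(objects, start=1):
        # getId is assumed present, as the template uses it unguarded.
        rows.append((number, obj.getId(), describe(obj)))
    return rows
```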
Darian V Schramm wrote:
Here's the full code from the page template. Again it throws an Attribute error for the getDescription call.
You still haven't shown us the full traceback for this attribute error, you'll find it in the error_log object.
In the zope.conf can you set the verbosity of the debugging level or is it just on/off?
What bit of zope.conf are you referring to? debug-mode is either on or off, but you can set your logger's level to blather or debug if you really want to see lots of messages ;-)

cheers,

Chris
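A log level acts as a threshold: anything below the configured level is dropped. In zope.conf, that threshold is the `level` setting of the event log section.

The Python 3 sketch below shows the same threshold behaviour with the standard `logging` module. The numeric value 15 for BLATHER and the helper names are assumptions for illustration, not taken from Zope.

```python
import logging

# Assumption: "blather" sits between DEBUG (10) and INFO (20); 15 is illustrative.
BLATHER = 15
logging.addLevelName(BLATHER, "BLATHER")


class _Collect(logging.Handler):
    """Handler that just remembers the level name of each record it gets."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(record.levelname)


def messages_passed(threshold):
    """Log at several levels through a logger set to `threshold`; return what passed."""
    logger = logging.getLogger(f"levels-demo.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo out of the root logger's handlers
    handler = _Collect()
    logger.addHandler(handler)
    try:
        for level in (logging.DEBUG, BLATHER, logging.INFO, logging.ERROR):
            logger.log(level, "message at level %d", level)
    finally:
        logger.removeHandler(handler)
    return handler.seen
```

With the threshold at INFO, only the INFO and ERROR messages get through. Lowering it to BLATHER or DEBUG lets progressively more through.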
I've since restarted zeo and there are no errors, and the site has been as stable as ever. When the site initially went down, the zeo and zope instances were restarted by the root user (I just recently found out), and effective-user wasn't set in zope.conf. I'm starting to think that this may have had something to do with our continuing problems. I've seen where this could cause permission problems with the cache; am I correct in hypothesizing this?

The error_log object in the zope root has no entries, and the error_log object in the plone root has since been overwritten with new exceptions. Neither was set up to copy exceptions to the event log; I've turned this on to log these errors.

Thanks everyone for your suggestions.

Darian Schramm
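One way to check Darian's hypothesis is to look for files in the instance that the normal Zope account does not own. Files like that are what a run as root without effective-user would leave behind. Whether such files caused the hangs is not established in this thread.

The Python 3 sketch below lists them. The helper name is hypothetical, and so are the path and account name in the usage note.

```python
import os


def files_not_owned_by(root, uid):
    """Return sorted paths under `root` whose owner is not `uid`."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: report a symlink's own owner instead of following it.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example, `files_not_owned_by("/path/to/instance/var", pwd.getpwnam("zope").pw_uid)` lists the files to inspect. Both the path and the "zope" account name are placeholders for the local setup.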
participants (7)

- Allen Schmidt
- Chris McDonough
- Chris Withers
- Darian V Schramm
- Dieter Maurer
- Florent Guillaume
- Paul Winkler