Thank you, Dieter, for your valuable insights and information. I am forwarding this to my ISP.

--- Dieter Maurer <dieter@handshake.de> wrote:
Ken Ara wrote at 2006-9-5 07:47 -0700:
... Of immediate concern to me is whether I can do anything to prevent this happening again. From time to time, my Zope hangs, usually because of an attack by a bad robot requesting lots of complex pages and sending no-cache headers. Then I am able to restart Zope and all is well. For a while, when these attacks were frequent, I had a crontab to zopectl restart every hour.
There are solutions (I think "daemontools", but may be wrong) that can automate this more intelligently than a cronjob.
We have our own check server which polls Zope and if it does not respond in time restarts it.
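A minimal sketch of such a polling check, in Python. This is not the check server described above, only an illustration of the idea. The function names, the 10-second timeout, and the injectable `probe` and `run` parameters are assumptions made for the example.

```python
import http.client
import subprocess
import urllib.request


def is_responsive(url, timeout=10.0):
    """Return True if `url` answers successfully within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and HTTP error statuses all
        # count as "not responding" in this sketch.
        return False


def check_and_restart(url, restart_cmd, timeout=10.0,
                      probe=is_responsive, run=subprocess.run):
    """Poll once; run `restart_cmd` if the probe fails.

    Returns True if a restart was triggered, False otherwise.
    `probe` and `run` are injectable so the logic can be tested
    without a network or a real Zope instance.
    """
    if probe(url, timeout):
        return False
    run(restart_cmd, check=False)
    return True
```

Called periodically, for example as `check_and_restart("http://localhost:8080/", ["zopectl", "restart"])` (the URL is hypothetical), this restarts Zope only when it has actually stopped answering, rather than every hour.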
But this event was different and I would like to know if anyone thinks that something I am doing wrong could cause the Zope process to become 'unkillable' and require a reset of the machine. Has anyone else had this problem?
Up to Python 2.3.4 and Python 2.4.0 (fixed in Python 2.3.5 and Python 2.4.1), a fatal signal (like "SIGSEGV") could bring Zope into a state where its main thread was killed but the child threads were still alive. These child threads could only be killed with "kill -9".
Although we now use Python 2.4.1, I have seen a similar problem just a few days ago. But almost surely, this has to do with the Java Virtual Machine which we now also integrate in our Zope instances.
However, when even "kill -9" (as "root") is no longer able to kill a process, then the process is somewhere deep in the operating system (where signal handling is deactivated for consistency reasons). Usually, this indicates a network problem.
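One way to confirm this state, assuming the machine runs Linux (the operating system is not named in this thread): Linux reports such a process as state "D" (uninterruptible sleep) in /proc/<pid>/status. The helper names below are made up for this sketch.

```python
from pathlib import Path


def parse_state(status_text):
    """Extract the one-letter state code from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split()[1]
    return None


def is_uninterruptible(pid, proc_root="/proc"):
    """Return True if `pid` is in state "D", False if it is in another
    state, and None if its status file cannot be read (no such process,
    or not a Linux system)."""
    try:
        text = (Path(proc_root) / str(pid) / "status").read_text()
    except OSError:
        return None
    return parse_state(text) == "D"
```

A process that stays in state "D" is blocked inside a kernel call and will not act on any signal, "kill -9" included, until that call returns.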
And if your operating system is no longer able to shut down, then you have an even more fundamental problem -- maybe, too, connected to network problems.
I fear we cannot help you much -- as an intensive analysis of your system would be necessary in order to find the causes of your problems.
I would have liked to perform some diagnostics on the machine in its stuck state, but neither I nor the ISP knew where to start.
Usually, one would start with an analysis of the operating system log files.
If they do not tell anything, then one would check what is still working (e.g. is the console still responding, does it still observe the magic "CTRL-ALT-DEL" reboot key sequence), which commands fail and in what way, ...
-- Dieter