Can't stop Zope, machine hanging
Hi all,
My Zope is stuck like never before. With my server on another continent, 8 hours time difference, and on a Sunday, I am pulling out my few remaining hairs...
Our box runs FreeBSD 5, Zope 2.78 (if I remember correctly) and Squid.
The ISP has been preparing for a move and shifted some machines, pulled some wires, and I see that the machine has rebooted.
I can't restart or stop Zope - '/usr/local/www/Zope/zope01/bin/zopectl stop' just produces '........' for a long time - should I wait?
I was unable to kill one of the python2.3 processes, and can't even reboot the machine, using 'shutdown -r now', as I have done previously in extremis.
Any ideas? Not strictly a Zope question, but I would appreciate any help.
Thanks.
Ken
__________________________________________________
Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
--On 2. September 2006 23:43:50 -0700 Ken Ara <feedreader@yahoo.com> wrote:
Hi all,
My Zope is stuck like never before. With my server on another continent, 8 hours time difference, and on a Sunday, I am pulling out my few remaining hairs...
Our box runs FreeBSD 5, Zope 2.78 (if I remember correctly) and Squid.
The ISP has been preparing for a move and shifted some machines, pulled some wires, and I see that the machine has rebooted.
I can't restart or stop Zope - '/usr/local/www/Zope/zope01/bin/zopectl stop' just produces '........' for a long time - should I wait?
Use "netstat -anp" to figure out if there is any process listening on *your* Zope port (the port should be in state LISTEN). If yes, the "-p" option should give you the process id. Try to kill the process (if you have the permissions). If you don't have the permissions, ask your administrator. If there is no process running, try "zopectl fg"... this should give you more detailed error messages on the console.
-aj
Thanks Andreas,
I am logged in as root user. Here is the result of 'top':
    last pid: 77998;  load averages: 0.01, 0.06, 0.06  up 0+15:03:27  04:06:22
    18 processes: 1 running, 11 sleeping, 5 stopped, 1 zombie
    CPU states: 0.0% user, 0.0% nice, 0.8% system, 0.0% interrupt, 99.2% idle
    Mem: 186M Active, 104M Inact, 152M Wired, 36M Cache, 111M Buf, 518M Free
    Swap: 4096M Total, 329M Used, 3767M Free, 8% Inuse
      PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
      945 ken       76    0   442M   131M STOP   112:18  0.00%  0.00% python2.3
      937 root      96    0  7288K  2588K select   0:01  0.00%  0.00% perl
    77681 ken       96    0  6092K  1844K select   0:01  0.00%  0.00% sshd
      975 root       8    0  1356K   208K nanslp   0:00  0.00%  0.00% cron
      893 root      96    0  1312K   392K select   0:00  0.00%  0.00% syslogd
    77693 root      20    0  2296K  1560K pause    0:00  0.00%  0.00% csh
    77661 root       4    0  6112K  1780K sbwait   0:00  0.00%  0.00% sshd
    77685 ken       20    0  2252K  1480K pause    0:00  0.00%  0.00% csh
      994 mailnull  96    0  4224K   140K select   0:00  0.00%  0.00% exim-4.44-0
      968 root      96    0  3360K   328K select   0:00  0.00%  0.00% sshd
    77690 ken        8    0  1600K  1124K wait     0:00  0.00%  0.00% su
    77991 root      96    0  2200K  1240K RUN      0:00  0.00%  0.00% top
I was able to kill one of the Zope processes but not pid 945. Maybe it wasn't a good idea, but even 'kill -s HUP 945' had no effect.
Zope is not running. Here is what I get using 'zopectl fg':
    geneva# /usr/local/www/Zope/zope01/bin/zopectl fg
    export EVENT_LOG_FILE
    EVENT_LOG_FILE=
    /usr/local/www/Zope/zope01/bin/runzope
    Traceback (most recent call last):
      File "/usr/local/www/Zope/lib/python/Zope/Startup/run.py", line 50, in ?
        run()
      File "/usr/local/www/Zope/lib/python/Zope/Startup/run.py", line 19, in run
        start_zope(opts.configroot)
      File "/usr/local/www/Zope/lib/python/Zope/Startup/__init__.py", line 46, in start_zope
        starter.setupServers()
      File "/usr/local/www/Zope/lib/python/Zope/Startup/__init__.py", line 198, in setupServers
        raise ZConfig.ConfigurationError(socket_err
    ZConfig.ConfigurationError: There was a problem starting a server of type "HTTPServer". This may mean that your user does not have permission to bind to the port which the server is trying to use or the port may already be in use by another application. (Address already in use)
Same exact error trying to restart Zope with bin/runzope
As I mentioned, I can't even reboot the machine, the shutdown command can't kill this process either.
Ken
--On 3. September 2006 01:18:55 -0700 Ken Ara <feedreader@yahoo.com> wrote:
Thanks Andreas,
I am logged in as root user. Here is the result of 'top':
"top" is *not* the right tool to check for processes. Use "ps"!!!
I was able to kill one of the Zope processes but not pid 945. Maybe it wasn't a good idea, but even 'kill -s HUP 945' had no effect.
You still have the power of kill -TERM and kill -KILL.
ZConfig.ConfigurationError: There was a problem starting a server of type "HTTPServer". This may mean that your user does not have permission to bind to the port which the server is trying to use or the port may already be in use by another application. (Address already in use)
Same exact error trying to restart Zope with bin/runzope
As I mentioned, I can't even reboot the machine, the shutdown command can't kill this process either.
I already advised you to use netstat to find the related process. Please follow that advice *first* and then ask again.
-aj
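The TERM-then-KILL escalation mentioned in this reply can be sketched as a small script. This is only an illustration in modern Python 3 (not the python2.3 on the box in question); the function names and the 10-second grace period are assumptions made for the example, not anything taken from Zope or zopectl.

```python
import os
import signal
import time


def pid_exists(pid):
    """Best-effort check that a process with this PID still exists.

    Note: an unreaped child (zombie) still counts as existing here.
    """
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_until_gone(pid, timeout, alive, poll=0.1):
    """Poll until alive(pid) is False or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not alive(pid):
            return True
        time.sleep(poll)
    return not alive(pid)


def stop_process(pid, grace=10.0, alive=pid_exists):
    """Send SIGTERM, then SIGKILL, and report which one (if any) worked."""
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return "already gone"
        if wait_until_gone(pid, grace, alive):
            return "exited after " + sig.name
    # Matches the situation described in this thread: even SIGKILL
    # has no visible effect, so the cause lies below the process.
    return "still present after SIGKILL"
```

Calling `stop_process(945)` as root would correspond to running "kill -TERM 945", waiting, and then "kill -KILL 945". A result of "still present after SIGKILL" suggests the problem is in the operating system rather than in Zope.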
Actually, I need to read up on the netstat command. Have a look:
    geneva# netstat -anp
    netstat: option requires an argument -- p
    usage: netstat [-AaLnSW] [-f protocol_family | -p protocol] [-M core] [-N system]
           netstat -i | -I interface [-abdnt] [-f address_family] [-M core] [-N system]
           netstat -w wait [-I interface] [-d] [-M core] [-N system]
           netstat -s [-s] [-z] [-f protocol_family | -p protocol] [-M core]
           netstat -i | -I interface -s [-f protocol_family | -p protocol] [-M core] [-N system]
           netstat -m [-c] [-M core] [-N system]
           netstat -r [-AenW] [-f address_family] [-M core] [-N system]
           netstat -rs [-s] [-M core] [-N system]
           netstat -g [-W] [-f address_family] [-M core] [-N system]
           netstat -gs [-s] [-f address_family] [-M core] [-N system]
And, I guess it is time for me to learn to kill...
--On 3. September 2006 01:56:44 -0700 Ken Ara <feedreader@yahoo.com> wrote:
Actually, I need to read up on the netstat command. Have a look:
geneva# netstat -anp
netstat: option requires an argument -- p
Please read the man page for netstat (man netstat). Nobody has all the options for every program on every possible operating system in their head.
-aj
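Because "-p" means different things to different netstat implementations, a platform-neutral way to check whether something still holds the Zope port is simply to try connecting to it. The sketch below assumes Python 3 and a Zope HTTP port of 8080, which is an assumption since the thread never states the port. It does not report a PID. On FreeBSD, "sockstat -4 -l" is the usual tool for mapping listening sockets to processes.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is a hypothetical port; substitute the one from zope.conf.
    print(port_is_listening("127.0.0.1", 8080))
```

A True result only means that some process still holds the listening socket, which is consistent with the "Address already in use" error. It does not mean Zope is answering requests.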
Ken Ara wrote at 2006-9-2 23:43 -0700:
... I can't restart or stop Zope - '/usr/local/www/Zope/zope01/bin/zopectl stop' just produces '........' for a long time - should I wait? I was unable to kill one of the python2.3 processes, and can't even reboot the machine, using 'shutdown -r now', as I have done previously in extremis.
This indicates a severe problem with your operating system (or maybe your computer).
There are situations where a press on the reset button is necessary. Looks like you are in such a situation...
-- Dieter
You are right, Dieter. The ISP network admin had to restart the machine. There seems to be no way to power down and then power up the machine remotely. Fortunately everything worked out. Thanks.
One and a half days later, I am again unable to stop, restart or kill Zope. My ISP host is making changes on their network and I believe the problem is with them. I am in a FreeBSD jail, sharing the machine with a DNS server. It's 3am over there, and I wish I could do something while waiting for them to wake up. Then, we need to find out what is causing this problem.
Andreas, I figured out that the netstat -p option required a protocol argument. So, 'netstat -anp TCP' gives me a very long list of IPs trying to connect. It does not display the pid of the hanging process, maybe some other option would do, but the problem seems to be a python2.3 process stuck in STOP state. I can't kill it, and I can't even restart the machine with the shutdown command.
Is there no alternative to physically restarting this machine? While we are stuck, is there anything I can do to diagnose the problem?
Thanks,
Ken
Ken Ara wrote:
One and a half days later, I am again unable to stop, restart or kill Zope. My ISP host is making changes on their network and I believe the problem is with them. I am in a FreeBSD jail, sharing the machine with a DNS server. It's 3am over there, and I wish I could do something while waiting for them to wake up. Then, we need to find out what is causing this problem.
Move hosters. FreeBSD for all its robustness is not something I'd recommend running Zope on...
Is there no alternative to physically restarting this machine? While we are stuck, is there anything I can do to diagnose the problem?
You may have more luck on a FreeBSD list...
Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
On Tuesday 05 September 2006 3:58 am, Chris Withers wrote:
Move hosters. FreeBSD for all its robustness is not something I'd recommend running Zope on...
Seriously? Why so? I've been running Zope exclusively on FreeBSD since 2001 without any known problems. -- Kirk Strauser The Day Companies
OK, once again, after many hours of costly down time, we have restarted our Zope machine.
Chris, could you state your reasons for recommending against FreeBSD? I have had no trouble in over five years that I can directly attribute to FreeBSD, but I am open-minded. That said, I don't mean to reopen the 'best os' debate.
This time I am pretty convinced that our problem had to do with changes due to the physical move of some equipment and reconfiguration at the ISP.
Of immediate concern to me is whether I can do anything to prevent this happening again. From time to time, my Zope hangs, usually because of an attack by a bad robot requesting lots of complex pages and sending no-cache headers. Then I am able to restart Zope and all is well. For a while, when these attacks were frequent, I had a crontab to zopectl restart every hour.
But this event was different and I would like to know if anyone thinks that something I am doing wrong could cause the Zope process to become 'unkillable' and require a reset of the machine. Has anyone else had this problem?
I would have liked to perform some diagnostic on the machine in its stuck state, but neither I nor the ISP knew where to start. I can accept that, as Dieter said, there are times when the only choice is to switch off and on the box - which I can't do remotely - but wonder if I could have done more...
--On 5. September 2006 07:47:05 -0700 Ken Ara <feedreader@yahoo.com> wrote:
OK, once again, after many hours of costly down time, we have restarted our Zope machine.
Chris, could state your reasons for recommending against FreeBSD? I have had no trouble in over five years that I can directly attribute to FreeBSD, but I am open-minded. That said, I don't mean to reopen the 'best os' debate.
This remark was just stupid. There is nothing wrong with BSD except that there are some known issues you need to consider when compiling Python (at least there were some issues in the past).
-aj
Andreas Jung wrote:
This remark was just stupid. There is nothing wrong with BSD except that there are some known issues you need to consider when compiling Python (at least there were some issues in the past).
Sorry, I disagree.
Look back over the list archives, it seems common for people to have problems with Zope on BSD which aren't experienced by those using linux.
BSD is in the minority, and so if you do have problems, there may well be no-one around who has enough experience to recommend a solution.
As such, I'd say BSD is a bad choice for Zope.
cheers,
Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote at 2006-9-6 10:59 +0100:
Andreas Jung wrote:
This remark was just stupid. There is nothing wrong with BSD except that there are some known issues you need to consider when compiling Python (at least there were some issues in the past).
Sorry, I disagree.
Look back over the list archives, it seems common for people to have problems with Zope on BSD which aren't experienced by those using linux.
I agree with Andreas.
There was a single problem which affected many BSD Zope users: the stack area for threads was too small by default, which caused Zope to experience a C stack overflow. A port that fixed this problem quickly became available, but unaware users still used the broken Python.
The problem was however fixed a long time ago by Python explicitly requesting a large enough thread stack...
-- Dieter
Ken Ara wrote at 2006-9-5 07:47 -0700:
... Of immediate concern to me is whether I can do anything to prevent this happening again. From time to time, my Zope hangs, usually because of an attack by a bad robot requesting lots of complex pages and sending no-cache headers. Then I am able to restart Zope and all is well. For a while, when these attacks were frequent, I had a crontab to zopectl restart every hour.
There are solutions (I think "daemontools", but may be wrong) that can automate this more intelligently than a cronjob. We have our own check server which polls Zope and if it does not respond in time restarts it.
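The check-server idea (poll Zope, restart it if it does not answer in time) can be sketched roughly as follows. This is not the poster's actual tool. It is a minimal Python 3 illustration in which the URL, the timeout and all function names are assumptions, and only the zopectl path is taken from this thread.

```python
import http.client
import subprocess
import urllib.error
import urllib.request

# Path taken from the thread; the URL (host and port) is an assumption.
ZOPECTL = "/usr/local/www/Zope/zope01/bin/zopectl"
ZOPE_URL = "http://127.0.0.1:8080/"

# Ignore proxy settings so the probe reaches Zope directly, not a cache.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_responsive(url, timeout=10.0):
    """Return True if the server sends any HTTP response within timeout."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status is still an answer: the server is not hung.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, reset, timed out, or otherwise unreachable.
        return False


def restart_zope():
    """Ask zopectl to restart the instance; failures are not raised."""
    subprocess.run([ZOPECTL, "restart"], check=False)


def check_and_restart(url=ZOPE_URL, restart=restart_zope, timeout=10.0):
    """Restart only when the probe fails; return True if a restart ran."""
    if is_responsive(url, timeout):
        return False
    restart()
    return True
```

Run from cron every few minutes, this restarts Zope only when it actually stops answering, rather than unconditionally every hour. It would not help with the unkillable process discussed in this thread, since "zopectl restart" cannot succeed in that state either.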
But this event was different and I would like to know if anyone thinks that something I am doing wrong could cause the Zope process to become 'unkillable' and require a reset of the machine. Has anyone else had this problem?
Up to Python 2.3.4 and Python 2.4.0 (fixed in Python 2.3.5 and Python 2.4.1), a fatal signal (like "SIGSEGV") could bring Zope into a state where its main thread was killed but the child threads were still alive. These child threads could only be killed with "kill -9".
Although we now use Python 2.4.1, I saw a similar problem just a few days ago. But almost surely, this has to do with the Java Virtual Machine which we now also integrate in our Zope instances.
However, when even "kill -9" (as "root") is no longer able to kill a process, then the process is somewhere deep in the operating system (where signal handling is deactivated for consistency reasons). Usually, this indicates a network problem.
And if your operating system is no longer able to shut down, then you have an even more fundamental problem -- maybe, too, connected to network problems.
I fear we cannot help you much, as an intensive analysis of your system would be necessary in order to find the causes of your problems.
I would have liked to perform some diagnostic on the machine in its stuck state, but neither I nor the ISP knew where to start.
Usually, one would start with an analysis of the operating system log files.
If they do not tell anything, then one would check what is still working (e.g. is the console still responding, does it still observe the magic "CTRL-ALT-DEL" reboot key sequence), which commands fail and in what way, ...
-- Dieter
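One concrete way to see "which commands fail and in what way" for a single stuck process is to look at its state as reported by ps. The sketch below assumes Python 3.7 or later and a ps that accepts "-o stat= -p PID", as the FreeBSD and Linux versions do. The table of state letters is partial and the descriptions are paraphrased from the ps manual pages, not taken from this thread.

```python
import subprocess

# First letter of the ps STAT field; a partial, illustrative table.
STATE_MEANINGS = {
    "D": "uninterruptible wait inside the kernel; signals are not acted on until it ends",
    "T": "stopped",
    "Z": "zombie: already dead, waiting for its parent to reap it",
    "R": "runnable",
    "S": "interruptible sleep",
    "I": "idle",
}


def describe_state(stat):
    """Interpret a ps STAT value such as 'Ss', 'D+' or 'T'."""
    stat = stat.strip()
    if not stat:
        return "no such process"
    return STATE_MEANINGS.get(stat[0], "unrecognised state " + repr(stat[0]))


def process_state(pid):
    """Ask ps for the STAT field of one PID ('stat=' suppresses the header)."""
    result = subprocess.run(
        ["ps", "-o", "stat=", "-p", str(pid)],
        capture_output=True, text=True, check=False,
    )
    return describe_state(result.stdout)
```

A process that stays in the "D" state across repeated checks fits the description above of a process stuck deep in the operating system, where no signal, including "kill -9", takes effect.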
Thank you, Dieter, for your valuable insights and information. I am forwarding this to my ISP.
Ken Ara wrote at 2006-9-5 01:14 -0700:
... and I can't even restart the machine with the shutdown command.
If you cannot shut down your system, then something is seriously wrong with it. It will have nothing to do with Zope.
You may try the "reboot" command. It is similar to "shutdown" but starts rebooting immediately, while "shutdown" (even with "-r") does some other things first.
You might even try "kill 1" or "kill -9 1" (this is the initialization process; if it is killed, your *nix will go down).
*BUT* I have not much hope for you... There are situations where only the "reset" button is able to bring the system back into a sane state. And your description seems to indicate such a situation...
By the way, there are tools like "RemoteConsole" which allow you to "press" the "reset" button remotely. Our administrators use such tools.
-- Dieter
participants (5)
- Andreas Jung
- Chris Withers
- Dieter Maurer
- Ken Ara
- Kirk Strauser