I'm having some trouble with a Zope server becoming unresponsive.
The server runs quite a busy site, and sometimes has memory problems. We have been restarting ZEO, and then Zope, every few hours (with about five minutes in between).
Recently, the disk containing Data.fs became full, and we had some strange conflict errors, where the error message appeared to suggest that our transaction had caused a conflict because it had started many hours ago, which of course it hadn't.
We truncated the Data.fs some and the problems went away. A few days later, we had a new problem. Right after flipping cache files, Zope began to log a flood of errors such as:
(our cache file).zec invalidate: oid mismatch: expected 0x0edcd6 read '(data)'
The data looked like object pickles, including strings from our application.
We deleted all the cache files, and deleted the Data.fs.index, and restarted. We haven't seen any obvious data errors since.
That may be related to the current issue, or it may be a red herring. But now, the server has gone into a catatonic state just after a restart, at least twice. In this state, nothing appears in the Z2 or event logs, and requests for pages mostly appear to time out (although Apache, for whatever reason, served up empty responses with a 200 OK code for some Zope requests during this time).
The server sits in this dead state until it is restarted. The last time it happened, it would not restart correctly until ZEO had also been restarted.
I'm baffled. Can anybody shed any light on what may be happening?
Thanks, Malcolm
--
[] j a m k i t web solutions for charities
malcolm cleaton T: 020 7549 0520 F: 020 7490 1152 M: 07986 563852 W: www.jamkit.com
First of all I am currently running 2.7b3, Plone 2.0R3 and CMF 1.4.2.
This happens to me now and again. Basically, Zope just hangs and nothing from /bin/runzope to /zopectl will work to get it back up. I need to kill the threads that are left (sometimes it looks like one thread dies) and restart Zope. I would say over the last year, this has happened maybe 3 times.
I do use Photo and Photo Album, which I have found have had some conflicts in the past.
Jake
-- http://www.ZopeZone.com
--On Friday, 17 December 2004 8:14 -0500 Jake <jake@zopezone.com> wrote:
First of all I am currently running 2.7b3, Plone 2.0R3 and CMF 1.4.2.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
which are somewhat *old* versions.
-aj
Yes... but VERY stable. :) Running profitable production sites makes it harder to jump on every new release.
Jake
-- http://www.ZopeZone.com
Jake <jake@zopezone.com> wrote:
Andreas Jung said:
Jake <jake@zopezone.com> wrote:
First of all I am currently running 2.7b3, Plone 2.0R3 and CMF 1.4.2.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
which are somewhat *old* versions.
Yes... but VERY stable. :)
Well, no. Anything before Zope 2.7.3 is broken in my book.
Florent
-- Florent Guillaume, Nuxeo (Paris, France) CTO, Director of R&D +33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
Been using Zope for 5 years now (or longer) and 2.7b3 has been the most stable installation I have ever had, bar none. 2.7.3 could be more stable, but migrating servers is not getting any easier. It is the old double-edged sword of using products for functionality, but having to troubleshoot them and update them as versions move on.
Jake
-- http://www.ZopeZone.com
Jake wrote at 2004-12-17 08:14 -0500:
First of all I am currently running 2.7b3, Plone 2.0R3 and CMF 1.4.2.
This happens to me now and again. Basically, Zope just hangs and nothing from /bin/runzope to /zopectl will work to get it back up. I need to kill the threads that are left (sometimes it looks like one thread dies) and restart Zope. I would say over the last year, this has happened maybe 3 times.
This is a Python bug together with LinuxThreads triggered by a fatal signal.
Your options:
* The Python bug tracker contains a patch for Python 2.3.x (I do not know the number).
* The bug is fixed in Python 2.4.
* You can switch to Linux 2.6 and "PosixThreads".
* You can try to find out why you get the fatal signal (and avoid it). You will want to do this anyway...
-- Dieter
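To tell which threading library an interpreter is actually running on, one option is to ask glibc. This is only a minimal sketch, assuming a modern Python 3 on Linux with glibc (the thread itself concerns Python 2.3); `thread_library` and `uses_linuxthreads` are hypothetical helper names, and the `linuxthreads` prefix check assumes glibc's usual version string format.

```python
import os


def thread_library():
    """Return glibc's threading library version string, or None if unavailable.

    Assumption: on glibc systems this is something like "NPTL 2.x" or
    "linuxthreads-0.10"; other platforms may not define the name at all.
    """
    name = "CS_GNU_LIBPTHREAD_VERSION"
    # os.confstr and os.confstr_names only exist on Unix-like platforms.
    if name not in getattr(os, "confstr_names", {}):
        return None
    try:
        return os.confstr(name)
    except (OSError, ValueError):
        return None


def uses_linuxthreads(version):
    """True if the version string looks like the old LinuxThreads library."""
    return version is not None and version.lower().startswith("linuxthreads")


version = thread_library()
if version is None:
    print("threading library could not be determined on this platform")
elif uses_linuxthreads(version):
    print("LinuxThreads in use:", version)
else:
    print("threading library:", version)
```

If this reports LinuxThreads, the options listed above apply; if it reports NPTL, the hang probably has a different cause.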
That is interesting. I am using Python 2.3.4 (#1, Jul 10 2004, 04:04:12). I will check out 2.4 and see if I can get 2.7.3/CMF1.4/Plone 2.0.4 up and running on it.
Jake
-- http://www.ZopeZone.com
On Fri, Dec 17, 2004 at 11:30:35AM +0000, Malcolm Cleaton wrote:
The server sits in this dead state until it is restarted. The last time it happened, it would not restart correctly until zeo had also been restarted.
I'm baffled. Can anybody shed any light on what may be happening?
Have you tried the "debug spinning zope" gdb technique?
http://www.zope.org/Members/4am/debugspinningzope
I once found a hanging method this way.
-- Paul Winkler http://www.slinkp.com
Malcolm Cleaton wrote at 2004-12-17 11:30 +0000:
... Recently, the disk containing Data.fs became full, and we had some strange conflict errors, where the error message appeared to suggest that our transaction had caused a conflict because it had started many hours ago, which of course it hadn't.
Usually, the dates are reliable -- unless your system has severe memory problems...
.... But now, the server has gone into a catatonic state just after a restart, at least twice. In this state, nothing appears in the Z2 or event logs, and requests for pages appear to mostly time out (although apache, for whatever reason, served up empty responses with a 200 ok code for some Zope requests during this time).
Startup problems (those where nothing appears in the event log file) are best analysed via:
export EVENT_LOG_FILE=<logfile name>
bin/runzope
You should see all log messages in the file identified by "<logfile name>", even those that are usually suppressed (for an arcane feature).
The server sits in this dead state until it is restarted. The last time it happened, it would not restart correctly until zeo had also been restarted.
There is a HowTo about "Debugging a spinning Zope". It may help you to analyse the problem.
However, from your description, I would not trust your server. Maybe, there is some hardware problem.
We, too, run busy Zope sites and have not seen any of your problems.
-- Dieter
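The EVENT_LOG_FILE startup check described in this message can also be scripted, which may help if the hang only shows up after unattended restarts. This is only a sketch, assuming Python 3 and an instance whose start script is at `bin/runzope` as in the message; the helper names and the log path in the usage comment are hypothetical.

```python
import os
import subprocess


def startup_env(logfile, base_env=None):
    """Return a copy of the environment with EVENT_LOG_FILE pointing at logfile."""
    env = dict(os.environ if base_env is None else base_env)
    env["EVENT_LOG_FILE"] = logfile
    return env


def run_zope_with_event_log(logfile, runzope="bin/runzope"):
    """Run the start script in the foreground with EVENT_LOG_FILE set.

    Blocks until the process exits and returns its exit status.
    """
    return subprocess.call([runzope], env=startup_env(logfile))


# Usage (hypothetical log path), run from the instance home directory:
#   run_zope_with_event_log("/tmp/zope-startup.log")
```

Copying the environment rather than modifying `os.environ` keeps the setting local to the one Zope process being diagnosed.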
On Fri, 17 Dec 2004 21:15:25 +0100, Dieter Maurer wrote:
Malcolm Cleaton wrote at 2004-12-17 11:30 +0000:
... Recently, the disk containing Data.fs became full, and we had some strange conflict errors, where the error message appeared to suggest that our transaction had caused a conflict because it had started many hours ago, which of course it hadn't.
Usually, the dates are reliable -- unless your system has severe memory problems...
The strange time-travel commit errors seemed very deterministic while they were happening, including continuing to happen after restarting servers (until the Data.fs was truncated).
.... But now, the server has gone into a catatonic state just after a restart, at least twice. In this state, nothing appears in the Z2 or event logs, and requests for pages appear to mostly time out (although apache, for whatever reason, served up empty responses with a 200 ok code for some Zope requests during this time).
Startup problems (these are those where nothing appears in the event log file) are best analysed via:
export EVENT_LOG_FILE=<logfile name>
bin/runzope
You should see all log messages in the file identified by "<logfile name>" -- even those that are usually suppressed (for an arcane feature).
Thanks - this looks useful. Of course, now I've posted to the list the server is behaving itself, but if/when this happens again I'll surely try this.
The server sits in this dead state until it is restarted. The last time it happened, it would not restart correctly until zeo had also been restarted.
There is a HowTo about "Debugging a spinning Zope". It may help you to analyse the problem.
And I'll try this too.
However, from your description, I would not trust your server. Maybe, there is some hardware problem.
We, too, run busy Zope sites and have not seen any of your problems.
The server has been very reliable up to now. A hardware problem is possible, but it feels more like there is a repeatable problem here just beyond my understanding.
If I find out any more, I'll let you know.
Thanks, Malcolm.
--
[] j a m k i t web solutions for charities
malcolm cleaton T: 020 7549 0520 F: 020 7490 1152 M: 07986 563852 W: www.jamkit.com
Malcolm Cleaton wrote at 2004-12-20 10:19 +0000:
... The strange time-travel commit errors seemed very deterministic while they were happening, including continuing to happen after restarting servers (until the Data.fs was truncated).
This would indicate that, in your "Data.fs", transactions with older timestamps follow ones with newer timestamps. This may happen when your clock jumps into the past.
-- Dieter
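The ordering problem described here is easy to check once the transaction timestamps have been extracted. The sketch below assumes Python 3 and works on a plain list of timestamps; how those are read out of a Data.fs is deliberately left out, and `find_out_of_order` is a hypothetical helper name, not part of ZODB.

```python
from datetime import datetime


def find_out_of_order(timestamps):
    """Return (index, stamp, newest_so_far) for every entry that is older
    than the newest timestamp seen before it in the sequence."""
    problems = []
    newest = None
    for index, stamp in enumerate(timestamps):
        if newest is not None and stamp < newest:
            problems.append((index, stamp, newest))
        else:
            newest = stamp
    return problems


# Made-up sample data: the clock jumps back one hour after the second entry.
sample = [
    datetime(2004, 12, 17, 10, 0),
    datetime(2004, 12, 17, 10, 5),
    datetime(2004, 12, 17, 9, 0),
    datetime(2004, 12, 17, 9, 30),
    datetime(2004, 12, 17, 10, 10),
]

for index, stamp, newest in find_out_of_order(sample):
    print(f"entry {index}: {stamp} is older than earlier entry {newest}")
```

Comparing against the newest timestamp seen so far, rather than only the previous entry, flags every transaction written while the clock was still behind.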
participants (6)
- Andreas Jung
- Dieter Maurer
- Florent Guillaume
- Jake
- Malcolm Cleaton
- Paul Winkler