Re: zope_msg.log Message
Well, I just moved my Zope to a Solaris box rebuilt with gcc. I'm now able to CONSISTENTLY get this thing to crash. What I do is point a couple of wgets at the box to recursively snarf up pages. This doesn't seem to cause any problems (yet). Then, with my browser, I try to load up some pages. Usually within four or five attempts at loading pages, BAMM, Zope starts eating up CPU cycles and within two minutes Zope crashes... And this is what the debugger tells me:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 7 (LWP 3)]
0xef51d2b0 in ExtensionClass_FindInstanceAttribute (inst=0x1c6ff50,
    oname=0x3e6690, name=0x3e66a4 "_View_Permission")
    at ./../Components/ExtensionClass/ExtensionClass.c:1603
1603      if (! name) return NULL;
(gdb) where
#0  0xef51d2b0 in ExtensionClass_FindInstanceAttribute (inst=0x1c6ff50,
    oname=0x3e6690, name=0x3e66a4 "_View_Permission")
    at ./../Components/ExtensionClass/ExtensionClass.c:1603
Cannot access memory at address 0xeee05fac.
(gdb) info threads
* 14 Thread 7 (LWP 3)  0xef51d2b0 in ExtensionClass_FindInstanceAttribute (
       inst=0x1c6ff50, oname=0x3e6690, name=0x3e66a4 "_View_Permission")
       at ./../Components/ExtensionClass/ExtensionClass.c:1603
  13 Thread 6          0xef5b9788 in _lwp_sema_wait ()
  12 Thread 5 (LWP 0)  0xef737ae4 in _swtch ()
  11 Thread 4 (LWP 0)  0xef737ae4 in _swtch ()
  10 Thread 3          0xef737c14 in _swtch ()
   9 Thread 2 (LWP 2)  0xef5b98d0 in __signotifywait ()
   8 Thread 1 (LWP 1)  0xef5b7400 in poll ()
   7 LWP 7             0xef5b699c in door_restart ()
   6 LWP 6             0xef5b9788 in _lwp_sema_wait ()
   5 LWP 5             0xef5b9788 in _lwp_sema_wait ()
   4 LWP 4             0xef5b9788 in _lwp_sema_wait ()
   3 LWP 3             0xef51d2b0 in ExtensionClass_FindInstanceAttribute (
       inst=0x1c6ff50, oname=0x3e6690, name=0x3e66a4 "_View_Permission")
       at ./../Components/ExtensionClass/ExtensionClass.c:1603
   2 LWP 2             0xef5b98d0 in __signotifywait ()
   1 LWP 1             0xef5b7400 in poll ()
(gdb)

Anybody think this is related to the Linux problem?

-Jon

"Dr. Ross Lazarus" <rossl@med.usyd.edu.au> writes:
Made no difference here either.
In desperation, I moved the server behind an Apache using FastCGI. Same problem persists: random crashes with "Aiieee!" 11 and 256 in the debug log, often with core dumps.
May be time to move away from the Lintel box (stock Red Hat 6.1, 2.16 src install, PII 350, 256 MB RAM) - I don't see this on a Sun box I'm also running.
Does this happen with binary installs? Is it just Red Hat 6.1 (we didn't see this on 5.2!)? One problem is that the zmonitor usually works - the server restarts itself and users may only notice a long delay. I see it because I'm watching the logs anxiously.
Jon Prettyman <jprettyma-@acm.org> wrote:
Setting DEBUG to 1 had no effect on my server. It crashed within 15 minutes of setting it.
-Jon
Pavlos Christoforou <pavlos@gaaros.com> writes:
On Fri, 24 Mar 2000, Michel Pelletier wrote:
As soon as I'm able to collect more info I'll forward it to you. Is there anywhere else I should be posting this information?
The list. just keep ccing me.
Some good news at last ...
When I set DEBUG in asyncore.py to 1 so I could view the lists going into select, ZServer stabilised and hasn't crashed since. Smells like a race condition and somehow the extra time it takes to print the list contents stabilises things.
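[Editorial aside: a minimal sketch of the kind of instrumentation DEBUG=1 turns on in asyncore - printing the fd lists handed to select() on each pass of the event loop. This is an illustration of the general shape, not ZServer's actual code; the point is that the extra print is I/O on every iteration, which changes the loop's timing and could plausibly mask a timing-dependent race without fixing it.]

```python
# Toy version of an asyncore-style poll step with DEBUG-style tracing.
# Assumption: this mirrors only the general shape of asyncore's poll
# loop, not Zope's actual code.
import select
import socket

def poll_once(sockets):
    r = list(sockets)        # sockets we are interested in reading
    w, e = [], []
    # The DEBUG=1 behaviour: dump the lists going into select().
    # This print adds I/O on every loop pass, which slows each iteration.
    print("select r=%r w=%r e=%r" % (r, w, e))
    readable, _, _ = select.select(r, w, e, 0.1)
    return readable

a, b = socket.socketpair()
b.sendall(b"ping")           # make endpoint `a` readable
ready = poll_once([a, b])
assert a in ready and b not in ready
```

If a race in the channel map only bites under a particular interleaving, slowing each pass like this can make the bad interleaving rare, which would match the observed "stabilisation".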
--
Dr Ross Lazarus
Associate Professor and Sub-Dean for Information Technology
Faculty of Medicine, Room 126A, A27, University of Sydney, Camperdown, NSW 2006, Australia
Tel: (+61 2) 93514429  Mobile: +61414872482  Fax: (+61 2) 93516646
Email: rossl@med.usyd.edu.au
On 27 Mar 2000, Jon Prettyman wrote:
Well, I just moved my zope to a Solaris box rebuilt with gcc. I'm now able to CONSISTENTLY get this thing to crash.
Anybody think this is related to the Linux problem?
Jon,

I did a search on the web and found a few complaints about programs compiled with gcc seg faulting. One posting mentioned a faulty implementation of the pow() call; of course, I have *no clue* how or why it would cause problems in Zope.

One more thing: can you try disabling multithreading by giving the option -t 1 to z2.py?

Good to be able to reproduce the problem. I suppose I can call it progress ...

Pavlos
On Mon, 27 Mar 2000, Pavlos Christoforou wrote:

: One more thing.
: Can you try disabling multithreading by giving the option -t 1 to z2.py?

I tried that and lo! No more SIGSEGVs; i.e., even with high loads (concurrency == 200) the server was stable (though the speed was naturally quite underwhelming).

peter.

--
peter sabaini, mailto:sabaini@niil.at
We're running with a single thread here. 30 hours, no crash yet.... Single threaded operation is noticeably sluggish (in fact it just stops if someone starts a long task) but at least it seems stable....

Starting to look like a thread problem for Linux Zopes? Anyone else?

Pavlos Christoforou wrote:
On 27 Mar 2000, Jon Prettyman wrote:
Well, I just moved my zope to a Solaris box rebuilt with gcc. I'm now able to CONSISTENTLY get this thing to crash.
Anybody think this is related to the Linux problem?
Jon
I did a search on the web and found a few complaints about programs compiled with gcc seg faulting. One posting mentioned a faulty implementation of the pow() call; of course, I have *no clue* how or why it would cause problems in Zope.
One more thing. Can you try disabling multithreading by giving the option -t 1 to z2.py?
Good to be able to reproduce the problem. I suppose I can call it progress ...
Pavlos
--
"Dr. Ross Lazarus" wrote:
We're running with a single thread here.
30 hours, no crash yet....
Single threaded operation is noticeably sluggish (in fact it just stops if someone starts a long task) but at least it seems stable....
Starting to look like a thread problem for Linux Zopes?
I think it's a thread problem, but not anything Linux-related; or at least I'd need to see proof of that. Wait another day, and try -t 2. See what happens.

-Michel
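[Editorial aside: why -t 1 makes the server "just stop" during a long task can be illustrated with a toy worker pool. This is an illustration only, not ZServer's actual threading code: with one worker, every queued request waits behind the slow one; with two workers, a quick request is served immediately.]

```python
# Toy request queue with a configurable worker count, illustrating
# why a single-threaded server stalls behind one long-running task.
import queue
import threading
import time

def run_pool(workers, jobs):
    q = queue.Queue()
    done = []                        # completion order
    def worker():
        while True:
            job = q.get()
            if job is None:          # shutdown sentinel
                return
            name, cost = job
            time.sleep(cost)         # simulate handling the request
            done.append(name)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return done

jobs = [("slow", 0.3), ("quick", 0.0)]
print(run_pool(1, jobs))   # ['slow', 'quick'] -- quick waits behind slow
print(run_pool(2, jobs))   # ['quick', 'slow'] -- quick is served at once
```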
I set -t 2 yesterday arvo after what seemed like blissful stability with -t 1.

It crashed this morning under almost zero load, after a 3 hour period of inactivity, when I opened up the management screen. Sigh... Houston, we have a serious problem here which seems to go away when we run a single thread. Unfortunately, a single-threaded Zope is not much use to us here....

------
2000-03-31T00:24:22 ERROR(200) zdaemon zdaemon: Fri Mar 31 10:24:22 2000: Aiieee! 5589 exited with error code: 11
------
2000-03-31T00:24:22 INFO(0) zdaemon zdaemon: Fri Mar 31 10:24:22 2000: Houston, we have forked
------
2000-03-31T00:24:22 INFO(0) zdaemon zdaemon: Fri Mar 31 10:24:22 2000: Hi, I just forked off a kid: 7344
------
2000-03-31T00:24:22 INFO(0) zdaemon zdaemon: Fri Mar 31 10:24:22 2000: Houston, we have forked
------
2000-03-31T00:24:31 ERROR(200) zdaemon zdaemon: Fri Mar 31 10:24:31 2000: Aiieee! 7344 exited with error code: 256
------

Michel Pelletier wrote:
"Dr. Ross Lazarus" wrote:
We're running with a single thread here.
30 hours, no crash yet....
Single threaded operation is noticeably sluggish (in fact it just stops if someone starts a long task) but at least it seems stable....
Starting to look like a thread problem for Linux Zopes?
I think it's a thread problem, but not anything Linux-related; or at least I'd need to see proof of that.
Wait another day, and try -t 2. See what happens.
-Michel
--
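[Editorial aside: the "error code" numbers in the zdaemon log above look like raw wait(2) statuses. On that hedged reading, 11 is a child killed by signal 11 (SIGSEGV) and 256 is a normal exit with status 1 (256 >> 8). A sketch, assuming the Linux/glibc wait-status encoding:]

```python
# Decode raw wait(2) statuses the way zdaemon's numbers appear to be
# encoded (assumption: Linux/glibc wait-status layout).
import os
import signal

def describe(status):
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with status %d" % os.WEXITSTATUS(status)
    return "unrecognised status %d" % status

print(describe(11))    # killed by signal 11 (SIGSEGV on Linux)
print(describe(256))   # exited with status 1
assert os.WTERMSIG(11) == signal.SIGSEGV
```

On that reading, the first "Aiieee!" really is the segfault under discussion, while the second (256) is the freshly forked child giving up with an ordinary error exit.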
"Dr. Ross Lazarus" wrote:
I set -t 2 yesterday arvo after what seemed like blissful stability with -t 1.
It crashed this morning under almost zero load, after a 3 hour period of inactivity, when I opened up the management screen.
I seem to get restarts often when going into the management screen on one of my servers. All is dandy until I go to the management screen, then I wait a couple of seconds while it restarts.

--
In flying I have learned that carelessness and overconfidence are usually far more dangerous than deliberately accepted risks.
   -- Wilbur Wright in a letter to his father, September 1900
Bill Anderson wrote:
I seem to get restarts often when going into the management screen on one of my servers. All is dandy until I go to the management screen, then I wait a couple of seconds while it restarts.
We too have seen more frequent restarts centered around /manage activity. I have one user that I can tell when he starts working in the morning by the first Zope restart of the day. We have not been able to get a consistent repeat of the behavior, though. We have three servers running with one taking most /manage activity, and that machine is the one that restarts the most.

--
tonyr@ep.newtimes.com
Director of Web Technology
New Times, Inc.
Tony Rossignol wrote:
Bill Anderson wrote:
I seem to get restarts often when going into the management screen on one of my servers. All is dandy until I go to the management screen, then I wait a couple of seconds while it restarts.
We too have seen more frequent restarts centered around /manage activity. I have one user that I can tell when he starts working in the morning by the first Zope restart of the day. We have not been able to get a consistent repeat of the behavior though.
We have three servers running with one taking most /manage activity and that machine is the one that restarts the most.
Sounds like a place to start looking, then. I had thought it to be isolated, since only one of my servers displays this.
Bill Anderson wrote:
We too have seen more frequent restarts centered around /manage activity. I have one user that I can tell when he starts working in the morning by the first Zope restart of the day. We have not been able to get a consistent repeat of the behavior though.
We have three servers running with one taking most /manage activity and that machine is the one that restarts the most.
Sounds like a place to start looking, then. I had thought it to be isolated, since only one of my servers displays this.
Ah, yes, a place to start, but start what? I've been looking at these restarts from all the angles I know of. Not being a low-level coder, or a very good Python programmer, I'm at a loss as to how to go any deeper. My command of the debugger is nil, and I've never been able to get strace working on a live and loaded (read: candidate for restart) Zope process.

Is there any confirmation of the glibc version being a factor for these restarts? I seem to remember a thread a while back indicating this as a possible culprit.

Our most stable machine is a slower, older machine that has other duties (so many factors there it's darn near impossible to figure out which one is affecting things). Oh well, the search continues.
participants (7)
- Bill Anderson
- Dr. Ross Lazarus
- Jon Prettyman
- Michel Pelletier
- Pavlos Christoforou
- Peter Sabaini
- Tony Rossignol