Pavlos Christoforou wrote:
On Fri, 24 Mar 2000, Michel Pelletier wrote:
As soon as I'm able to collect more info I'll forward it to you. Is there anywhere else I should be posting this information?
The list. Just keep cc'ing me.
Some good news at last ...
When I set DEBUG in asyncore.py to 1 so I could view the lists going into select, ZServer stabilised and hasn't crashed since. Smells like a race condition and somehow the extra time it takes to print the list contents stabilises things.
This might be a separate problem from The Mysterious Segment Violation. Can race conditions cause segfaults? I guess they can, like any other piece of code, but I would expect a race condition to just spin the process. Can someone I've spoken with about their SIGSEGV problem reproduce Pavlos' cure?
Wild idea: Do you ever get errors when you try to compile a big, big program, specifically the Linux kernel? It seems as if a large number of seemingly perfectly functioning PCs have memory errors that only show up under specific, not well known access patterns. Compiling a big program is a process that can exhibit these error prone patterns[1][2]. Perhaps Zope under load causes the same access patterns?
[1] This is not an urban legend. I have seen it.
[2] Search for gcc and signal-11 or sigsegv. There are a couple of web pages out there about this.
-- cary (probably wrong)
Still I cannot understand how the child process causes the supervising (zdaemon) process to die too.
This makes me think it's a different problem also. I get the feeling you should be able to reproduce this problem on a fresh checkout on your platform since it's so low level. Can you check that?
-Michel
--__--__--
On Sun, Mar 26, 2000 at 10:02:59AM -0500, Cary O'Brien wrote:
Wild idea: Do you ever get errors when you try to compile a big, big program, specifically the Linux kernel? It seems as if a large number of seemingly perfectly functioning PCs have memory errors that only show up under specific, not well known access patterns. Compiling a big program is a process that can exhibit these error prone patterns[1][2]. Perhaps Zope under load causes the same access patterns?
[1] This is not an urban legend. I have seen it.
[2] Search for gcc and signal-11 or sigsegv. There are a couple of web pages out there about this.
I get them constantly when recompiling Mozilla, which is a daily task almost. And bad_slab_magic, and random segv's.
I'll be running a good memory stress test soon, to see if I can pinpoint the offending Mem. bank.
--
Martijn Pieters
| Software Engineer mailto:mj@digicool.com
| Digital Creations http://www.digicool.com/
| Creators of Zope http://www.zope.org/
| The Open Source Web Application Server
---------------------------------------------
Not convinced that's it tho.

Side-note: I've patched ZServer/medusa/asynchat.py line 255 - the 'first' method - to check if self.list actually has anything on it before returning:

    def first (self):
        if self.list:
            return self.list[0]
        else:
            return None

Since I did that, I haven't been able to reproduce the crash. I doubt that's the problem as such - I'm still leaning toward a race condition somewhere - but the fact that it hasn't crashed since then is curious...

Oh, and strace'ing and adding debug and so forth seems to cause the problem to display subtly different behaviour - which makes me unhappy...

KevinL
(I found that after getting a traceback while running under 'python -i')
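For anyone who wants to see the effect of that guard in isolation, here is a minimal sketch. The class name Fifo, its push method, and first_unguarded are hypothetical stand-ins invented for illustration; this is not the actual medusa code, and it only assumes that the original 'first' indexed self.list without checking it. It shows why an empty list would raise in the unguarded version and return None in the patched one. It does not explain the crash itself.

```python
class Fifo:
    """Hypothetical stand-in for the queue object patched in asynchat.py."""

    def __init__(self):
        # Same attribute name as in the patch above; everything else is assumed.
        self.list = []

    def push(self, item):
        self.list.append(item)

    def first_unguarded(self):
        # Assumed shape of the pre-patch method: index without checking.
        # Raises IndexError if something else has already emptied the list.
        return self.list[0]

    def first(self):
        # Patched behaviour: return None instead of raising on an empty list.
        if self.list:
            return self.list[0]
        return None


if __name__ == "__main__":
    q = Fifo()
    print(q.first())  # empty queue, guarded call
    try:
        q.first_unguarded()
    except IndexError as exc:
        print("unguarded call failed:", exc)
    q.push("producer")
    print(q.first())
```

Note that the guard changes the contract: callers now have to cope with None, so if the list is being emptied unexpectedly the patch may only be hiding the underlying race rather than fixing it.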
--------------- qnevhf@obsu.arg.nh --------------- Kevin Littlejohn, Technical Architect, Connect.com.au Don't let the Govt censor our access to the 'net - http://www.efa.org.au/Campaigns/stop.html
participants (3): Cary O'Brien, Kevin Littlejohn, Martijn Pieters