Pavlos Christoforou wrote:
On Fri, 24 Mar 2000, Michel Pelletier wrote:
As soon as I'm able to collect more info, I'll forward it to you. Is there anywhere else I should be posting this information?
The list. Just keep CCing me.
Some good news at last ...
When I set DEBUG in asyncore.py to 1 so I could view the lists going into select, ZServer stabilised and hasn't crashed since. It smells like a race condition: somehow the extra time it takes to print the list contents stabilises things.
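For anyone who wants to try the same thing outside asyncore.py, here is a rough sketch of the idea: print the descriptor lists just before each select() call. debug_select is a hypothetical helper of my own, not part of asyncore, and it prints file descriptor numbers rather than whatever asyncore's DEBUG mode prints. The point is only that the logging adds a little delay before every select(), which is the kind of timing change that can hide a race.

```python
import select
import socket
import sys


def debug_select(readable, writable, exceptional, timeout=None, log=sys.stderr):
    """Log the descriptor lists, then call select.select() with them.

    Hypothetical helper: it imitates "turn on DEBUG and print the lists
    going into select". Writing to the log adds a small delay before each
    select() call, which can mask timing-dependent bugs.
    """
    log.write("select: r=%r w=%r e=%r\n" % (
        [f.fileno() for f in readable],
        [f.fileno() for f in writable],
        [f.fileno() for f in exceptional],
    ))
    return select.select(readable, writable, exceptional, timeout)


if __name__ == "__main__":
    # Demo: a connected socket pair where one end has pending data.
    a, b = socket.socketpair()
    b.send(b"x")  # makes `a` readable
    r, w, e = debug_select([a], [a], [], 1.0)
    print(a in r, a in w)
    a.close()
    b.close()
```

If the crashes stop when the logging is on and come back when it is off, that points at timing rather than at the contents of the lists.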
This might be a separate problem from The Mysterious Segment Violation. Can race conditions cause segfaults? I guess they can, like any other piece of code, but I would expect a race condition to just spin the process. Can someone I've spoken with about their SIGSEGV problem reproduce Pavlos' cure?
Wild idea: do you ever get errors when you try to compile a big, big program, specifically the Linux kernel? It seems that a large number of seemingly perfectly functioning PCs have memory errors that only show up under specific, not well-known access patterns. Compiling a big program is a process that can exhibit these error-prone patterns[1][2]. Perhaps Zope under load causes the same access patterns?

[1] This is not an urban legend. I have seen it.
[2] Search for gcc and signal-11 or sigsegv. There are a couple of web pages out there about this.

-- cary (probably wrong)
Still, I cannot understand how the child process causes the supervising (zdaemon) process to die too.
This also makes me think it's a different problem. I get the feeling you should be able to reproduce it on a fresh checkout on your platform, since it's so low-level. Can you check that?
-Michel