Re: [Zope] select.error: (4, 'Interrupted system call')
I am moving this thread over from the zope list in the hope that someone here may have more insight into the nature of this problem, and perhaps how to go about better diagnosing/fixing the problem.

To sum up: Under apparently random circumstances and rather mild amounts of traffic, Zope is crashing and printing the Traceback below to the console. This is Zope-2.4.0, Python-2.1.1 on SunOS 5.8/Sparc.

Traceback (most recent call last):
  File "/u01/zope/Zope-2.4.0-src/z2.py", line 774, in ?
    asyncore.loop()
  File "/u01/zope//lib/python2.1/asyncore.py", line 194, in loop
    poll_fun (timeout, map)
  File "/u01/zope//lib/python2.1/asyncore.py", line 86, in poll
    r,w,e = select.select (r,w,e, timeout)
select.error: (4, 'Interrupted system call')

Any ideas?

Thanks in advance,
Ziniti

On Thu, 16 Aug 2001 14:58:48 +0100 "J. Cone" <jcone@g8labs.com> wrote:
Could do.
The old unix semantics say you deliver a signal to a process. After the process has split itself into a bunch of threads, it's unclear which one will get to handle it. I suspect SunOS favours ones that are blocked, which would make sense if they were blocked in sigsuspend, waiting for a signal, but that's not your situation :-)
This conversation may belong on the zope-devel mailing list, where I expect they have people who understand which signals Zope is supposed to accept, and how.
At 09:38 16/08/01 -0400, John Ziniti wrote:
I just have a hunch this problem has something to do with threads, but I don't know why. Does that make any sense?
from $PYTHON_SOURCE/Modules/selectmodule.c:
    Py_BEGIN_ALLOW_THREADS
    n = select(max, &ifdset, &ofdset, &efdset, tvp);
    Py_END_ALLOW_THREADS

    if (n < 0) {
        PyErr_SetFromErrno(SelectError);
Man, I hate Solaris! I want my Linux box back!
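For readers decoding the traceback: the pair (4, 'Interrupted system call') is an errno number and its message, which fits the PyErr_SetFromErrno(SelectError) call in the excerpt above. Below is a minimal sketch, written for modern Python 3 rather than the Python 2.1 used in this thread, of mapping such a number back to its symbolic name. The helper name describe_errno is hypothetical, and the numeric values are platform-dependent.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as 'EINTR' on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

if __name__ == "__main__":
    # Prints the name and message for EINTR as defined on the local platform.
    print(describe_errno(errno.EINTR))
```

Looking up errno.EINTR this way should give the name EINTR; the message text comes from the C library, so its exact wording may differ between systems.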
On Thu, 16 Aug 2001 13:56:45 +0100 "J. Cone" <jcone@g8labs.com> wrote:
In my application area, an error like that would mean:
- you blocked on a set of file descriptors
- either
  - someone tried to kill you, so you clean up and then exit, or
  - a timer went off, so you handle it and then block again
Is it possible to handle signals in python and find out who's sending them?
Is it running in a terminal (could be ^c'd) or as a daemon (could get SIGTERM on change of run-level) ?
Do any other processes on this box incur stray fingers of G_d?
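On the question of handling signals in Python, here is a minimal sketch in modern Python 3 (not the Python 2.1 of this thread), assuming a Unix system where SIGUSR1 exists. The names received, record_signal, watch and signal_self_and_wait are all hypothetical. Note that a plain Python handler is given only the signal number and the interrupted frame, so this records which signals arrive but not who sent them.

```python
import os
import signal
import time

received = []  # signal numbers seen so far, in arrival order

def record_signal(signum, frame):
    """Handler: remember which signal arrived (the sender is not passed in)."""
    received.append(signum)

def watch(signums):
    """Install record_signal for each signal; return the previous handlers."""
    return {s: signal.signal(s, record_signal) for s in signums}

def signal_self_and_wait(signum, timeout=1.0):
    """Send signum to this process and wait briefly for the handler to run."""
    before = len(received)
    os.kill(os.getpid(), signum)
    deadline = time.monotonic() + timeout
    # Python-level handlers run in the main thread between bytecodes,
    # so poll for a short while instead of assuming immediate delivery.
    while len(received) == before and time.monotonic() < deadline:
        time.sleep(0.01)
    return received[before:]
```

Calling watch([signal.SIGUSR1]) from the main thread and then signal_self_and_wait(signal.SIGUSR1) exercises the handler. In a server, the same handler could log to a file so that stray signals show up alongside the select errors.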
At 08:40 16/08/01 -0400, Chris McDonough wrote:
Geez. I'm not sure, John. This error is being raised by the OS in the middle of a system call. I don't know enough about Solaris to be able to give you any direction. Perhaps someone else can chime in? Anybody else seen this?
John Ziniti wrote:
This is SunOS 5.8 running on a Sparc.
On Wed, 15 Aug 2001 17:25:10 -0400 "Chris McDonough" <chrism@zope.com> wrote:
This is an odd error... what OS?

----- Original Message -----
From: "John Ziniti" <jziniti@speakeasy.org>
To: <zope@zope.org>
Sent: Wednesday, August 15, 2001 5:11 PM
Subject: [Zope] select.error: (4, 'Interrupted system call')

>Hey all,
>
>I've been getting this all day today since I upgraded
>to Python 2.1.1 and Zope-2.4.0. It seems to happen
>randomly, but mostly when I am moving around the ZMI,
>the more I move around, the more likely it is to occur.
>
>It also brings down the Zope server, requiring a restart
>and is thus rather annoying. Any ideas on causes, fixes,
>hunches? The traceback (printed to console) follows.
>
>Traceback (most recent call last):
>  File "/u01/zope/Zope-2.4.0-src/z2.py", line 774, in ?
>    asyncore.loop()
>  File "/u01/zope//lib/python2.1/asyncore.py", line 194, in loop
>    poll_fun (timeout, map)
>  File "/u01/zope//lib/python2.1/asyncore.py", line 86, in poll
>    r,w,e = select.select (r,w,e, timeout)
>select.error: (4, 'Interrupted system call')
>
>
>Thanks in advance,
>
>Ziniti
>
>
>--
-- John Ziniti Channing Laboratory Brigham and Women's Hospital 181 Longwood Avenue Brookline, MA 02115 john.ziniti@channing.harvard.edu
On Thu, 16 Aug 2001 10:57:12 -0400, John Ziniti <jziniti@speakeasy.org> wrote:
> I am moving this thread over from the zope list in the hope that someone here may have more insight into the nature of this problem, and perhaps how to go about better diagnosing/fixing the problem.
>
> To sum up: Under apparently random circumstances and rather mild amounts of traffic, Zope is crashing and printing the Traceback below to the console. This is Zope-2.4.0, Python-2.1.1 on SunOS 5.8/Sparc.
>
> Traceback (most recent call last):
>   File "/u01/zope/Zope-2.4.0-src/z2.py", line 774, in ?
>     asyncore.loop()
>   File "/u01/zope//lib/python2.1/asyncore.py", line 194, in loop
>     poll_fun (timeout, map)
>   File "/u01/zope//lib/python2.1/asyncore.py", line 86, in poll
>     r,w,e = select.select (r,w,e, timeout)
> select.error: (4, 'Interrupted system call')
>
> Any ideas?
I'm not using that version of Zope or Python, but in every version of asyncore I have seen, the call to select is wrapped in a try/except that traps EINTR and retries. The version I happen to be using at the moment looks like:

    try:
        r,w,e = select.select (r,w,e, timeout)
    except select.error, v:
        if v[0] != EINTR:
            raise
    else:
        break

Unless your version is different, that traceback 'shouldn't happen'.

Toby Dickenson
tdickenson@geminidataloggers.com
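The catch-and-retry idea in the excerpt above can be sketched in modern Python 3 as follows. This is an illustration under stated assumptions, not the asyncore code itself: retry_on_eintr is a hypothetical helper, and the excerpt is assumed to sit inside a loop, since it ends in a break. In Python 3, select.error is an alias of OSError. Also, since Python 3.5 (PEP 475) the interpreter itself retries select.select after EINTR unless a signal handler raises, so a wrapper like this mainly documents what a Python 2.1 program had to do by hand.

```python
import errno

def retry_on_eintr(func, *args):
    """Call func(*args), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args)
        except OSError as exc:
            # Any error other than an interrupted system call is a real failure.
            if exc.errno != errno.EINTR:
                raise
            # EINTR: fall through and issue the same call again.
```

A caller would write, for example, r, w, e = retry_on_eintr(select.select, r, w, e, timeout). Because the original argument lists are passed in again on each attempt, the retry asks about the same descriptors every time.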
Thanks, Toby ... my asyncore has no try/except:

from $PYTHON_SOURCE/Lib/asyncore.py

    for fd, obj in map.items():
        if obj.readable():
            r.append (fd)
        if obj.writable():
            w.append (fd)
    r,w,e = select.select (r,w,e, timeout)

    if DEBUG:
        print r,w,e

What's up with that.

On Thu, 16 Aug 2001 16:12:35 +0100 Toby Dickenson <tdickenson@devmail.geminidataloggers.co.uk> wrote:

> Unless your version is different, that traceback 'shouldn't happen'

-- John Ziniti
Adding the try/except "helps" ... some prints lead me to believe that the error is actually EWOULDBLOCK and not EINTR.

I moved zopeinstancehome and I am starting Zope with a clean Data.fs and no Products. This version seems to be more stable. I'm going to slowly add in Products until I can crash it again and I'll try to post any results later ...

On Thu, 16 Aug 2001 11:22:18 -0400 John Ziniti <jziniti@speakeasy.org> wrote:

> Thanks, Toby ... my asyncore has no try/except:

-- John Ziniti
On Thu, 16 Aug 2001 12:34:58 -0400, John Ziniti <jziniti@speakeasy.org> wrote:
> I moved zopeinstancehome and I am starting Zope with a clean Data.fs and no Products. This version seems to be more stable. I'm going to slowly add in Products until I can crash it again and I'll try to post any results later ...

Zope isn't 'crashing' here.... EINTR is an error that select can return under normal operation. The catch-and-retry is what asyncore *should* be doing - not a hack to cover up the problem.

I've been doing some CVS archeology.... it looks like this was fixed in Zope's own version of asyncore.py in revision 1.13 by jim, and I guess the fix never made it into the standard python version (I haven't checked, but I assume that's what you are using...)

John; Could you put this in the collector....

> Adding the try/except "helps" ... some prints lead me to believe that the error is actually EWOULDBLOCK and not EINTR.

Hmmmm. I don't think we expect that from select. Any posix gurus listening?

Toby Dickenson
tdickenson@geminidataloggers.com
> John; Could you put this in the collector....

Sure

> Zope isn't 'crashing' here.... EINTR is an error that select can return under normal operation. The catch-and-retry is what asyncore *should* be doing - not a hack to cover up the problem.

> Hmmmm. I don't think we expect that from select. Any posix gurus listening?
Let me apologize and clarify. Zope *does* crash if there is no try/except around the select, at least in the way that I think of crashing. It needs to restart itself in any case.

I am pretty sure EINTR *is* the problem, but when I add the try, it seems that a later call to accept() throws EWOULDBLOCK (??). This doesn't crash Zope, but brings down the FCGI and PCGI Servers.

See the trace below. I added the print 'system.error' line inside the except block.

system.error 4((4, 'Interrupted system call'))
------
2001-08-16T13:16:01 ERROR(200) ZServer uncaptured python exception, closing channel <__repr__ (self) failed for object at dee7bc (addr='/tmp/zope.soc')> (exceptions.TypeError:unpack non-sequence [/u01/zope/lib/python2.1/asyncore.py|poll|101] [/u01/zope/lib/python2.1/asyncore.py|handle_read_event|383] [/u01/zope/Zope-2.4.0-src/ZServer/FCGIServer.py|handle_accept|697])
------
2001-08-16T13:16:01 ERROR(200) ZServer uncaptured python exception, closing channel <__repr__ (self) failed for object at dec4bc (addr='/u01/zope/Zope-2.4.0-src/var/pcgi.soc')> (exceptions.TypeError:unpack non-sequence [/u01/zope/lib/python2.1/asyncore.py|poll|101] [/u01/zope/lib/python2.1/asyncore.py|handle_read_event|383] [/u01/zope/Zope-2.4.0-src/ZServer/PCGIServer.py|handle_accept|380])
------
2001-08-16T13:16:01 ERROR(200) ZServer uncaptured python exception, closing channel <FTPServer listening :8021 at debcec> (exceptions.TypeError:unpack non-sequence [/u01/zope/lib/python2.1/asyncore.py|poll|101] [/u01/zope/lib/python2.1/asyncore.py|handle_read_event|383] [/u01/zope/Zope-2.4.0-src/ZServer/FTPServer.py|handle_accept|694])
------
2001-08-16T13:16:01 PROBLEM(100) ZServer warning: server accept() threw EWOULDBLOCK
------
2001-08-16T13:16:01 ERROR(200) ZServer uncaptured python exception, closing channel <select-trigger (pipe) at 248024> (exceptions.OSError:[Errno 11] Resource temporarily unavailable [/u01/zope/lib/python2.1/asyncore.py|poll|101] [/u01/zope/lib/python2.1/asyncore.py|handle_read_event|389] [/u01/zope/Zope-2.4.0-src/ZServer/medusa/thread/select_trigger.py|handle_read|77] [/u01/zope/lib/python2.1/asyncore.py|recv|341] [/u01/zope/lib/python2.1/asyncore.py|recv|523])

-- John Ziniti
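One plausible reading of the "unpack non-sequence" entries in the trace above, offered as an assumption rather than a diagnosis: "unpack non-sequence" is the TypeError that Python 2 raised for a statement like conn, addr = None, which would fit a handle_accept that unpacks the result of an accept wrapper returning None when the socket would block. Below is a minimal sketch of the defensive pattern in modern Python 3; accept_or_none and handle_accept are hypothetical names for illustration, not the ZServer functions.

```python
import socket

def accept_or_none(listener):
    """Accept one pending connection, or return None if none is waiting."""
    try:
        return listener.accept()  # a (connection, address) tuple
    except BlockingIOError:
        # EWOULDBLOCK/EAGAIN on a non-blocking socket: nothing to accept yet.
        return None

def handle_accept(listener):
    """Unpack the accept result only after checking that there is one."""
    pair = accept_or_none(listener)
    if pair is None:
        return None  # unpacking None here is what would raise TypeError
    conn, addr = pair
    return conn, addr
```

With a non-blocking listening socket and no client waiting, handle_accept returns None instead of raising, so one spurious readiness report does not close the listening channel.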
"JZ" == John Ziniti <jziniti@speakeasy.org> writes:
> John; Could you put this in the collector....

JZ> Sure

I'm following up here, perhaps a little late. John also reported this bug in the Python bug tracker at SF. I think he's right that asyncore should catch EINTR and retry. The right thing, in probably every case, is to retry the select. Thus, asyncore should be doing it and not burdening every application (like Zope) with the need to add a try/except.

JZ> Let me apologize and clarify. Zope *does* crash if there is no
JZ> try/except around the select, at least in the way that I think
JZ> of crashing. It needs to restart itself in any case.

Right.

JZ> I am pretty sure EINTR *is* the problem, but when I add the try,
JZ> it seems that a later call to accept() throws EWOULDBLOCK (??).
JZ> This doesn't crash Zope, but brings down the FCGI and PCGI
JZ> Servers.

Did you retry the select() or did you do something else?

JZ> See the trace below. I added the print 'system.error' line
JZ> inside the except block.

The last error in your traceback is quite interesting:

JZ> ------ 2001-08-16T13:16:01 PROBLEM(100) ZServer warning: server
JZ> accept() threw EWOULDBLOCK ------ 2001-08-16T13:16:01 ERROR(200)
JZ> ZServer uncaptured python exception, closing channel
JZ> <select-trigger (pipe) at 248024> (exceptions.OSError:[Errno 11]
JZ> Resource temporarily unavailable
JZ> [/u01/zope/lib/python2.1/asyncore.py|poll|101]
JZ> [/u01/zope/lib/python2.1/asyncore.py|handle_read_event|389]
JZ> [/u01/zope/Zope-2.4.0-src/ZServer/medusa/thread/select_trigger.py|handle_read|77]
JZ> [/u01/zope/lib/python2.1/asyncore.py|recv|341]
JZ> [/u01/zope/lib/python2.1/asyncore.py|recv|523])

John Heintz has been seeing a similar problem with the latest beta of ZEO 1.0. It doesn't really make sense that select() reports the socket is ready for reading, but recv() fails with EWOULDBLOCK.

Is it easy to reproduce this error? Could you isolate a test case that I could run locally? Also, what platform are you running on?

Jeremy
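Whatever the cause of a recv() that fails with EWOULDBLOCK, a reader on a non-blocking socket can treat that error as "no data yet" rather than as a fatal condition. Here is a minimal sketch in modern Python 3, where EWOULDBLOCK and EAGAIN surface as BlockingIOError. The helper name recv_if_ready is hypothetical, and this is a defensive pattern, not an explanation of the behaviour reported above.

```python
import socket

def recv_if_ready(sock, nbytes):
    """Read from a non-blocking socket; return None if no data is available."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # The socket would block: leave it registered and try again later.
        return None
```

A dispatcher using this helper would keep the channel open and wait for the next readiness report instead of closing it.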
> John also reported this bug in the Python bug tracker at SF. I think he's right that asyncore should catch EINTR and retry. The right thing, in probably every case, is to retry the select. Thus, asyncore should be doing it and not burdening every application (like Zope) with the need to add a try/except.
I actually modified asyncore.py in Python :), not in Zope/medusa sources.
> Did you retry the select() or did you do something else?

Originally, I failed to understand what I needed to do. In my first attempt, I just caught EINTR and pretended nothing went wrong :) I think this is what caused the EWOULDBLOCK problems. My second version (simplified by Toby) loops until we don't catch EINTR. This one has been working very well for three days.
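The difference between the two attempts described above can be sketched as follows, in modern Python 3 and with a stand-in for select.select so that the interruption is reproducible. All names here (poll_swallowing, poll_retrying, make_interrupted_once_select) are hypothetical. The point of the sketch is an assumption about why the first attempt misbehaved: if the exception is swallowed, r, w and e still hold the descriptors that were asked about, so the caller treats every one of them as ready.

```python
import errno

def poll_swallowing(select_fn, r, w, e, timeout):
    """First attempt: catch EINTR and carry on as if select had succeeded."""
    try:
        r, w, e = select_fn(r, w, e, timeout)
    except OSError as exc:
        if exc.errno != errno.EINTR:
            raise
        # r, w, e are unchanged: they still list the *requested* descriptors.
    return r, w, e

def poll_retrying(select_fn, r, w, e, timeout):
    """Second attempt: loop until select returns without EINTR."""
    while True:
        try:
            return select_fn(r, w, e, timeout)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise

def make_interrupted_once_select():
    """Stand-in for select.select: EINTR on the first call, then nothing ready."""
    state = {"calls": 0}
    def fake_select(r, w, e, timeout):
        state["calls"] += 1
        if state["calls"] == 1:
            raise OSError(errno.EINTR, "Interrupted system call")
        return [], [], []
    return fake_select
```

With descriptors 3 and 4 requested for reading, poll_swallowing reports both as readable after the interruption, while poll_retrying reports none. Calling accept() or recv() on descriptors reported that way would be expected to fail with EWOULDBLOCK.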
> Is it easy to reproduce this error? Could you isolate a test case that I could run locally? Also, what platform are you running on?

The most reliable way I've found of getting it is to install the DCO2 Product for Oracle and try to create a DB Connection. That failed with this error every single time -- but that may not be an option for you. Other than that, the errors were unpredictable, but pretty common just by clicking around the ZMI.

Platform is SunOS 5.8/Sparc with Python-2.1.1 and Zope-2.4.0

-- John Ziniti
participants (3)
- Jeremy Hylton
- John Ziniti
- Toby Dickenson