Hang while waiting for external process
I'm using zope 2.4.3 on linux (redhat 7 with kernel 2.4.14), playing around with the Photo product (nice one!). I've had a few zope freezes, and finally found a pattern: Occasionally, if using ImageMagick to do the rendering for Photo, convert will fail to complete and Zope will then become completely unresponsive. Browser requests do nothing.

This is weird. I can understand how one of the Zope threads might get stuck waiting for the never-ending external process; but why should Zope become completely unresponsive? Shouldn't the other threads keep responding? And is there a way to set a timeout for an external process, such that zope can recover if it fails to return for too long?

Here's output of ps showing the defunct convert process:

(snip)
  891 pts/0 S 0:00 python2.1 /home/pw/Downloads/Zope/z2.py -M /home/pw/C
  896 pts/0 S 0:09 \_ /usr/local/bin/python2.1 /home/pw/Downloads/Zope/
  919 pts/0 S 0:00 \_ /usr/local/bin/python2.1 /home/pw/Downloads/Z
  920 pts/0 S 0:29 \_ /usr/local/bin/python2.1 /home/pw/Downloa
 1001 pts/0 Z 0:00 | \_ [convert <defunct>]
  921 pts/0 S 0:05 \_ /usr/local/bin/python2.1 /home/pw/Downloa
  922 pts/0 S 0:00 \_ /usr/local/bin/python2.1 /home/pw/Downloa
  923 pts/0 S 0:00 \_ /usr/local/bin/python2.1 /home/pw/Downloa

I can then get zope to restart by killing any of the zope PIDs.

Here's the big M log leading up to the freeze - nothing unusual... requestprofiler.py doesn't show any hangs in this log...

$ tail -f bigM
E 147423764 2001-11-15T17:36:05
B 143413564 2001-11-15T17:36:16 GET /test_crap/my_image_library/nothing
I 143413564 2001-11-15T17:36:16 0
A 143413564 2001-11-15T17:36:16 500 2485
E 143413564 2001-11-15T17:36:16
B 147432596 2001-11-15T17:36:25 GET /test_crap/my_image_library/lilaq-logo.jpg
I 147432596 2001-11-15T17:36:25 0
A 147432596 2001-11-15T17:36:25 304 282
E 147432596 2001-11-15T17:36:25

-- 
paul winkler
home: http://www.slinkp.com
music: http://www.reacharms.com
calendars: http://www.calendargalaxy.com
Paul Winkler writes:
I'm using zope 2.4.3 on linux (redhat 7 with kernel 2.4.14), playing around with the Photo product (nice one!). I've had a few zope freezes, and finally found a pattern: Occasionally, if using ImageMagick to do the rendering for Photo, convert will fail to complete and Zope will then become completely unresponsive. Browser requests do nothing.
This is weird. I can understand how one of the Zope threads might get stuck waiting for the never-ending external process; but why should Zope become completely unresponsive? Shouldn't the other threads keep responding?
They should (and usually do). Maybe a problem in Python: it forgets to release the Global Interpreter Lock for the system call you use to start your process....
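As a rough illustration of the point that other threads should keep running while one of them waits on an external process, here is a small sketch in present-day Python (not the Python 2.1 used in this thread). The function names are made up for illustration, and the child process is just a second Python interpreter that sleeps. It only demonstrates that waiting on a subprocess normally does not hold the Global Interpreter Lock; it says nothing about what Photo or Zope actually do.

```python
import subprocess
import sys
import threading
import time


def wait_for_child(seconds):
    """Block the calling thread until a sleeping child process exits."""
    # The child is a separate interpreter that simply sleeps.
    subprocess.run([sys.executable, "-c", "import time; time.sleep(%s)" % seconds])


def count_ticks_while_child_runs(child_seconds=0.5, tick=0.05):
    """Return how many times the main thread ran while a worker was blocked."""
    worker = threading.Thread(target=wait_for_child, args=(child_seconds,))
    worker.start()
    ticks = 0
    # If the blocked worker held the GIL, this loop could not make progress.
    while worker.is_alive():
        ticks += 1
        time.sleep(tick)
    worker.join()
    return ticks
```

Calling count_ticks_while_child_runs() should return a count well above one, since the main thread keeps looping for as long as the worker is stuck waiting.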
And is there a way to set a timeout for an external process, such that zope can recover if it fails to return for too long?
If you use "os.system", then I fear there is no safe way. For "os.popen" (or variants thereof), you might be able to use "select" (with a timeout) to check whether the partner closed the file descriptor. If it has not done so within the timeout, you abandon the process without closing the file, either explicitly or implicitly; i.e. the file object must go into a global variable, preferably a list. Of course, you will leak file descriptors in this case.
Dieter
On Fri, Nov 16, 2001 at 06:25:44PM +0100, Dieter Maurer wrote:
(snip)
Thanks for the tip. I'm forwarding to Ron Bickers, author of Photo (the product I'm using that calls the external process), to make sure he sees this. It uses os.popen but doesn't check for failure.

Unfortunately I'm now starting to doubt my diagnosis... had a couple more freezes that seemed unrelated to calls to convert. I'll try to keep bigM logs of every freeze, keep a close watch on running processes to see when convert dies and see if that's actually related at all. There always seems to be a defunct convert hanging around when I get a freeze, but I'm not sure that's actually the cause.
On Fri, Nov 16, 2001 at 02:30:59PM -0500, Paul Winkler wrote: (snip)
Unfortunately I'm now starting to doubt my diagnosis... had a couple more freezes that seemed unrelated to calls to convert.
OK, this is interesting. Sorry for the false alarm - it turns out Zope is not frozen at all, and it doesn't seem to have to do with Photo or convert per se. Instead, I think I've found a bug in Netscape 6.

In Netscape 6.1 under Linux (redhat 7 upgraded to kernel 2.4.14), I can reliably do the following:

1. go to any Image in zope and view it. Alternatively, go to any page *containing* an image (although small ones like the Zope logo seem to be OK for some reason???)
2. hit "reload" repeatedly. On the 9th reload (why always the 9th???), the reload never completes. (I haven't had the patience yet to leave it running overnight.)

If I hit "stop", and then try to load any other page from this Zope server, I get nothing. Netscape can still browse any other server or local file, but gets no response from Zope. This is why I thought Zope was frozen. I can do this on any Image or Photo... doesn't matter which one.

BUT Zope still responds to requests from other clients! I should have checked that before. I can access zope from lynx, links, or even other instances of Netscape 6. I fired up a python session and used urllib to access the same image my netscape was stuck on... put it in a "for" loop and ran it 1000 times, no problem (damn it's fast!). Watched the bigM log (with tail -f) ... it's responding all right.

All seems OK except for the one instance of Netscape that can't seem to get in to zope. Watching the bigM log shows that zope is simply not getting any requests from that Netscape any more. You try to connect to any URL at this Zope server, and quite simply nothing happens. But all I have to do is restart Netscape, not zope!

Very weird... but not nearly as scary as zope freezing! :)

Later, I'll put netscape 4.7x on this machine and try that too...
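The urllib check mentioned in this message might look roughly like the following in present-day Python (the original would have used Python 2.1's urllib, whose API differs). The function name fetch_repeatedly and its parameters are hypothetical, and the URL in the comment is a placeholder, not the actual server from the thread.

```python
import urllib.request


def fetch_repeatedly(url, times=1000, timeout=10):
    """Fetch url the given number of times; return the body size of each response."""
    sizes = []
    for _ in range(times):
        # A hung server would show up here as a timeout exception.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            sizes.append(len(response.read()))
    return sizes


# Example call (placeholder URL):
# fetch_repeatedly("http://localhost:8080/some_folder/some_image.jpg")
```

If every request returns, the server is answering other clients, which points the finger at the one browser instance rather than at Zope.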
On Fri, Nov 16, 2001 at 09:31:58PM -0500, Paul Winkler wrote:
Later, I'll put netscape 4.7x on this machine and try that too...
No problems with Netscape 4.75. Problem is 100% repeatable AFAICT with Netscape 6.1. So it's almost definitely a mozilla bug...
Hi Paul,

I experience the same problem on windows 98: Netscape 6.1 at some point just stops reloading pages from Zope, while other clients keep on processing Zope requests correctly. However, I still can reach other websites, outside Zope. Restarting Netscape "solves" the problem. Don't know what the problem is...

Greetings, Antwan.

At 21:31 16-11-01 -0500, you wrote:
On Fri, Nov 16, 2001 at 02:30:59PM -0500, Paul Winkler wrote: (snip)
participants (3): Antwan Reijnen, Dieter Maurer, Paul Winkler