[ZCM] [ZC] 1939/ 4 Comment "Unreliable restart under Windows"

Collector: Zope Bugs, Features, and Patches ... zope-coders-admin at zope.org
Fri Dec 16 05:34:19 EST 2005


Issue #1939 Update (Comment) "Unreliable restart under Windows"
 Status Pending, Zope/bug medium
To followup, visit:
  http://www.zope.org/Collectors/Zope/1939

==============================================================
= Comment - Entry #4 by hamannu on Dec 16, 2005 5:34 am

As I was the one who first noticed the problem, I have analyzed the error once again, with the following results:

1. We do in fact have the current version of service.py. The _dolog method looks like this:

    def _dolog(self, func, msg):
        try:
            fullmsg = "%s (%s): %s" % \
                      (self._svc_name_, self._svc_display_name_, msg)
            func(fullmsg)
        except win32api.error, details:
            # Failed to write a log entry - most likely problem is
            # that the event log is full.  We don't want this to kill us
            print "FAILED to write event log entry:", details
            print msg

2. You can reproduce the error as follows:
	- within the run method, fake a warning message of size >= 32k, for example:

	def run(self):
            .....

            if status != 0:
                # This should never block - the child process terminating
                # has closed the redirection pipe, so our thread dies.
                
                self.redirect_thread.join(5)
                if self.redirect_thread.isAlive():
                    self.warning("Redirect thread did not stop!")
                    
                self.warning('1' * (1024 * 32 + 1))  # here is the fake

                ....

       - restart the service (ZMI -> Control Panel -> Restart)

       The service then does not restart: it stays down, and you have to start it manually from the Windows Control Panel!

3. The win32api.error is in fact raised, but the problem seems to be the exception handler within the _dolog method itself. Look at:

        except win32api.error, details:
            # Failed to write a log entry - most likely problem is
            # that the event log is full.  We don't want this to kill us
            print "FAILED to write event log entry:", details
            print msg

Here the following error is raised and never caught:

	Traceback (most recent call last):
	
	  File "C:\UHH\Development\iDesk\service\HR2Server\iDeskService\Zope\Base\lib\python\nt_svcutils\service.py", line 129, in _dolog
	    print msg
	
	IOError: [Errno 9] Bad file descriptor
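
A minimal sketch of one way to keep the fallback output from killing the service, assuming the cause is that a Windows service has no usable stdout. It is written in current Python syntax rather than the Python 2 of service.py, and the names `safe_log`, `LogWriteError` and `fallback` are hypothetical stand-ins (LogWriteError plays the role of win32api.error so the sketch runs without pywin32). This is not the actual fix applied to nt_svcutils.

```python
import sys


class LogWriteError(Exception):
    """Hypothetical stand-in for win32api.error."""


def safe_log(func, msg, fallback=None):
    """Call func(msg) and never let a logging failure propagate.

    Returns True if func succeeded, False otherwise.
    """
    try:
        func(msg)
        return True
    except LogWriteError as details:
        # Writing the event log entry failed. The fallback output can
        # fail as well (for example "Bad file descriptor" when there is
        # no console), so it is guarded too.
        try:
            stream = fallback if fallback is not None else sys.stdout
            stream.write("FAILED to write event log entry: %s\n" % (details,))
            stream.write(msg + "\n")
        except (OSError, ValueError, AttributeError):
            # OSError covers IOError; ValueError covers a closed stream;
            # AttributeError covers sys.stdout being None.
            pass
        return False
```

With a guard like this, a failed log write costs only the log entry, and the caller can carry on with the restart.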


________________________________________
= Comment - Entry #3 by mhammond on Dec 15, 2005 6:37 pm

I should also point out that I believe dropping that number will not make things much more reliable.  MS documents the limit for a string as 32k, and we are under that.

So I guess that your event log was very nearly full, and that the error you saw could happen in some cases when trying to write a much smaller size.

[However, I do still agree that 16k is too much - the intent is to only show enough to diagnose the fatal error - it is very unlikely we need 16k of data for that.  On the other hand, I could see large tracebacks being over 2k in total.  4k may be better.]
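
A minimal sketch of capping the message before it is handed to the event log, using the 4k figure suggested above as the default. The helper name `truncate_for_log`, the marker text, and the choice to keep both the head and the tail of the message are assumptions for illustration; the limit here counts characters, not bytes.

```python
def truncate_for_log(msg, limit=4096, marker="\n[... truncated ...]\n"):
    """Return msg cut to at most `limit` characters.

    The start and the end of the message are kept, on the assumption
    that the end of a traceback is usually the most useful part.
    """
    if len(msg) <= limit:
        return msg
    keep = limit - len(marker)
    if keep <= 0:
        # Limit too small to fit the marker: plain cut.
        return msg[:limit]
    head = keep // 2
    tail = keep - head
    return msg[:head] + marker + msg[len(msg) - tail:]
```

Applied to the fake message from the reproduction, `truncate_for_log('1' * (1024 * 32 + 1))` yields a string of exactly 4096 characters.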
________________________________________
= Comment - Entry #2 by mhammond on Dec 15, 2005 6:26 pm

In principle I agree - the current code could write 16k to the log, which is too much.

However, I'm surprised that it would fail.  The current version of the code should have a version of warning that catches failure to write to the log and ignores it.

Could you please check that your version of the _dolog() function in nt_svcutils\service.py catches win32api.error exceptions?  If not, then you probably have an older version, and that's fine.  If so, could you please let me know exactly what exception you were seeing in this case?

Either way, I think the change you propose is reasonable.
________________________________________
= Request - Entry #1 by d.maurer on Nov 3, 2005 5:30 am

The call "self.warning("process terminated with exit code %d.\n%s" % (status, "".join(self.captured_blocks)))" in "nt_service.service.Service.run" may transfer too much information for the Windows event service, resulting in an exception. If such an exception occurs, the automatic restart fails.

We reduced "CHILDCAPTURE_MAX_BLOCKS" from 200 to 10 to work around this problem.
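
A minimal sketch of how a block limit of this kind bounds what ends up in the warning, assuming the captured output arrives as a sequence of string blocks. The function `capture_blocks` is hypothetical and only illustrates the idea; it is not how service.py itself maintains `captured_blocks`.

```python
from collections import deque

CHILDCAPTURE_MAX_BLOCKS = 10  # the reduced value from the workaround


def capture_blocks(blocks, max_blocks=CHILDCAPTURE_MAX_BLOCKS):
    """Join only the most recent `max_blocks` blocks of child output."""
    # A deque with maxlen silently drops the oldest entry when full.
    recent = deque(maxlen=max_blocks)
    for block in blocks:
        recent.append(block)
    return "".join(recent)
```

For example, `capture_blocks(["a", "b", "c"], max_blocks=2)` returns `"bc"`: the oldest block is discarded, so the size of the logged text is bounded by the number of blocks times the block size.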
==============================================================


