I propose: Each time the child process terminates with a non-zero return code, the tail x-bytes of the child output be written to the Windows event log, where x~2k.
This is a good idea. FWIW, I believe the Zope HEAD already has some work done towards this (in lib/python/nt_svcutils/service.py), although the child output goes to a logfile instead of the event log. It would be nice to make the output go to the event log and then backport this to 2.7.
I started from HEAD, and indeed did find a good attempt at making this work. However, it was disabled by default, wrote to a file, and had issues with blocking reads. I fixed this to capture the output in memory (but not all of it - just the tail), and to use a single pipe for both stdout and stderr.
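To illustrate the "tail only, single pipe" idea, here is a minimal sketch in Python. It is not the code from nt_svcutils/service.py: the function name run_and_capture_tail and the TAIL_BYTES constant are made up for this example, and the 2048 default just mirrors the "x~2k" figure from the proposal. It also reads the pipe in the calling thread, which a real service would probably want to do on a worker thread.

```python
import collections
import subprocess
import sys

TAIL_BYTES = 2048  # hypothetical cap, mirroring the "x~2k" proposal


def run_and_capture_tail(cmd, tail_bytes=TAIL_BYTES):
    """Run cmd and return (returncode, last tail_bytes of combined output)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # stderr shares the stdout pipe
    )
    # A bounded deque silently drops the oldest bytes once it is full,
    # so memory use stays constant however much the child prints.
    tail = collections.deque(maxlen=tail_bytes)
    while True:
        chunk = proc.stdout.read(4096)
        if not chunk:  # empty read means the child closed the pipe
            break
        tail.extend(chunk)  # iterating bytes yields ints
    proc.stdout.close()
    returncode = proc.wait()
    return returncode, bytes(tail)


if __name__ == "__main__":
    rc, tail = run_and_capture_tail(
        [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]
    )
    if rc != 0:
        # In the service, this is where the tail would go to the event log.
        print("child failed with", rc, "tail:", tail)
```

Draining the pipe continuously also means the child can never block on a full pipe buffer.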
2) reporting of "successful start" and "backoff" strategy. A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service to hopelessly retry for a number of minutes, and not respond to shutdown requests during a retry.
Yup. The reason it retry-restarts is that it's simple and stupid, and the reason it doesn't respond to shutdown requests during a retry is that the service code sleeps for the backoff interval after an unsuccessful startup. Any async requests that happen in the meantime are blocked waiting for this sleep to end. I'm not quite sure how to do that better.
FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent, timeout_period*1000).
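The difference from a plain sleep can be sketched portably with threading.Event, whose wait(timeout) behaves much like the Win32 call: it returns as soon as the event is set, or when the timeout expires. The function name wait_for_backoff_or_stop is invented for this example; in the real service the event would be the Win32 stop event handle rather than a Python Event.

```python
import threading
import time


def wait_for_backoff_or_stop(stop_event, backoff_seconds):
    """Wait out the backoff interval; return True if a stop was requested.

    With pywin32 the analogous call would be
    win32event.WaitForSingleObject(hStopEvent, int(backoff_seconds * 1000)),
    which returns win32event.WAIT_OBJECT_0 when the event is signalled
    and win32event.WAIT_TIMEOUT otherwise.
    """
    return stop_event.wait(backoff_seconds)


if __name__ == "__main__":
    stop = threading.Event()
    # Simulate a shutdown request arriving 0.1s into a 60s backoff.
    threading.Timer(0.1, stop.set).start()
    started = time.monotonic()
    if wait_for_backoff_or_stop(stop, 60.0):
        print("stop requested after %.1fs, not retrying"
              % (time.monotonic() - started))
```

Unlike time.sleep(), the wait ends immediately when the stop is signalled, so a shutdown request during a retry is honoured straight away.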
Note that the Zope Python install also has a sitecustomize.py that munges sys.path in order to get things set up properly.
Right! That is the magic I was referring to and had not yet found.
Others have claimed this is unnecessary and that the work that gets done in there could be done in the service code. It's a bit of a mess.
I agree with the others :) mkzopeinstance goes to lengths to provide all relevant information, and the service code does not take full advantage of that. It should be possible for Zope to work with a standard, external Python/pywin32 installation. I'm not suggesting this become a distribution option, but it is still a worthy goal, both for developers and to keep us honest <wink>. Note that with a few tweaks, Zope *does* build and work with an external Python/pywin32.
At one point I flailed trying to make the child process inherit its environment from the parent, and plastered over the problem with various sys.path and PYTHONPATH and other environment variable settings. The current situation is a result. Some guidance here would be helpful.
The child process *does* inherit the parent service's environment. Hence adding os.environ[] entries in the service does set them for subsequently created children. i.e., setting os.environ["PYTHONPATH"]=SOFTWARE_HOME in the service main code appears to avoid the sitecustomize.py requirement - the child process *does* see the new PYTHONPATH.
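This is easy to check outside the service. A minimal sketch, where child_sees_pythonpath is a made-up helper and the current directory stands in for SOFTWARE_HOME:

```python
import os
import subprocess
import sys


def child_sees_pythonpath(path):
    """Set PYTHONPATH in this process and return what a child reports."""
    os.environ["PYTHONPATH"] = path  # stand-in for SOFTWARE_HOME
    # No env= argument is passed, so the child gets a copy of the
    # parent's current environment, including the entry set above.
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"]
    )
    return out.decode().strip()


if __name__ == "__main__":
    print(child_sees_pythonpath(os.getcwd()))
```

Note that this relies on assigning through os.environ; the child only sees values that were set before it was created.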
I'm a Windows signal idiot.
No - Windows is a signal idiot :)
Is there a way that we can make the Zope process capture Windows signals, so that when the Windows equivalent of SIGTERM is sent to the process it shuts down "cleanly"? This is how it works on UNIX.
That makes sense, but:
but we circumvent trying to listen for signals on Windows entirely at startup.
Can you explain the above? Do you mean that on Windows, you take no special signal actions, as demonstrated in WindowsZopeStarter.registerSignals? I'll see what I can come up with though.
Note that the UNIX environment has a lot of additional niceties due to responses to signals (like logfile rotation) that Windows doesn't have now, which tends to have the effect of relegating Windows to a second-class platform on which to run a production Zope instance.
Windows certainly has these features available - they are just not always spelt the same as they are on Unix. Sometimes they are even better <wink>. So there seems to be a chicken-and-egg problem - users will always consider it second class until Zope itself starts considering it first class. This is an observation, not a criticism :) Mark.