[Zope-dev] Possible Windows Service improvements.

Mark Hammond mhammond at skippinet.com.au
Wed Aug 4 07:35:20 EDT 2004


Hi all,
  I am starting to venture into the wonderful world of Zope!  With the
benefit of a complete lack of Zope experience, I have been able to look at
the Windows service support from a fairly clean slate.  However, I also
realize this lack of experience means my ideas may be naive - hence I have
attempted to split them into discrete issues for discrete rejection <wink>.

1) startup error redirection.
I've noticed that the main Zope service driver for Windows seems to work
fine when everything is set up correctly, but when things go wrong it offers
no clues as to what.  This is reflected in collector item 1020 ("poor error
reporting on product initialisation failure under windows").  Issue 1408
("Configuration file imports don't see INSTANCE_HOME when running Zope as a
windows service"), via the referenced thread, has evidence of someone
burning a day due to this.  It cost me a lot of time too :)

I propose:
Each time the child process terminates with a non-zero return code, the last
x bytes of the child's output be written to the Windows event log, where x~2k.
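
As a rough sketch of this proposal (not the existing service code): it
assumes the child's output is redirected to a file, and that log_error is
any callable taking a string, for example pywin32's
servicemanager.LogErrorMsg.  The names tail_bytes and report_child_exit are
made up for illustration.

```python
import os


def tail_bytes(path, limit=2048):
    """Return at most the last `limit` bytes of the file at `path`."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Never seek before the start of a short file.
        f.seek(max(0, size - limit))
        return f.read()


def report_child_exit(returncode, output_path, log_error, limit=2048):
    """Log the tail of the child's output if it exited with a non-zero code.

    Returns True if a message was logged, False otherwise.
    """
    if returncode == 0:
        return False
    # latin-1 decodes any byte sequence, so odd output cannot break logging.
    tail = tail_bytes(output_path, limit).decode("latin-1")
    log_error("Zope child exited with code %d; last output:\n%s"
              % (returncode, tail))
    return True
```

Keeping the logger as a parameter means the same code can be exercised
outside a service, with print or a list's append standing in for the event
log.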

2) reporting of "successful start" and "backoff" strategy.
A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service
to hopelessly retry for a number of minutes, and not respond to shutdown
requests during a retry.

At the moment, as soon as the service starts it reports "successful startup"
to Windows.  It then begins an attempt to start the child.  If the child
immediately fails, the code immediately begins the "backoff" strategy.  This
strategy appears to have 2 main purposes:
* Startup may fail due to other 'services' not having yet started, so retry
in the hope they become available.
* The process may die due to some obscure error - restart it.

On Windows, assuming we install the service to depend on the "tcpip"
service, I see no reason for the first purpose to apply.  If the process
fails quickly the first time we attempt to launch it, it is almost certainly
going to fail every time we try to launch it.

The current strategy also means that 3rd party services could not themselves
depend on the Zope service - the Zope service will report successful startup
before it really has (and therefore the dependent service may itself fail).
This isn't a known requirement today, but who knows!  "net start" and other
front ends also fail to detect fatal errors - they all say Zope started OK.

I propose:
We insist the child process can be created and continues to run for x
seconds (where x~5).  If that fails, we report an error (never reporting to
Windows that we started successfully).  If the child process stays alive for
this period, we report success to Windows, and then use the existing backoff
strategy should it die.  If the machine is heavily loaded, this 5 seconds
may expire before the fatal error is hit in the child - in that case, we are
simply doing what we do now - using the backoff strategy to hopelessly
attempt a restart - ie, a win in most cases, and no loss in the others.
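
A minimal sketch of the "stay alive for x seconds" check, using the standard
subprocess module rather than the service's real process-creation code.
start_child and ChildStartError are hypothetical names; the service would
report success to Windows only after start_child returns.

```python
import subprocess


class ChildStartError(Exception):
    """Raised when the child exits within the startup grace period."""

    def __init__(self, returncode):
        Exception.__init__(
            self, "child exited early with code %d" % returncode)
        self.returncode = returncode


def start_child(cmd, grace_seconds=5.0):
    """Start `cmd` and require it to keep running for `grace_seconds`.

    Returns the running Popen object.  Raises ChildStartError if the
    child exits (with any return code) before the grace period is over.
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: treat startup as successful.
        return proc
    raise ChildStartError(returncode)
```

If ChildStartError is raised, the service would report a failed start; if
the child dies later, the existing backoff strategy takes over as it does
now.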

3) environment setting
The service process should set a number of environment variables before
spawning the child - PYTHONPATH at a minimum, and according to issue #1408,
INSTANCE_HOME.  It already knows these values thanks to mkzopeinstance.  I'm
yet to determine where these values come from in the binary build, but I
see no reason not to fix this (and possibly remove whatever magic the binary
does).

I propose:
A few trivial os.environ insertions based on the substitutions done by
mkzopeinstance, before we create the child process(es).  Alternatively, we
could create an explicit new environment to pass to CreateProcess, but I see
no good reason for that.
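
Something along these lines, as a sketch only.  zope_env_settings and its
software_home argument are hypothetical names standing in for whatever
mkzopeinstance substitutes, and prepending to an existing PYTHONPATH (rather
than replacing it) is my assumption.

```python
import os


def zope_env_settings(instance_home, software_home, current=None):
    """Return the environment variables to set before spawning the child.

    `current` is the environment to inspect for an existing PYTHONPATH;
    it defaults to os.environ.
    """
    if current is None:
        current = os.environ
    pythonpath = software_home
    existing = current.get("PYTHONPATH")
    if existing:
        # Put the Zope software home first, keep whatever was there.
        pythonpath = software_home + os.pathsep + existing
    return {
        "INSTANCE_HOME": instance_home,
        "PYTHONPATH": pythonpath,
    }
```

The service would then call os.environ.update(zope_env_settings(...)) once,
before creating the child, so the child simply inherits the values.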

4) graceful shutdown of the child.
Currently, when the service is stopped, we immediately terminate the
child process.  This seems dangerous.  We should find a way to gracefully
terminate the child, and try that before we simply kill it.

I propose:
That someone help me work out how to do this <wink>.  I've already worked
out how to do it if the service knows the username/password of a Zope
administrator, but it doesn't!  Sending a Ctrl_C 'signal' doesn't work
without hacks to
run.py (and I'm yet to confirm it will even with such hacks).
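
Whatever the graceful mechanism turns out to be, the escalation around it
could look roughly like this.  It is only a sketch: request_shutdown is a
placeholder callable for the still-unknown "please exit" step, stop_child is
a made-up name, and the 10 second timeout is arbitrary.

```python
import subprocess


def stop_child(proc, request_shutdown, timeout=10.0):
    """Ask the child to exit; kill it only if it ignores the request.

    `proc` is a subprocess.Popen object.  `request_shutdown` is called
    with `proc` and should ask the child to exit cleanly.
    Returns True if the child exited on its own, False if it was killed.
    """
    if proc.poll() is not None:
        return True  # already gone, nothing to do
    request_shutdown(proc)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Graceful request was ignored: fall back to a hard kill.
        proc.kill()
        proc.wait()
        return False
```

With this shape, today's behaviour is just the degenerate case of a
request_shutdown that does nothing and a timeout of zero.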

I welcome any feedback on these issues.  Obviously I am willing to back each
of these proposals up (except 4!) with code that seems to work :)  I would
also welcome feedback on the best way to proceed (ie, create a new collector
for each issue?  thrash it out here?  give up?<wink>, etc)

Note that none of these issues would require a win32all/pywin32 update.  If
anyone was really upset by issue 1423 ("Zope 2.7.1 won't run as service
under NT"), and also able to test, I'd be willing to fix it - but that
*would* require a pywin32 upgrade.  Tim has already kindly filled me in on
that background, so it may not be trivial (ie, I would need help!)

Thanks!

Mark


