I am thinking of a fairly simple change to zdaemon: if it finds that it has respawned the program more than 10 times in 2 minutes, it assumes there is a fatal error, logs a PANIC level message, and exits.
(I took the criterion, but not the response, from init(8).)
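
For illustration only, the "more than 10 times in 2 minutes" criterion could be checked with a sliding window of respawn timestamps. This is a minimal sketch, not zdaemon's actual code: the class name `RespawnLimiter`, its parameters, and the caller-supplied `now` timestamp are all hypothetical.

```python
import collections


class RespawnLimiter:
    """Hypothetical helper: detect respawns that come too fast.

    Defaults mirror the proposal: more than 10 respawns within
    120 seconds counts as a fatal condition.
    """

    def __init__(self, max_respawns=10, window=120.0):
        self.max_respawns = max_respawns
        self.window = window
        self.times = collections.deque()

    def record(self, now):
        """Record a respawn at time `now` (seconds).

        Returns True if the limit has been exceeded.
        """
        self.times.append(now)
        # Forget respawns that have fallen out of the window.
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.max_respawns
```

A caller would pass something like `time.time()` on each respawn and, when `record()` returns True, log the PANIC message and exit (or pause, depending on the policy chosen).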
In this scenario init pauses for a few minutes, rather than aborting. I would like an option to prevent zdaemon from aborting, and I am surprised you don't want it as the default.
I think init uses a simple fixed pause... an exponential backoff would probably be smarter (like how a disconnected ZEO ClientStorage tries to reconnect to its server).
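
As a rough sketch of the exponential backoff idea, the pause before each restart could double up to a ceiling. The function name `backoff_delay`, the 1-second base, and the 300-second cap are assumptions chosen for illustration; they are not taken from init, zdaemon, or ZEO.

```python
def backoff_delay(attempt, base=1.0, cap=300.0):
    """Return the pause in seconds before restart number `attempt`.

    `attempt` is 0 for the first restart. The delay doubles each
    time and is clamped to `cap` so it never grows without bound.
    """
    return min(cap, base * 2 ** attempt)


# First five pauses with the default (assumed) parameters.
delays = [backoff_delay(n) for n in range(5)]
# delays == [1.0, 2.0, 4.0, 8.0, 16.0]
```

Resetting `attempt` to 0 once the program has stayed up for a while would keep a single later crash from inheriting a long pause.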
I thought about this, and figured it wasn't necessary. Unlike init, zdaemon only manages one process. When that process doesn't get past its initialization, manual intervention is normally required to make it run again; that manual intervention can include restarting it.

But I have to admit that the use case I've been thinking of is that of starting zeo and finding that it crashes immediately, over and over. There the auto-stop is just what you need (there's no point in filling up the log file while you're thinking about what could have caused this).

There's a different use case where something changes in the environment after the program has run successfully for a while, which causes it to crash and causes subsequent restarts to crash immediately. It is *possible* that the environment fixes itself after a while -- it could be something like a network, DNS or NFS outage -- and then an auto-restart option might be nice.

I'm not sure what should be the default -- as a developer, I prefer that it stops (and I hate that zdaemon is the default at all), but for a production site something different might be in order.

--Guido van Rossum (home page: http://www.python.org/~guido/)