[Zope-CMF] cmf.zope.org is OFFLINE

Tres Seaver tseaver@zope.com
06 Sep 2002 12:20:05 -0400


On Fri, 2002-09-06 at 11:02, Paul Winkler wrote:
> On Fri, Sep 06, 2002 at 07:41:11AM -0500, Jeffrey_Franks@i-o.com wrote:
> >
> > How do you set up monitoring? This might be useful info for some of us.
> 
> Yeah, I'd be interested in other people's strategies...
> 
> Look up daemontools if you want a way to automatically restart
> a service that's died.
> 
> If you just want to see if a remote server is responding,
> a simple shell script can be fine, e.g. use wget to grab a simple test
> page; if it times out (probably use the -T option),
> send yourself email.
> Run that script every so often with cron, and there you go.

A quick thumbnail of the monitoring within the zope.org cluster:

  - We have to monitor a minimum of three servers for each distinct
    "Zope instance":  two app servers (ZEO clients) plus the storage
    server.

  - In addition, there are "infrastructure" processes (Apache) and
    servers (the load balancer).

  - The monitoring process uses a simple HTTP GET of a dummy page
    to test that the appserver processes are up and not hung.

  - The initial problem with the cmf.zope.org site was the result of a
    sysadmin mistake:  the admin working on packing the www.zope.org
    storage (which is over 9 GB at the moment) accidentally killed the
    CMF storage server.  For most of yesterday, the appservers were
    running purely off their ZEO caches.  Finally, they got hosed enough
    that they all died.

  - Our normal monitoring for the dogbowl had been switched off due to a
    ton of what seemed like "false positives" during the packing
    attempt, and didn't get re-enabled until y'all pointed out that the
    site was down.
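
For anyone who wants to try the "HTTP GET of a dummy page" idea, here is
a rough sketch in Python.  It is not the actual zope.org monitoring
code:  the hostnames, port, "/ping" path, and the function names are all
made up for illustration.  The point is only that a timeout on the GET
is what lets the check catch a hung process as well as a dead one.

```python
import http.client
import urllib.request

# Hypothetical endpoints: the real hostnames and the path of the dummy
# page are not given in this message.
TARGETS = {
    "appserver-1": "http://app1.example.com:8080/ping",
    "appserver-2": "http://app2.example.com:8080/ping",
}


def is_alive(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a GET of `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all
        # OSError subclasses; a timeout is how a hung process shows up.
        return False


def failed_targets(targets, timeout=10.0, opener=urllib.request.urlopen):
    """Return the sorted names of targets whose dummy page did not respond."""
    return sorted(
        name for name, url in targets.items()
        if not is_alive(url, timeout, opener)
    )
```

Calling failed_targets(TARGETS) from a cron job and mailing yourself
whenever the list is non-empty gets you roughly what Paul describes
above.  The `opener` argument is only there so the check can be tried
out without touching the network.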

Tres.
-- 
===============================================================
Tres Seaver                                tseaver@zope.com
Zope Corporation      "Zope Dealers"       http://www.zope.com