I've been looking for a "standard" way of saying "hey ZEO, are you up?" that I can work into our failover system. (BTW, our production ZEO server is nearly read-only; all changes are made on dev and then synced to production using ZSyncer. So I'm not worried about losing state when we fail over.)

The simplest thing would be this, swiped from zctl.py:

    import socket

    def _check_for_service(host, port):
        """Return 1 if server is found at (host, port), 0 otherwise."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((host, int(port)))
            return 1
        except socket.error:
            return 0
        finally:
            s.close()  # don't leak a file descriptor on every probe

So we just make a raw socket connection to the ZEO port and say we succeeded if anything is listening there. However, that doesn't seem very robust. Is it possible for ZEO to get into a state where it accepts socket connections but won't deliver data? I don't know, and I feel safer assuming "yes".

As someone pointed out in a thread here months ago, we could always set up a script that makes a write transaction every so often, but this is Bad: the ZODB would grow and grow and grow with pointless undo data...

So here's what we came up with, hacked out of stuff found in the ZEO test scripts: we make a dummy ZEO client, connect it, and try to load the root object.

    import ZEO.ClientStorage

    # DummyDB and dummy are small stand-ins taken from the ZEO test scripts
    # (not shown here); they must be defined before testzeo() is called.
    def testzeo(host, port):
        storage = ZEO.ClientStorage.ClientStorage(
            (host, port),
            name='ZEO Heartbeat Test at %s:%s ' % (host, port),
            wait_for_server_on_startup=0,
            client=None, debug=1, cache_size=0)
        storage.registerDB(DummyDB(), None)
        # '\0' * 8 is the oid of the root object; '' is the non-version.
        storage.load('\0\0\0\0\0\0\0\0', '')
        storage._call.__haveMainLoop = 0
        storage.notifyDisconnected = dummy
        storage.close()
        return 1

If that fails for any reason, ZEO is down. As far as we can tell, this *works*: if ZEO is shut down we get an error; if it's up, we get 1. Great.

But this seems very non-kosher. I've never heard of anybody opening and closing ZEO connections every 5 seconds on a production site.
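The "if that fails for any reason, ZEO is down" rule can be made explicit with a tiny wrapper. This is a minimal sketch: `is_up` is a hypothetical helper name, and `check` is any zero-argument callable wrapping one of the probes above. The wrapper accepts both probe styles: one that returns 0 on failure and one that raises.

```python
def is_up(check):
    """Run a probe callable; return 1 if it succeeds, 0 otherwise.

    Hypothetical helper: `check` takes no arguments and either returns a
    truthy/falsy status or raises an exception when the server is unreachable.
    """
    try:
        result = check()
    except Exception:
        # Any failure (socket error, storage error, timeout) counts as "down".
        return 0
    return 1 if result else 0
```

Catching `Exception` broadly matches the "for any reason" rule. The cost is that a bug in the probe itself is also reported as "ZEO down".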
And we've seen a couple of very weird errors just after ZEO starts: a page loads in Zope, following a link gives a weird error (like "no docstring" on a DTML method), then all links after that are fine. I can't help but think that our test script is contributing to this sporadic, only-on-startup flakiness, but I have no real evidence for that.

So, back to the original question: is there a Right Way to check whether ZEO is really up and running?

-- 
Paul Winkler
http://www.slinkp.com
"Welcome to Muppet Labs, where the future is made - today!"
At 02:56 PM 10/28/2002 -0800, you wrote:
And we've seen a couple of very weird errors just after ZEO starts - a page loads in Zope, following a link gives a weird error (like no docstring on a DTML method), then all links after that are fine.
I doubt that's ZEO... I've seen that same thing, running standalone. Seems to happen after lulls in activity. So far as I can tell, this "error" isn't preventing the client from viewing the requested object, as I can see successful requests immediately thereafter. Subsequent requests to the exact same URL produce no trouble at all. Anyone know what's actually going on here? Dylan
On Mon, Oct 28, 2002 at 03:28:32PM -0800, Dylan Reinhardt wrote:
At 02:56 PM 10/28/2002 -0800, you wrote:
And we've seen a couple of very weird errors just after ZEO starts - a page loads in Zope, following a link gives a weird error (like no docstring on a DTML method), then all links after that are fine.
I doubt that's ZEO... I've seen that same thing, running standalone.
Really? You've seen the inexplicable "missing docstring" error? I don't know whether that's reassuring or troubling :-/
On 29.10.2002, 16:34, Paul Winkler <pw_lists@slinkp.com> wrote:
Really? You've seen the inexplicable "missing docstring" error?
<aol>Me too, me too</aol> Though our setup is Zope 2.6 and matching ZEO running on the same machine. We also have POSKeyErrors raised inside "modifiedInVersion" that go away after reloading the same URL a few times. These typically occur after calls to somefolder.manage_clone(someobject,newname).

Jo.

-- 
Internetmanufaktur Jo Meder
Kollwitzstr. 75, 10435 Berlin, Germany
http://www.meder.de/
phone: ++49-30-417 17 63 33
fax: ++49-30-417 17 63 45
mobile: ++49-170- 2 98 89 97
Public GnuPG key: http://www.meder.de/keys/jo-pubkey.txt
Jo Meder wrote:
<aol>Me too, me too</aol> Though our setup is Zope 2.6 and matching ZEO running on the same machine. We also have POSKeyErrors raised inside "modifiedInVersion" that go away after some reloading of the same URL. These typically occur after calls to somefolder.manage_clone(someobject,newname).
You may wish to try upgrading to Zope 2.6 (make sure you delete any Data.fs.index files lying around) and see if this problem goes away. cheers, Chris
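The index cleanup Chris mentions can be scripted. This is a minimal sketch with several assumptions. Zope and ZEO must be stopped first. The storages are assumed to live somewhere under one directory (the path in the usage line is a hypothetical example). `remove_index_files` is an illustrative name. FileStorage rebuilds a missing index by scanning Data.fs on the next start, so the first startup afterwards can be slow for a large file.

```python
from pathlib import Path


def remove_index_files(root):
    """Delete every Data.fs.index below `root` and return the removed paths.

    Hypothetical helper; run it only while Zope and ZEO are stopped.
    """
    removed = []
    # rglob also matches a Data.fs.index directly inside `root`.
    for index_file in sorted(Path(root).rglob("Data.fs.index")):
        index_file.unlink()
        removed.append(str(index_file))
    return removed
```

Usage (hypothetical path): `remove_index_files("/usr/local/zope/var")`.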
On 30.10.2002, 12:36, Chris Withers <chrisw@nipltd.com> wrote:
Though our setup is Zope 2.6 and matching ZEO
You may wish to try upgrading to Zope 2.6 (make sure you delete any Data.fs.index files lying around) and see if this problem goes away.
Huh? Upgrade from 2.6 to 2.6? I'll try deleting the indexes though. Jo.
Jo Meder wrote:
On 30.10.2002, 12:36, Chris Withers <chrisw@nipltd.com> wrote:
Though our setup is Zope 2.6 and matching ZEO
You may wish to try upgrading to Zope 2.6 (make sure you delete any Data.fs.index files lying around) and see if this problem goes away.
Huh? Upgrade from 2.6 to 2.6? I'll try deleting the indexes though.
Ug :-( Didn't know you were on 2.6, sorry :-( Chris - the world is a bad place...
Paul Winkler wrote:
I've been looking for a "standard" way of saying "hey ZEO, are you up?" that I can work into our failover system.
Have you looked at ZEO2? I have vague memories of such a feature being integrated there... If not, go ask on zodb-dev@zope.org and see if it can be added as standard... cheers, Chris
Paul Winkler writes:
... probing ZEO ...
But this seems very non-kosher. I've never heard of anybody opening and closing ZEO connections every 5 seconds on a production site.
You could reuse the old connection and just see whether ZEO is still answering requests.
We use "clientStorage._call('get_info')" for this. Dieter
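Dieter's suggestion is to keep one client connection open and periodically ask it for something cheap. It can be sketched as a small class, with these assumptions:

- `ZEOMonitor` is a hypothetical name.
- `open_storage` is a hypothetical factory that builds and connects the storage once, for example by constructing a ClientStorage and calling registerDB as in the heartbeat script.
- `ping` is whatever cheap round trip the site trusts, such as `lambda s: s._call('get_info')` as Dieter describes. That is a private method, so its behaviour may differ between ZEO versions.

```python
class ZEOMonitor:
    """Keep one storage connection open and reuse it for every probe.

    Hypothetical sketch: open_storage() returns a connected storage object;
    ping(storage) performs a cheap round trip and raises on failure.
    """

    def __init__(self, open_storage, ping):
        self.open_storage = open_storage
        self.ping = ping
        self.storage = None

    def check(self):
        """Return 1 if the server answered, 0 otherwise."""
        try:
            if self.storage is None:
                # Connect only on the first probe or after a failure.
                self.storage = self.open_storage()
            self.ping(self.storage)
            return 1
        except Exception:
            self._drop()
            return 0

    def _drop(self):
        # Discard a broken connection so the next check reconnects.
        storage, self.storage = self.storage, None
        if storage is not None:
            try:
                storage.close()
            except Exception:
                pass
```

Only the first probe, and the first probe after a failure, pays the cost of opening a connection. Every other probe is a single request on the existing one.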
On Tue, Oct 29, 2002 at 05:20:59PM +0100, Dieter Maurer wrote:
Paul Winkler writes:
... probing ZEO ...
But this seems very non-kosher. I've never heard of anybody opening and closing ZEO connections every 5 seconds on a production site.
You could reuse the old connection and just see whether ZEO is still answering requests.
Hmm. I'll have to think about how to do that from mon, which wants to run a script every so often. Maybe add another layer:

mon runs test_zeo.sh;
test_zeo.sh writes to a named pipe, which triggers test_zeo.py;
test_zeo.py runs all the time and holds a connection to ZEO.

test_zeo.py could be made to create the named pipe and open it for reading when it starts, and have an exit handler that cleans up the pipe. Something like that...

But anyway, it sounds like our current script is *not* the cause of the ZEO oddness, since other people are getting it too... --PW
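The named-pipe layer outlined above could look roughly like this on a POSIX system. Everything here is hypothetical and not part of mon or ZEO:

- The helper names `make_fifo` and `serve_requests`.
- The request format of one line per probe.
- The `report` callback, which could for instance write a status file that the mon side reads.

The trigger in test_zeo.sh could be as simple as `echo ping > /path/to/fifo`.

```python
import atexit
import os


def make_fifo(path):
    """Create the named pipe (if needed) and remove it when the process exits."""
    if not os.path.exists(path):
        os.mkfifo(path)

    def cleanup():
        if os.path.exists(path):
            os.unlink(path)

    atexit.register(cleanup)
    return path


def serve_requests(path, check, report, max_requests=None):
    """Run check() once per line written to the FIFO and pass the result on.

    open() blocks until a writer (e.g. test_zeo.sh) opens the pipe; when that
    writer closes it, the reader sees EOF and reopens the pipe to wait again.
    """
    handled = 0
    while max_requests is None or handled < max_requests:
        with open(path) as fifo:
            for line in fifo:
                report(line.strip(), check())
                handled += 1
                if max_requests is not None and handled >= max_requests:
                    break
    return handled
```

In the long-running test_zeo.py, `check` would be the method that reuses an open connection, and `max_requests` would stay `None` so the loop runs forever.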
We use "clientStorage._call('get_info')" for this.
Dieter
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
Participants (5):
- Chris Withers
- Dieter Maurer
- Dylan Reinhardt
- Jo Meder
- Paul Winkler