[Zope] XRON and simultaneous triggers
Loren Stafford
lstafford@morphics.com
Thu, 21 Jun 2001 16:42:46 -0700
Please respond to the list so others can benefit from the exchange.
When you need to disable Xron, all you have to do is rename its __init__.py
file to something else, then restart Zope. That way the product won't be
loaded by Zope, but you don't have to delete the entire product directory.
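If you find yourself doing this often, the rename can be scripted. Here is a minimal sketch; the helper names `disable_product` and `enable_product` and the `.disabled` suffix are my own invention for illustration, not part of Xron or Zope, and you still have to restart Zope yourself afterwards.

```python
import os

def disable_product(product_dir, suffix=".disabled"):
    """Rename __init__.py inside product_dir and return the new path.

    Hypothetical helper: it only renames the file; it does not restart Zope.
    """
    src = os.path.join(product_dir, "__init__.py")
    if not os.path.exists(src):
        raise FileNotFoundError(src)
    dst = src + suffix
    os.rename(src, dst)
    return dst

def enable_product(product_dir, suffix=".disabled"):
    """Undo disable_product by restoring the original file name."""
    dst = os.path.join(product_dir, "__init__.py")
    os.rename(dst + suffix, dst)
    return dst
```

Call `disable_product()` with the path to your own Xron product directory, restart Zope, fix the offending method, then call `enable_product()` and restart again.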
-- Loren
> -----Original Message-----
> From: Capesius, Alan [mailto:CapesiusA@Sysmex.com]
> Sent: Thursday, June 21, 2001 16:07
> To: Loren Stafford; zope@zope.org
> Subject: RE: [Zope] XRON and simultaneous triggers
>
>
> A bit more info on my crash (NTS 4.0 SP6, Zope 2.3.0).
> At the time the crash occurred, an "out of virtual memory" message was
> also displayed at the server, and the server had memory usage above 1 GB.
> The active python task was not consuming this (nor was anything else,
> but the service had been restarted at that point). Global RAM usage?
>
> I've also disabled the error dump messages so that Zope will restart
> itself without human intervention when it bombs. In this case, though,
> it was still writing a crash dump file, which takes a while. I've now
> disabled crash dumps.
>
> Since I was unable to access the server to abort or modify any methods, I
> was forced to restart it. This did not resolve the xron problem, as the
> scheduler activated immediately and started looping again. I was able to
> resolve this by downing the service, removing the xron folder, restarting,
> removing the offending ODBC/SQL call in the xron method that I suspected,
> and restarting again with the XRon folder back in place.
>
> The offending call was a single record update to an Access DB. I am now in
> the process of migrating all live query functions over to SQL Server based
> tables. I'm not sure if the Jet drivers in the O/S were bogging down. My
> suspicion (without much evidence) is that the Jet connection was flipping
> open and closed so rapidly that something got out of whack with the
> timing (but that theory's more about what I don't know than what I do
> know :)
>
> Thanks again, Loren, for the excellent insights into xron. Now I just
> have to learn ZClasses, python, and the meaning of life....
>
>
>
> >>>-----Original Message-----
> >>>From: Loren Stafford [mailto:lstafford@morphics.com]
> >>>Sent: Thursday, June 21, 2001 5:55 PM
> >>>To: Capesius, Alan; zope@zope.org
> >>>Subject: RE: [Zope] XRON and simultaneous triggers
> >>>
> >>>
> >>>The Xron Dispatcher executes ready events serially. This is because
> >>>the Dispatcher consists of a single thread and fires scheduled methods
> >>>with Client.py, so it has to wait until it gets a response from
> >>>Client.py before it can service the next ready event.
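
To make the serial behaviour concrete, here is a toy sketch (not Xron code). A single loop calls a blocking stand-in for Client.py, so the second event cannot start until the first has returned. The names `fake_rpc` and `dispatch_serially` are invented for illustration only.

```python
import time

def fake_rpc(url, log, delay=0.01):
    """Blocking stand-in for a Client.py call; records start and end."""
    log.append(("start", url))
    time.sleep(delay)          # the single dispatcher thread is stuck here
    log.append(("end", url))
    return ({}, "ok")

def dispatch_serially(ready_urls, rpc=fake_rpc):
    """Fire each ready event in turn from one thread; return the call log."""
    log = []
    for url in ready_urls:
        rpc(url, log)          # the next event waits for this response
    return log
```

In the returned log, every "start" entry is immediately followed by the matching "end" entry, which is why two events scheduled for the same moment never overlap inside the Dispatcher itself.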
> >>>
> >>>So I wouldn't expect Xron to contribute to concurrency problems by
> >>>itself. Of course it might trigger a method that accesses your database
> >>>at the same time that a real user is accessing the database, but that
> >>>should be a trivial problem.
> >>>
> >>>The Dispatcher tries to avoid endless loops when something goes wrong
> >>>in a scheduled method. First it tries to disarm the scheduled method;
> >>>failing that, it deletes the corresponding entry from the Schedule
> >>>catalog.
> >>>
> >>>It disarms the scheduled method by calling (again via Client.py)
> >>>disarm(), a private method of the Xron DTML Method, and rescheduling
> >>>the method for "never". Its attempt to disarm may fail, and the entries
> >>>you are seeing in the log file show that it is failing. There are only
> >>>a few conditions that can cause a failure at this point: insufficient
> >>>privileges to execute the disarm() method, incorrect URL of the method,
> >>>failure of the socket used by Client.py, or failure to update the
> >>>Schedule catalog when disarming the method.
> >>>
> >>>In case the Dispatcher fails to disarm a Xron DTML Method, it still
> >>>tries to prevent looping by deleting the corresponding entry from the
> >>>Schedule catalog. I think the only failure that can occur at this point
> >>>is a Schedule catalog update failure. If it can't change or delete the
> >>>schedule entry from the catalog, it will find it there again the next
> >>>time it wakes up (which will be immediately) and try to trigger it
> >>>again.
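
The fallback chain described above can be sketched as follows. This is a simplified stand-alone model, not the Dispatcher itself: `handle_ready_event` is a made-up name, `rpc` is any callable that raises on failure, and `schedule.remove()` stands in for `Schedule.exterminate()`.

```python
def handle_ready_event(url, rpc, schedule, log):
    """Trigger an event; on failure disarm it; failing that, delete it.

    Returns the name of the step that succeeded, or 'stuck' if none did.
    """
    try:
        rpc(url + "/trigger")
        return "triggered"
    except Exception:
        log.append("trigger failed")
    try:
        rpc(url + "/disarm")
        return "disarmed"
    except Exception:
        log.append("disarm failed")
    try:
        schedule.remove(url)   # stand-in for Schedule.exterminate(aurl)
        return "removed"
    except Exception:
        log.append("remove failed")
    return "stuck"             # entry survives and is ready again at once
```

The "stuck" outcome is the looping case: the entry is still in the catalog with a trigger time in the past, so the very next pass picks it up again.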
> >>>
> >>>I don't know why it would not be possible to update the Schedule at
> >>>this point. But I have twice seen such a failure on my system (NT 4,
> >>>Zope 2.3.2). In fact, just yesterday I received 163 email messages
> >>>telling me to take out the garbage. 8-0
> >>>
> >>>If anyone has any ideas about how the Dispatcher could fail to disarm
> >>>a failed event, I'd be happy to listen. In fact, someone who is
> >>>familiar with ZODB programming should look over the ZODB and
> >>>transaction logic in Dispatcher.py and check whether I am doing
> >>>something wrong there. I really don't understand everything I wrote.
> >>>8-)
> >>>
> >>>In fact, why don't I just include it below.
> >>>
> >>>-- Loren
> >>>
> >>>#####################################################################
> >>>""" Dispatcher for Xron Events """
> >>>
> >>>#The Dispatcher runs as a separate thread
> >>># It uses the Schedule as its primary data structure
> >>># It knows which event is next
> >>># It sleeps until that event or a change in the Schedule
> >>># It fires an event and logs its output
> >>>
> >>>import Loggerr
> >>>loggerr=Loggerr.loggerr
> >>>#import pdb; pdb.set_trace()
> >>>
> >>>from Globals import DateTime
> >>>import sys, string
> >>>
> >>>maxwait=float(10) # max time to wait between wake-ups in seconds
> >>>
> >>>def Timer(ScheduleID, ScheduleChange, rpc): # aka Dispatcher
> >>>  #loggerr(0, 'Dispatcher thread started.')
> >>>  # infinite loop
> >>>  while 1:
> >>>    # Good morning. We just woke up.
> >>>    # The first thing we need is a new connection.
> >>>    import Zope
> >>>    app=Zope.app()
> >>>    try:
> >>>      Schedule = getattr(app, ScheduleID, None)
> >>>    except:
> >>>      loggerr(301, 'Cannot access catalog. Suspending operation.')
> >>>      break
> >>>
> >>>    interval=maxwait # Default sleep time. May be recalculated below
> >>>    try:
> >>>      (atime, aurl)=Schedule.armed_event() # Get next armed event
> >>>    except:
> >>>      loggerr(302,'Cannot access catalog. Suspending operation.')
> >>>      break # out of infinite loop
> >>>    if atime is None:
> >>>      #loggerr(0, 'No armed events.') # debug
> >>>      pass # Sleep some more
> >>>    elif atime.isFuture(): # The next armed event is not yet ready
> >>>      # calculate how long we have to wait
> >>>      ainterval=atime.timeTime() - DateTime().timeTime()
> >>>      if ainterval < float(0): ainterval=float(0) # Is negative bad?
> >>>      interval=ainterval # Comment out for debugging to limit to maxwait
> >>>    else: # This event is ready now.
> >>>      # Fire event, and log its output
> >>>      emsg= '\nTrigger event: %s\nTrigger time: %s' % (aurl, atime)
> >>>      furl=string.join((aurl, 'trigger'), '/')
> >>>      try:
> >>>        (headers,response)=rpc(furl) # Fire event
> >>>        dmsg='%s\n' % response
> >>>        loggerr(0, emsg, detail=dmsg) # Log the event and its output
> >>>      except:
> >>>        type, val, tb = sys.exc_info()
> >>>        dmsg="Failed to trigger event.\nType=%s\nVal=%s\n" % (type, val)
> >>>        loggerr(100, emsg, detail=dmsg)
> >>>        del type, val, tb, dmsg
> >>>        try:
> >>>          rpc('%s/%s' % (aurl, 'disarm')) # Attempt to disarm
> >>>          loggerr(100, 'Disarmed event', detail='')
> >>>        except:
> >>>          # aurl is probably pointing to an event that no longer exists
> >>>          # or the url doesn't resolve correctly
> >>>          loggerr(100, "Failed to disarm event", detail='')
> >>>          # Let's just kick it out of the catalog
> >>>          # Otherwise, this event will come back to haunt us
> >>>          try:
> >>>            Schedule.exterminate(aurl)
> >>>            get_transaction().commit()
> >>>          except:
> >>>            pass
> >>>      # Finished processing one event
> >>>      app._p_jar.sync() # see ZODB/Connection.py
> >>>      interval=float(0)
> >>>
> >>>    # We're going to sleep now; so, free the connection
> >>>    app._p_jar.close()
> >>>    del app
> >>>    # Sleep for predetermined interval
> >>>    #emsg= 'Going to sleep for %s seconds' % (interval)
> >>>    ScheduleChange.wait(interval) # in seconds (float)
> >>>
> >>>    if ScheduleChange.isSet():
> >>>      #loggerr(0, 'Awakened by set event.') # debug
> >>>      ScheduleChange.clear()
> >>>      # Schedule has changed, we woke up early
> >>>      # Loop back to the top and check for
> >>>      # an earlier event than we were waiting for
> >>>    #else:
> >>>      #loggerr(0, 'Awakened by timeout.') # debug
> >>>      # We timed out, so there must be a ready event
> >>>      # Loop back to the top and trigger the next ready event.
> >>>      # That's probably the one we were waiting for.
> >>>      #pass
> >>>
> >>>  # End of while 1 loop.
> >>>  # Something bad happened, let's clean up before quitting for good
> >>>  try:
> >>>    app._p_jar.close()
> >>>    del app
> >>>  except:
> >>>    pass
> >>>  loggerr(100, 'Dispatcher thread is terminating.')
> >>>
> >>>============== end ================
> >>>
>
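
A note on the sleep step at the bottom of the loop in the listing above: it relies on `ScheduleChange.wait(interval)` followed by `isSet()` and `clear()`. The listing does not show where `ScheduleChange` is created, so the following is only a sketch under the assumption that it behaves like a `threading.Event`. The function name `sleep_until_change` is invented, and the sketch uses the modern `is_set()` spelling rather than `isSet()`.

```python
import threading

def sleep_until_change(schedule_change, interval):
    """Wait up to `interval` seconds and report why we woke up."""
    schedule_change.wait(interval)      # returns early if the event is set
    if schedule_change.is_set():
        schedule_change.clear()         # re-arm for the next wait
        return "schedule changed"
    return "timeout"
```

With an interval of zero, as the Dispatcher uses right after processing an event, the wait returns at once. That is why an entry that cannot be disarmed or removed gets retried in a tight loop.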