[Zope] XRON and simultaneous triggers

Loren Stafford lstafford@morphics.com
Thu, 21 Jun 2001 16:42:46 -0700


Please respond to the list so others can benefit from the exchange.

When you need to disable Xron, all you have to do is rename its __init__.py
file to something else, then restart Zope. That way the product won't be
loaded by Zope, but you don't have to delete the entire product directory.

-- Loren

> -----Original Message-----
> From: Capesius, Alan [mailto:CapesiusA@Sysmex.com]
> Sent: Thursday, June 21, 2001 16:07
> To: Loren Stafford; zope@zope.org
> Subject: RE: [Zope] XRON and simultaneous triggers
>
>
> A bit more info on my crash. (NTS 4.0 SP6 Zope2.30)
> At the time the crash occurred, an out of virtual memory was also
> displayed
> at the server and the server had memory usage above 1GB RAM. The active
> python task was not consuming this (nor was anything else, but the service
> had been restarted at that point. Global RAM usage?
>
> I've also disabled the error dump messages so that Zope will
> restart itself
> without human intervention when it bombs. Though, in this case it
> was still
> dumping a crash dump file, which takes awhile. I've now disabled crash
> dumps.
>
> Since I was unable to access the server to abort or modify any methods, I
> was forced to restart it. This did not resolve the xron problem, as the
> scheduler activated immediately and started looping again. I was able to
> resolve this by downing the service, removing the xron folder, restarting,
> removing the offending ODBC/SQL call in the xron method that I
> suspected and
> restarting again with the XRon folder back in place.
>
> The offending call was a single record update to an Access DB. I am now in
> the process of migrating all live query functions over to SQL Server based
> tables. I'm not sure if the jet drivers in the O/S were bogging down. My
> suspicion (without much evidence) is that the Jet connection was flipping
> open and closed so rapidly that something got out of whack with
> the timing.
> (but that theory's more about what I don't know that what I do know :)
>
> Thanks again Loren for the excellent insights into xron. Now i
> just have to
> learn ZClasses, python, and the meaning of life....
>
>
>
> >>>-----Original Message-----
> >>>From: Loren Stafford [mailto:lstafford@morphics.com]
> >>>Sent: Thursday, June 21, 2001 5:55 PM
> >>>To: Capesius, Alan; zope@zope.org
> >>>Subject: RE: [Zope] XRON and simultaneous triggers
> >>>
> >>>
> >>>The Xron Dispatcher executes ready events serially. This is
> >>>due to the fact
> >>>that Dispatcher consists of a single thread and fires
> >>>scheduled methods with
> >>>Client.py, so it has to wait until it gets a response from
> >>>Client.py before
> >>>it can service the next ready event.
> >>>
> >>>So I wouldn't expect to find that Xron contributes to
> >>>concurrency problems
> >>>by itself. Of course it might trigger a method that accesses
> >>>your database
> >>>at the same time that a real user is accessing the database,
> >>>but that should
> >>>be a trivial problem.
> >>>
> >>>The Dispatcher tries to avoid endless loops when something
> >>>goes wrong in a
> >>>scheduled method. First it tries to disarm the scheduled
> >>>method; failing
> >>>that, it deletes the corresponding entry from the Schedule catalog.
> >>>
> >>>It disarms the scheduled method by calling (again via
> >>>Client.py) disarm(), a
> >>>private method of the Xron DTML Method, and rescheduling the
> >>>method for
> >>>"never". Its attempt to disarm may fail, and the entries you
> >>>are seeing in
> >>>the log file show that it is failing. There are only a few
> >>>conditions that
> >>>can cause a failure at this point: insufficient priviledges
> >>>to execute the
> >>>disarm() method, incorrect URL of the method, failure of the
> >>>socket used by
> >>>Client.py, failure to update the Schedule catalog when
> >>>disarming the method.
> >>>
> >>>In case the Dispatcher fails to disarm a Xron DTML Method,
> >>>it still tries to
> >>>prevent looping by deleting the corresponding entry from the Schedule
> >>>catalog. I think the only failure that can occur at this
> >>>point is a Schedule
> >>>catalog update failure. If it can't change or delete the
> >>>schedule entry from
> >>>the catalog, it will find it there again the next time it
> >>>wakes up (which
> >>>will be immediately) and try to trigger it again.
> >>>
> >>>I don't know why it would not be possible to update the
> >>>Schedule at this
> >>>point. But I have twice seen such a failure on my system (NT
> >>>4, Zope 2.3.2).
> >>>In fact, just yesterday I received 163 email messages
> >>>telling me to take out
> >>>the garbage. 8-0
> >>>
> >>>If anyone has any ideas about how the Dispatcher could fail
> >>>to disarm a
> >>>failed event, I'd be happy to listen. In fact, someone who
> >>>is familiar with
> >>>ZODB programming should look over the ZODB and transaction logic in
> >>>Dispatcher.py and check whether I am doing something wrong
> >>>there. I really
> >>>don't understand everything I wrote. 8-)
> >>>
> >>>In fact, why don't I just include it below.
> >>>
> >>>-- Loren
> >>>
> >>>#####################################################################
> >>>""" Dispatcher for Xron Events """
> >>>
> >>>#The Dispatcher runs as a separate thread
> >>># It uses the Schedule as its primary data structure
> >>># It knows which event is next
> >>># It sleeps until that event or a change in the Schedule
> >>># It fires an event and logs its output
> >>>
> >>>import Loggerr
> >>>loggerr=Loggerr.loggerr
> >>>#import pdb; pdb.set_trace()
> >>>
> >>>from Globals import DateTime
> >>>import sys, string
> >>>
> >>>maxwait=float(10)  # max time to wait between wake-ups in seconds
> >>>
> >>>def Timer(ScheduleID, ScheduleChange, rpc):     # aka Dispatcher
> >>>  #loggerr(0, 'Dispatcher thread started.')
> >>>  # infinite loop
> >>>  while 1:
> >>>    # Good morning. We just woke up.
> >>>    # The first thing we need is a new connection.
> >>>    import Zope
> >>>    app=Zope.app()
> >>>    try:
> >>>      Schedule = getattr(app, ScheduleID, None)
> >>>    except:
> >>>      loggerr(301, 'Cannot access catalog. Suspending operation.')
> >>>      break
> >>>
> >>>    interval=maxwait # Default sleep time. May be recalculated below
> >>>    try:
> >>>      (atime, aurl)=Schedule.armed_event() # Get next armed event
> >>>    except:
> >>>      loggerr(302,'Cannot access catalog. Suspending operation.')
> >>>      break # out of infinite loop
> >>>    if atime is None:
> >>>      #loggerr(0, 'No armed events.') # debug
> >>>      pass     # Sleep some more
> >>>    elif atime.isFuture(): # The next armed event is not yet ready
> >>>      #   calculate how long we have to wait
> >>>      ainterval=atime.timeTime() - DateTime().timeTime()
> >>>      if ainterval < float(0): ainterval=float(0) # Is negative bad?
> >>>      interval=ainterval # Comment out for debugging to
> >>>limit to maxwait
> >>>    else:                        # This event is ready now.
> >>>      # Fire event, and log its output
> >>>      emsg= '\nTrigger event: %s\nTrigger time: %s' % (aurl, atime)
> >>>      furl=string.join((aurl, 'trigger'), '/')
> >>>      try:
> >>>        (headers,response)=rpc(furl) # Fire event
> >>>        dmsg='%s\n' % response
> >>>        loggerr(0, emsg, detail=dmsg) # Log the event and its output
> >>>      except:
> >>>        type, val, tb = sys.exc_info()
> >>>        dmsg="Failed to trigger event.\nType=%s\nVal=%s\n" %
> >>>(type, val)
> >>>        loggerr(100, emsg, detail=dmsg)
> >>>        del type, val, tb, dmsg
> >>>        try:
> >>>          rpc('%s/%s' % (aurl, 'disarm')) # Attempt to disarm
> >>>          loggerr(100, 'Disarmed event', detail='')
> >>>        except:
> >>>          # aurl is probably pointing to an event that no
> >>>longer exists
> >>>          # or the url doesn't resolve correctly
> >>>          loggerr(100, "Failed to disarm event", detail='')
> >>>          # Let's just kick it out of the catalog
> >>>          # Otherwise, this event will come back to haunt us
> >>>          try:
> >>>            Schedule.exterminate(aurl)
> >>>            get_transaction().commit()
> >>>          except:
> >>>            pass
> >>>      # Finished processing one event
> >>>      app._p_jar.sync() # see ZODB/Connection.py
> >>>      interval=float(0)
> >>>
> >>>    # We're going to sleep now; so, free the connection
> >>>    app._p_jar.close()
> >>>    del app
> >>>    #  Sleep for predetermined interval
> >>>    #emsg= 'Going to sleep for %s seconds' % (interval)
> >>>    ScheduleChange.wait(interval) # in seconds (float)
> >>>
> >>>    if ScheduleChange.isSet():
> >>>      #loggerr(0, 'Awakened by set event.') # debug
> >>>      ScheduleChange.clear()
> >>>      # Schedule has changed, we woke up early
> >>>      # Loop back to the top and check for
> >>>      # an earlier event than we were waiting for
> >>>    #else:
> >>>      #loggerr(0, 'Awakened by timeout.') # debug
> >>>      # We timed out, so there must be a ready event
> >>>      # Loop back to the top and trigger the next ready event.
> >>>      # That's probably the one we were waiting for.
> >>>      #pass
> >>>
> >>>  # End of while 1 loop.
> >>>  # Something bad happened, let's clean up before quitting for good
> >>>  try:
> >>>    app._p_jar.close()
> >>>    del app
> >>>  except:
> >>>    pass
> >>>  loggerr(100, 'Dispatcher thread is terminating.')
> >>>
> >>>============== end ================
> >>>
>