[Zope] XRON and simultaneous triggers

Loren Stafford lstafford@morphics.com
Thu, 21 Jun 2001 15:54:43 -0700


The Xron Dispatcher executes ready events serially. This is due to the fact
that Dispatcher consists of a single thread and fires scheduled methods with
Client.py, so it has to wait until it gets a response from Client.py before
it can service the next ready event.

So I wouldn't expect to find that Xron contributes to concurrency problems
by itself. Of course it might trigger a method that accesses your database
at the same time that a real user is accessing the database, but that should
be a trivial problem.

The Dispatcher tries to avoid endless loops when something goes wrong in a
scheduled method. First it tries to disarm the scheduled method; failing
that, it deletes the corresponding entry from the Schedule catalog.

It disarms the scheduled method by calling (again via Client.py) disarm(), a
private method of the Xron DTML Method, and rescheduling the method for
"never". Its attempt to disarm may fail, and the entries you are seeing in
the log file show that it is failing. There are only a few conditions that
can cause a failure at this point: insufficient priviledges to execute the
disarm() method, incorrect URL of the method, failure of the socket used by
Client.py, failure to update the Schedule catalog when disarming the method.

In case the Dispatcher fails to disarm a Xron DTML Method, it still tries to
prevent looping by deleting the corresponding entry from the Schedule
catalog. I think the only failure that can occur at this point is a Schedule
catalog update failure. If it can't change or delete the schedule entry from
the catalog, it will find it there again the next time it wakes up (which
will be immediately) and try to trigger it again.

I don't know why it would not be possible to update the Schedule at this
point. But I have twice seen such a failure on my system (NT 4, Zope 2.3.2).
In fact, just yesterday I received 163 email messages telling me to take out
the garbage. 8-0

If anyone has any ideas about how the Dispatcher could fail to disarm a
failed event, I'd be happy to listen. In fact, someone who is familiar with
ZODB programming should look over the ZODB and transaction logic in
Dispatcher.py and check whether I am doing something wrong there. I really
don't understand everything I wrote. 8-)

In fact, why don't I just include it below.

-- Loren

#####################################################################
""" Dispatcher for Xron Events """

#The Dispatcher runs as a separate thread
# It uses the Schedule as its primary data structure
# It knows which event is next
# It sleeps until that event or a change in the Schedule
# It fires an event and logs its output

import Loggerr
loggerr=Loggerr.loggerr
#import pdb; pdb.set_trace()

from Globals import DateTime
import sys, string

maxwait=float(10)  # max time to wait between wake-ups in seconds

def Timer(ScheduleID, ScheduleChange, rpc):     # aka Dispatcher
  #loggerr(0, 'Dispatcher thread started.')
  # infinite loop
  while 1:
    # Good morning. We just woke up.
    # The first thing we need is a new connection.
    import Zope
    app=Zope.app()
    try:
      Schedule = getattr(app, ScheduleID, None)
    except:
      loggerr(301, 'Cannot access catalog. Suspending operation.')
      break

    interval=maxwait # Default sleep time. May be recalculated below
    try:
      (atime, aurl)=Schedule.armed_event() # Get next armed event
    except:
      loggerr(302,'Cannot access catalog. Suspending operation.')
      break # out of infinite loop
    if atime is None:
      #loggerr(0, 'No armed events.') # debug
      pass     # Sleep some more
    elif atime.isFuture(): # The next armed event is not yet ready
      #   calculate how long we have to wait
      ainterval=atime.timeTime() - DateTime().timeTime()
      if ainterval < float(0): ainterval=float(0) # Is negative bad?
      interval=ainterval # Comment out for debugging to limit to maxwait
    else:                        # This event is ready now.
      # Fire event, and log its output
      emsg= '\nTrigger event: %s\nTrigger time: %s' % (aurl, atime)
      furl=string.join((aurl, 'trigger'), '/')
      try:
        (headers,response)=rpc(furl) # Fire event
        dmsg='%s\n' % response
        loggerr(0, emsg, detail=dmsg) # Log the event and its output
      except:
        type, val, tb = sys.exc_info()
        dmsg="Failed to trigger event.\nType=%s\nVal=%s\n" % (type, val)
        loggerr(100, emsg, detail=dmsg)
        del type, val, tb, dmsg
        try:
          rpc('%s/%s' % (aurl, 'disarm')) # Attempt to disarm
          loggerr(100, 'Disarmed event', detail='')
        except:
          # aurl is probably pointing to an event that no longer exists
          # or the url doesn't resolve correctly
          loggerr(100, "Failed to disarm event", detail='')
          # Let's just kick it out of the catalog
          # Otherwise, this event will come back to haunt us
          try:
            Schedule.exterminate(aurl)
            get_transaction().commit()
          except:
            pass
      # Finished processing one event
      app._p_jar.sync() # see ZODB/Connection.py
      interval=float(0)

    # We're going to sleep now; so, free the connection
    app._p_jar.close()
    del app
    #  Sleep for predetermined interval
    #emsg= 'Going to sleep for %s seconds' % (interval)
    ScheduleChange.wait(interval) # in seconds (float)

    if ScheduleChange.isSet():
      #loggerr(0, 'Awakened by set event.') # debug
      ScheduleChange.clear()
      # Schedule has changed, we woke up early
      # Loop back to the top and check for
      # an earlier event than we were waiting for
    #else:
      #loggerr(0, 'Awakened by timeout.') # debug
      # We timed out, so there must be a ready event
      # Loop back to the top and trigger the next ready event.
      # That's probably the one we were waiting for.
      #pass

  # End of while 1 loop.
  # Something bad happened, let's clean up before quitting for good
  try:
    app._p_jar.close()
    del app
  except:
    pass
  loggerr(100, 'Dispatcher thread is terminating.')

============== end ================