[Zope-dev] How bad _are_ ConflictErrors

Mon Nov 21 14:10:03 EST 2005

Conflicts and how they interact with the database and sessioning machinery 
are my hot button at the moment )-:  I hope I have not 
included too much information.

I ran a quick report and we see about 1000 conflicts per hour at 
about 120000 hits per hour.  These are order of magnitude numbers and are 
highly variable.  The 1% number is way bigger than I am comfortable with 
although I have no basis to scale my expectations.  I'd be much happier were 
it a couple of orders of magnitude smaller.
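To put the figures side by side, here is a small calculation. It is only arithmetic on the order-of-magnitude numbers given in this message (the second pair comes from Chris Withers' message quoted at the end); the function name is made up for the illustration.

```python
def conflict_rate(conflicts_per_hour, hits_per_hour):
    """Return conflicts as a fraction of hits."""
    return conflicts_per_hour / hits_per_hour

ours = conflict_rate(1000, 120000)    # figures from the report above
theirs = conflict_rate(450, 11000)    # figures from Chris Withers' message

print(f"this site:              {ours:.2%}")    # 0.83%
print(f"Chris Withers' cluster: {theirs:.2%}")  # 4.09%

# "A couple of orders of magnitude smaller" would mean about this many per hour:
target_per_hour = 1000 / 100
print(target_per_hour)  # 10.0
```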

Conflict errors are not always errors.  As I understand it, Zope retries
when a conflict occurs and usually is able to commit both sides of the 
conflicting transaction.  Sometimes Zope cannot commit conflicting 
transactions--and it is at that point that an error occurs.   There are 
supposed to be significant changes in the Zope 2.8.4/ZODB 3.4.2 system.
Read-read conflicts no longer generate conflict errors and the retry 
mechanism has been reworked at the ZODB level to retry once and then raise 
a POSKEY exception.

The optimistic locking used by Zope can cause problems, particularly when
the conflicting method changes external state.  We have seen instances
where an action was taken multiple times due to conflicts and their
resolution.  In one instance, we had an infinite loop in the conflict
resolution.   The interactions which can cause conflicts are not always 
obvious.  I am still learning.
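The repeated-action problem can be illustrated without Zope at all. The sketch below is a plain-Python simulation, not Zope's actual retry code: `ConflictError` here is a stand-in class, `run_with_retries` and `place_order` are hypothetical names, and the retry limit of 3 is an arbitrary choice for the example.

```python
class ConflictError(Exception):
    """Stand-in for the database's conflict exception (sketch only)."""

sent_mail = []   # external state: not rolled back when a transaction aborts

def run_with_retries(work, max_retries=3):
    """Run work(attempt); on ConflictError, discard the attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return work(attempt)
        except ConflictError:
            continue   # database changes are discarded; external effects are not
    raise ConflictError("gave up after %d attempts" % (max_retries + 1))

def place_order(attempt):
    sent_mail.append("order confirmation")   # side effect happens before commit
    if attempt < 2:                            # pretend the first two commits conflict
        raise ConflictError()
    return "committed"

result = run_with_retries(place_order)
print(result, len(sent_mail))   # committed 3
```

The transaction commits once, but the external action ran three times. Deferring such actions until after a successful commit is one way to avoid this.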

We do have occasional instances where unresolved conflicts raise user 
visible diagnostics.  These are real errors.  While I have not explored 
the reasons why, it appears that at least some of these errors are not
logged in event.log but only displayed to the user.

I asked the list the other day whether anyone had prepared a set of
best-practice guidelines on techniques for minimizing conflicts.
Dieter Maurer responded:

>   *  Localize out into separate persistent objects attributes
>      with high write frequency.
> 
>      E.g. when you have a counter, put it into its own
>      persistent object (you can use a "BTrees.Length.Length" object
>      for a counter).
> 
>   *  Implement "conflict resolution" for your frequently
>      written persistent objects.
> 
>      Formerly, "TemporaryStorage" had only very limited
>      history information to support conflict resolution (which
>      limited the wholesome effect of conflict resolution).
>      Rumours say that this improved with Zope 2.8.
> 
>   *  Write only when you really change something.
> 
>      E.g. instead of "session[XXX] = sss" use
>      "if session[XXX] != sss: session[XXX] = sss"
>      (at least, if there is a high chance that "session" already
>      contains the correct value).
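Two of Dieter's points can be sketched in plain Python. ZODB lets a persistent class define `_p_resolveConflict(old, committed, new)`, which receives the common ancestor state, the state the other transaction already committed, and the state this transaction wants to commit. The `Counter` class below only imitates that method on an ordinary object (it is not a persistent class), using the additive rule that a length or counter object can apply. `CountingSession` and `set_if_changed` are hypothetical helpers for the example.

```python
class Counter:
    """Plain-Python stand-in for a persistent counter (sketch only)."""

    def __init__(self, value=0):
        self.value = value

    def _p_resolveConflict(self, old, committed, new):
        # Both transactions started from `old`; keep both increments.
        return committed + new - old


class CountingSession(dict):
    """dict that counts assignments, to make redundant writes visible."""

    writes = 0

    def __setitem__(self, key, value):
        self.writes += 1
        super().__setitem__(key, value)


def set_if_changed(session, key, value):
    """Assign only when the stored value is missing or different."""
    if key not in session or session[key] != value:
        session[key] = value


# Two transactions both start at 10; one adds 1, the other adds 5.
resolved = Counter()._p_resolveConflict(10, 11, 15)
print(resolved)          # 16

session = CountingSession()
for _ in range(100):
    set_if_changed(session, "lang", "en")
print(session.writes)    # 1
```

With the guard, a hundred requests that set the same value produce one write instead of a hundred.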

Session variables present a particularly vexing problem since they may
trigger writes even though they are apparently read-only.

Chris McDonough <chrism at plope.com> wrote in response to my posting:
> 
> On Nov 20, 2005, at 12:16 PM, Dennis Allison wrote:
[...]
> > Looking at the code, I don't understand why I am seeing conflicts.
> > As I understand things, neither variables in the <dtml-let> space nor
> > the REQUEST/RESPONSE space are stored in the ZODB so modifications to
> > them don't look like writes to the conflict mechanism.  Am I incorrect
> > in my understanding?
> 
> Yes, but that's understandable.  It's not exactly obvious.
> 
> The sessioning machinery is one of the few places in Zope where it's  
> necessary for the code to do what's known as a "write on read" in the  
> ZODB database.
> 
> Even if you're just "reading" from a session, looking up a session,  
> or doing anything otherwise related to sessioning, it's possible for  
> your code to generate a ZODB write.
> This is why you get conflicts even if you're "just reading"; whenever  
> you access the sessioning machinery, you are potentially (but not  
> always) causing a ZODB write.  All writes can potentially cause a  
> conflict error.
> 
> While this might sound fantastic, it's pretty much impossible to  
> avoid when using ZODB as a sessioning backend.  The sessioning  
> machinery has been tuned to generate as few conflicts as possible,  
> and you can help it by doing your own timeout, resolution, and  
> housekeeping tuning as has been suggested.  MVCC gets rid of read  
> conflicts.  But it's not possible to completely avoid write conflicts  
> under the current design.
> 
> Here's why.  The sessioning machinery is composed of three major data  
> structures:
> 
> - An index of "timeslice" to "bucket".  A timeslice is an integer
>    representing some range of time (the range of time is variable,
>    depending on the "resolution", but out of the box, it represents
>    20 seconds).  This mapping is an IOBTree.
> 
> - A "bucket" is a mapping from a browser id to "session data object"
>    (aka transient object).  This mapping is an OOBTree.
> 
> - Three "increasers" which mark the "last" timeslice in which
>    something was done (called the garbage collector, called the
>    finalizer, etc).
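The first two structures can be pictured with ordinary dictionaries. This is a simplified model, not the Transience code: plain dicts stand in for the IOBTree and OOBTree, the "increasers" are left out, and representing a timeslice as a timestamp rounded down to a multiple of the resolution is an assumption made for the sketch.

```python
RESOLUTION = 20   # seconds per timeslice (the out-of-the-box value mentioned above)

def timeslice_for(timestamp, resolution=RESOLUTION):
    """Round a timestamp down to the start of its timeslice."""
    return int(timestamp) // resolution * resolution

# index:  timeslice -> bucket               (an IOBTree in the real machinery)
# bucket: browser id -> session data object (an OOBTree in the real machinery)
index = {}

def store(browser_id, data, timestamp):
    """File a session data object under the timeslice of its last access."""
    bucket = index.setdefault(timeslice_for(timestamp), {})
    bucket[browser_id] = data

store("browser-a", {"cart": []}, timestamp=1005)
store("browser-b", {"cart": []}, timestamp=1019)
store("browser-c", {"cart": []}, timestamp=1021)

print(sorted(index))         # [1000, 1020]
print(sorted(index[1000]))   # ['browser-a', 'browser-b']
```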
> 
> The point of sessioning is to provide a writable namespace assigned  
> to a single user that expires after some period of inactivity by that  
> user.  To this end, we need to keep track of when the last time the  
> user "accessed" the session was.  This is the point of the index.
> 
> When a user accesses his session, we may need to move his session  
> data object (identified by his browser id) from one bucket  
> (representing an older timeslice) to another (representing a newer  
> timeslice).  This needs to happen *even if your code doesn't write  
> anything to his session*, because it represents a session access, and  
> the session is defined by total inactivity (not just write  
> inactivity).  Likewise, when a user runs code that requires access to  
> a session, but that user does not yet have a session data object, a  
> write may need to occur.  So seemingly innocuous accesses to session  
> data can cause a write.  Consider, in a Python script:
> 
> req = context.REQUEST
> req.SESSION
> 
> Looks pretty harmless and unlikely to cause a write.  However, that's  
> not true.  If the "bucket" in which the user's session data object is  
> found is not associated with the "current" timeslice, we need to move  
> his data object to the bucket that *is* associated with the current  
> timeslice, which is a write operation in order to make note of the  
> fact that his session is now "current".
> 
> Likewise with:
> 
> req = context.REQUEST
> a = req.SESSION.get('foo')
> 
> Even though this appears to be "only a read", the sessioning  
> machinery itself may need to perform a write operation to move the  
> user's data object to the current bucket.
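The "write on read" behaviour described here can be simulated in a few lines. This is a toy model under stated assumptions, not the Transience implementation: `SessionStore` is a hypothetical class, plain dicts replace the BTrees, expiry and housekeeping are ignored, and the `writes` counter exists only to make the hidden writes visible.

```python
RESOLUTION = 20   # seconds per timeslice

class SessionStore:
    def __init__(self):
        self.index = {}    # timeslice -> {browser_id: data}
        self.writes = 0    # number of accesses that modified the store

    def _slice(self, now):
        return int(now) // RESOLUTION * RESOLUTION

    def get(self, browser_id, now):
        """Return the session data for browser_id, marking it as current."""
        current = self._slice(now)
        bucket = self.index.get(current)
        if bucket is not None and browser_id in bucket:
            return bucket[browser_id]        # already current: a pure read
        # Look for the data object in an older bucket, else create a new one.
        data = {}
        for ts, old_bucket in self.index.items():
            if ts != current and browser_id in old_bucket:
                data = old_bucket.pop(browser_id)
                break
        self.index.setdefault(current, {})[browser_id] = data   # write caused by a "read"
        self.writes += 1
        return data

store = SessionStore()
store.get("browser-a", now=1000).get("foo")   # new session: write
store.get("browser-a", now=1005).get("foo")   # same timeslice: no write
store.get("browser-a", now=1030).get("foo")   # later timeslice: write (move)
print(store.writes)   # 2
```

The calling code never assigns anything to the session, yet two of the three accesses modify the store.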
> 
> Jacking up the resolution time increases the period of time  
> represented by a single timeslice, so fewer total writes need to be  
> performed to keep a session "current".  Turning on "external
> housekeeping" doesn't prevent this normal movement of data objects
> between buckets; it just keeps the separate process that cleans up
> "stale" data from happening during normal sessioning operations.
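The effect of the resolution setting can be estimated with a small count. The sketch assumes one user who hits the site at a fixed interval and treats every change of timeslice as one move of the data object, that is, one write. The function name and the sample values are made up for the illustration.

```python
def moves_per_hour(hit_interval, resolution, duration=3600):
    """Count how often a user's data object changes bucket in `duration` seconds."""
    moves = 0
    last_slice = None
    for t in range(0, duration, hit_interval):
        current = t // resolution
        if current != last_slice:
            moves += 1            # data object must be moved: a write
            last_slice = current
    return moves

# One hit every 10 seconds (360 hits in the hour) at three resolutions.
for resolution in (20, 60, 300):
    print(resolution, moves_per_hour(hit_interval=10, resolution=resolution))
# 20 180
# 60 60
# 300 12
```

In this model, raising the resolution from 20 to 300 seconds cuts the writes for an active user from 180 to 12 per hour, at the cost of tracking session activity less finely.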
> 
> The sessioning machinery attempts to minimize conflicts.  The 2.8  
> version of the temporarystorage does MVCC, which essentially  
> eliminates read conflict errors.  The transience machinery includes  
> significantly complicated logic to attempt to prevent conflict errors  
> from occurring including code that attempts to prevent two threads  
> from doing housekeeping at once as well as application level conflict  
> resolution for simultaneous writes to the same session data object.   
> However, the machinery uses BTrees to hold indexes.  BTrees also have
> a limited number of conflict avoidance strategies, but under certain
> circumstances (a "bucket split" is the canonical case) a conflict
> cannot be avoided, so not all write conflicts can be prevented without
> using a different kind of data structure to hold sessioning data.
> 
> A more detailed description of how "transience" works is available
> within the file named "HowTransienceWorks.txt" in the
> Products/Transience package within Zope in case you're interested.
> 
> I hope this explains why you see conflict errors even if your code  
> "doesn't do any writes", because actually it probably does by virtue  
> of accessing a session.  Tuning the knobs that come with the  
> machinery helps.  Causing transactions to be as short as possible  
> also helps (by not using ZEO to back the sessioning database or by  
> making your code just generally faster) because then there is less of  
> a chance of a conflicting change.
> 

On Mon, 21 Nov 2005, Chris Withers wrote:

> Hi All,
> 
> We all know that ideally we should have no ConflictErrors happening in 
> our apps, but of course, that's often not the case ;-)
> 
> First up, some questions about what gets logged for ConflictErrors, 
> here's a line from one of our event logs:
> 
> 2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri
> (347 conflicts since startup at 2005-11-08T17:56:20)
> 
> What is this telling me?
> Did the user actually see a ConflictError page?
> Or was this error successfully resolved?
> What object did this ConflictError occur on and/or how can I modify 
> our Zope instances to find out where the conflict was occurring?
> 
> Now, when should the number of ConflictErrors logged in this way start 
> to become worrying?
> 
> I analysed the logs from our cluster and we're getting about 450 
> conflict errors in our busiest hours when the cluster of 8 ZEO clients 
> is taking about 11,000 hits in that hour.
> 
> Is this 'bad'? If so, where should I start to make things better?
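One place to start is a tally of the logged conflicts by hour and by URI. The sketch below assumes the event log entries look exactly like the line quoted above; other Zope versions or logging configurations may differ. The second and third sample entries are invented for the example, and `tally` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the first line of an entry in the format quoted above.
ENTRY = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2} \S+ "
    r"ZODB conflict error at (?P<uri>\S+)",
    re.MULTILINE,
)

def tally(log_text):
    """Return (conflicts per hour, conflicts per URI) as two Counters."""
    per_hour, per_uri = Counter(), Counter()
    for match in ENTRY.finditer(log_text):
        per_hour[match.group("hour")] += 1
        per_uri[match.group("uri")] += 1
    return per_hour, per_uri

sample = """\
2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri
(347 conflicts since startup at 2005-11-08T17:56:20)
2005-11-17T08:14:02 INFO(0) ZODB conflict error at /other_uri
(348 conflicts since startup at 2005-11-08T17:56:20)
2005-11-17T09:01:45 INFO(0) ZODB conflict error at /some_uri
(349 conflicts since startup at 2005-11-08T17:56:20)
"""

per_hour, per_uri = tally(sample)
print(per_hour.most_common())   # [('2005-11-17T08', 2), ('2005-11-17T09', 1)]
print(per_uri.most_common())    # [('/some_uri', 2), ('/other_uri', 1)]
```

This shows which URIs conflict most often, not which object was involved; the log line quoted above does not carry that information.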
> 
> All feedback gratefully received, especially if people have been in 
> similar situations...
> 
> cheers,
> 
> Chris
> 
> 

--