How bad _are_ ConflictErrors
Hi All, We all know that ideally we should have no ConflictErrors happening in our apps, but of course, that's often not the case ;-) Firstup, some questions about what gets logged for ConflictErrors, here's a line from one of our event logs: 2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri (347 conflicts since startup at 2005-11-08T17:56:20) What is this telling me? Did the user actually see a ConflictError page? Or was this error successfully resolved? What object did this ConflictError occur on and/or how can I modify my our Zope instances to find out where the conflict was occurring? Now, when should the number of ConflictErrors logged in this way start to become worrying? I analysed the logs from our cluster and we're getting about 450 conflict errors in our busiest hours when the cluster of 8 ZEO clients is taking about 11,000 hits in that hour. Is this 'bad'? If so, where should I start to make things better? All feedback greatfully received, especially if people have been in similar situations... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Conflicts and how they interact with the database and sessioning machinery is my hot button right at the moment )-: I Hope I have not included too much information. I ran a quick report and we see about 1000 conflicts per hour at about 120000 hits per hour. These are order of magnitude numbers and are highly variable. The 1% number is way bigger than I am comfortable with although I have no basis to scale my expectations. I'd be much happier were it a couple of orders of magnitude smaller. Conflict errors are not always errors. As I understand it, Zope retries when a conflict occurs and usually is able to commit both sides of the conflicting transaction. Sometimes Zope cannot commit conflicting transactions--and it is at that point that an error occurs. There are supposed to be significant changes in the Zope 2.8.4/ZODB 3.4.2 system. Read-read conflicts no longer generate conflict errors and the retry mechanism has been reworked at the ZODB level to retry once and then raise a POSKEY exception. The optimistic locking used by Zope can cause problems, particularly when the conflicting method changes external state. We have seen instances where an action was taken multiple times due to conflicts and their resolution. In one instance, we had an infinite loop in the conflict resolution. The interactions which can cause conflicts are not always obvious. I am still learning. We do have occasional instances where unresolved conflicts raise user visible diagnostics. These are real errors. While I have not explored the reasons why, it appears that at least some of these errors are not logged in event.log but only displayed to the user. I asked the list the other day whether anyone had prepared a set of best practice guidelines on the techniques to use to minimize conflicts? Dieter Maurer responded:
* Localize out into separate persistent objects attributes with high write frequency.
E.g. when you have a counter, put into its own persistent object (you can use a "BTrees.Length.Length" object for a counter).
* Implement "conflict resolution" for your high frequently written persistent objects.
Formerly, "TemporaryStorage" had only very limited history information to support conflict resolution (which limited the wholesome effect of conflict resolution). Rumours say that this improved with Zope 2.8.
* Write only when you really change something.
E.g. instead of "session[XXX] = sss" use "if session[XXX] != sss: session[XXX] = sss" (at least, if there is a high chance that "session" already contains the correct value).
Session variable present a particularly vexing problem since they may trigger writes even though they are apparently read-only. Chris McDonough <chrism@plope.com> wrote in response to my posting:
On Nov 20, 2005, at 12:16 PM, Dennis Allison wrote:
[...]
Looking at the code, I don't understand why I am seeing conflicts. As I understand things, neither variables in the <dtml-let> space nor the REQUEST/RESPONSE space are stored in the ZODB so modifications to them don't look like writes to the conflict mechanism. Am I incorrect in my understanding?
Yes, but that's understandable. It's not exactly obvious.
The sessioning machinery is one of the few places in Zope where it's necessary for the code to do what's known as a "write on read" in the ZODB database.
Even if you're just "reading" from a session, looking up a session, or doing anything otherwise related to sessioning, it's possible for your code to generate a ZODB write. This is why you get conflicts even if you're "just reading"; whenever you access the sessioning machinery, you are potentially (but not always) causing a ZODB write. All writes can potentially cause a conflict error.
While this might sound fantastic, it's pretty much impossible to avoid when using ZODB as a sessioning backend. The sessioning machinery has been tuned to generate as few conflicts as possible, and you can help it by doing your own timeout, resolution, and housekeeping tuning as has been suggested. MVCC gets rid of read conflicts. But it's not possible to completely avoid write conflicts under the current design.
Here's why. The sessioning machinery is composed of three major data structures:
- an index of "timeslice" to "bucket". A timeslice is an integer representing some range of time (the range of time is variable, depending on the "resolution", but out of the box, it represents 20 seconds). This mapping is an IOBTree.
- A "bucket" is a mapping from a browser id to "session data object" (aka transient object). This mapping is an OOBTree.
- three "increasers" which mark the "last" timeslice in which something was done (called the garbage collector, called the finalizer, etc).
The point of sessioning is to provide a writable namespace assigned to a single user that expires after some period of inactivity by that user. To this end, we need to keep track of when the last time the user "accessed" the session was. This is the point of the index.
When a user accesses his session, we may need to move his session data object (identified by his browser id) from one bucket (representing an older timeslice) to another (representing a newer timeslice). This needs to happen *even if your code doesn't write anything to his session*, because it represents a session access, and the session is defined by total inactivity (not just write inactivity). Likewise, when a user runs code that requires access to a session, but that user does not yet have a session data object, a write may need to occur. So seemingly innocuous accesses to session data can cause a write. Consider, in a Python script:
req = context.REQUEST REQUEST.SESSION
Looks pretty harmless and unlikely to cause a write. However, that's not true. If the "bucket" in which the user's session data object is found is not associated with the "current" timeslice, we need to move his data object to the bucket that *is* associated with the current timeslice, which is a write operation in order to make note of the fact that his session is now "current".
Likewise with:
req = context.REQUEST a = REQUEST.SESSION.get('foo')
Even though this appears to be "only a read", the sessioning machinery itself may need to perform a write operation to move the user's data object to the current bucket.
Jacking up the resolution time increases the period of time represented by a single timeslice, so fewer total writes need to be performed to keep a session "current". Turning on "external housekeeping" doesn't prevent this normal movement of data objects between buckets, it just causes another process that cleans up "stale" data from happening during normal sessioning operations.
The sessioning machinery attempts to minimize conflicts. The 2.8 version of the temporarystorage does MVCC, which essentially eliminates read conflict errors. The transience machinery includes significantly complicated logic to attempt to prevent conflict errors from occurring including code that attempts to prevent two threads from doing housekeeping at once as well as application level conflict resolution for simultaneous writes to the same session data object. However, the machinery uses BTrees to hold indexes. BTrees also have a limited number of conflict avoidance strategies, but under certain circumstances (a "bucket split" is the canonical case) it cannot be avoided so not all write conflicts can be prevented without using a different kind of data structure to hold sessioning data.
A more detailed description of how "transience" works is available within the file named "HowTransienceWorks.txt" in the Products/ Transience package within Zope in case you're interested.
I hope this explains why you see conflict errors even if your code "doesn't do any writes", because actually it probably does by virtue of accessing a session. Tuning the knobs that come with the machinery helps. Causing transactions to be as short as possible also helps (by not using ZEO to back the sessioning database or by making your code just generally faster) because then there is less of a chance of a conflicting change.
On Mon, 21 Nov 2005, Chris Withers wrote:
Hi All,
We all know that ideally we should have no ConflictErrors happening in our apps, but of course, that's often not the case ;-)
Firstup, some questions about what gets logged for ConflictErrors, here's a line from one of our event logs:
2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri (347 conflicts since startup at 2005-11-08T17:56:20)
What is this telling me? Did the user actually see a ConflictError page? Or was this error successfully resolved? What object did this ConflictError occur on and/or how can I modify my our Zope instances to find out where the conflict was occurring?
Now, when should the number of ConflictErrors logged in this way start to become worrying?
I analysed the logs from our cluster and we're getting about 450 conflict errors in our busiest hours when the cluster of 8 ZEO clients is taking about 11,000 hits in that hour.
Is this 'bad'? If so, where should I start to make things better?
All feedback greatfully received, especially if people have been in similar situations...
cheers,
Chris
--
[Dennis Allison]
... Conflict errors are not always errors.
At the ZODB level, an unresolved conflict always raises an exception. Whether such an exception is considered to be "an error" isn't ZODB's decision -- that's up to the app. My understanding (which may be wrong) is that Zope tries up to 3 times to perform & commit a given transaction, suppressing any conflict exceptions for the duration, before giving up.
As I understand it, Zope retries when a conflict occurs and usually is able to commit both sides of the conflicting transaction.
Right (although note that there may be more than two sides).
Sometimes Zope cannot commit conflicting transactions--and it is at that point that an error occurs.
Right, Zope eventually gives up on a transaction that keeps on raising conflict exceptions.
There are supposed to be significant changes in the Zope 2.8.4/ZODB 3.4.2 system.
There are. ZODB 3.3 introduced "multiversion concurrency control" (MVCC), which eliminates read conflicts in normal operation.
Read-read conflicts no longer generate conflict errors
Not really: under MVCC, there simply aren't any read conflicts. There may still be write conflicts.
and the retry mechanism has been reworked at the ZODB level to retry once and then raise a POSKEY exception.
Nope, no version of ZODB ever retries a transaction on its own. If an application (like Zope) wants to retry, it's entirely up to it do so.
The optimistic locking used by Zope
ZODB's transactional approach is optimistic, precisely because it _doesn't_ lock objects modified by a transaction. Any number of transactions are free to modify the same object at the same time -- no locking mechanism attempts to stop that. If multiple transactions do modify the same object at the same time, and that object doesn't implement conflict resolution, then only the first transaction to commit its changes to that object can succeed.
can cause problems, particularly when the conflicting method changes external state.
Yes -- but do note it's not a transactional system then (ZODB can roll back all changes _it_ makes, so that a failure to commit does no harm to the database state; external resources that can't "take back" provisional changes are indeed challenging to use in a transactional system).
On Nov 21, 2005, at 2:10 PM, Dennis Allison wrote:
Conflicts and how they interact with the database and sessioning machinery is my hot button right at the moment )-: I Hope I have not included too much information.
I ran a quick report and we see about 1000 conflicts per hour at about 120000 hits per hour.
Is this the number of log messages that indicate a conflict error occurred (e.g. "x conflict errors since DATE" messages in the event log) or the number of conflict errors that are retried more than three times and thus make it "out" to the app user? I'm guessing the former.
These are order of magnitude numbers and are highly variable. The 1% number is way bigger than I am comfortable with although I have no basis to scale my expectations. I'd be much happier were it a couple of orders of magnitude smaller.
I would be too. It's considerably difficult when ZODB is used as the sessioning backend. A lot of effort has been put in to reducing the potential for conflicts already. It could of course be better if more time was put in, but there hasn't been any reason (besides a sense of accomplishment and contribution to the greater good, anyway ;-) to put in that effort since the last time this machinery was overhauled. That said, if no conflict errors actually bubble up to the user using the application, the penalty is "just" app performance and "knowledge expense" (e.g. you can't use a nontransactional mailhost, you can't use a nontransactional database table, etc). You've already paid for the latter the hard way. ;-) I can't judge the expense of the former to you but I assume that's what you're primarily worried about now.
Conflict errors are not always errors.
The real reason they're called "errors" is only because they're implemented as Python exceptions. They are implemented as exceptions because it was the easiest mechanism to use (exceptions are already built into Python).
As I understand it, Zope retries when a conflict occurs and usually is able to commit both sides of the conflicting transaction.
There can be more than two sides (actually there always are... there are three.. the two conflicting in-progress connection states and the database state).
Sometimes Zope cannot commit conflicting transactions--and it is at that point that an error occurs.
An exception occurs, yes. Oops, I just realized Tim responded to the rest of these points, so I won't go on.
We do have occasional instances where unresolved conflicts raise user visible diagnostics. These are real errors. While I have not explored the reasons why, it appears that at least some of these errors are not logged in event.log but only displayed to the user.
To be pedantic, if you're right about conflict error tracebacks being shown to end users, it's not because they are "unresolved" (in the sense that 'application-level conflict resolution' could have prevented them), it's because a request was issued that resulted in a conflict error, which was retried, and then that retried request raised a conflict error, and then twice more. The only way to figure out what's going on here is to see the traceback. IIRC, Zope logs conflict error tracebacks at the BLATHER log level (as well as a deluge of other ancillary info). However, even if BLATHER logging mode is not on, if no obvious error is put in the event log when a conflict error is relayed to a user, that's definitely a bug. I'd believe it in a second! ;-) The Zope conflict exception catching code is written in such a complicated way (and without the benefit of any automated tests) that tracking that down could take an entire day which I don't have to burn ATM. So I'm afraid the status quo will prevail until someone gets so indignant about it that they either pay for it to be fixed or fix it themselves. Apologies for that. :-( - C
These are order of magnitude numbers and are highly variable. The 1% number is way bigger than I am comfortable with although I have no basis to scale my expectations. I'd be much happier were it a couple of orders of magnitude smaller.
I would be too. It's considerably difficult when ZODB is used as the sessioning backend. A lot of effort has been put in to reducing the potential for conflicts already. It could of course be better if more time was put in, but there hasn't been any reason (besides a sense of accomplishment and contribution to the greater good, anyway ;-) to put in that effort since the last time this machinery was overhauled.
I should also say that without the benefit of knowing whether you've taken the advice of turning the "knobs" available to you that help reduce conflicts (bumping up timeout resolution, turning off inband housekeeping, using a local database rather than a ClientStorage- backed database for session data), that we enumerated in previous emails, it's hard to know whether doing any more work would be beneficial. - C
On Mon, 21 Nov 2005, Chris McDonough wrote:
On Nov 21, 2005, at 2:10 PM, Dennis Allison wrote:
Conflicts and how they interact with the database and sessioning machinery is my hot button right at the moment )-: I Hope I have not included too much information.
I ran a quick report and we see about 1000 conflicts per hour at about 120000 hits per hour.
Is this the number of log messages that indicate a conflict error occurred (e.g. "x conflict errors since DATE" messages in the event log) or the number of conflict errors that are retried more than three times and thus make it "out" to the app user? I'm guessing the former.
*** you are correct -- this is the easy hack on the event.log. It's much harder to know how many make it out to the user. We have an associated bug in the MySQL interface which generates threading errors, apparently triggered by a conflict error and the subsequent backout. These occur with most conflicts which involve the database--almost every conflict with our system structure.
These are order of magnitude numbers and are highly variable. The 1% number is way bigger than I am comfortable with although I have no basis to scale my expectations. I'd be much happier were it a couple of orders of magnitude smaller.
I would be too. It's considerably difficult when ZODB is used as the sessioning backend. A lot of effort has been put in to reducing the potential for conflicts already. It could of course be better if more time was put in, but there hasn't been any reason (besides a sense of accomplishment and contribution to the greater good, anyway ;-) to put in that effort since the last time this machinery was overhauled.
*** I've moved from a ZODB sessioning backend to local sessioning. There has not been a significant change, I think because the MySQL problem dominates at the moment.
That said, if no conflict errors actually bubble up to the user using the application, the penalty is "just" app performance and "knowledge expense" (e.g. you can't use a nontransactional mailhost, you can't use a nontransactional database table, etc). You've already paid for the latter the hard way. ;-) I can't judge the expense of the former to you but I assume that's what you're primarily worried about now.
*** Right now, we have major problems with our transactional database and locks. Once that gets resolved, we will address how to refactor to minimize the cost of transactions and ensure correctness in the presence of conflicts. Correctness is already pretty much guaranteed with our current systems structure.
Conflict errors are not always errors.
The real reason they're called "errors" is only because they're implemented as Python exceptions. They are implemented as exceptions because it was the easiest mechanism to use (exceptions are already built into Python).
As I understand it, Zope retries when a conflict occurs and usually is able to commit both sides of the conflicting transaction.
There can be more than two sides (actually there always are... there are three.. the two conflicting in-progress connection states and the database state).
Sometimes Zope cannot commit conflicting transactions--and it is at that point that an error occurs.
An exception occurs, yes.
Oops, I just realized Tim responded to the rest of these points, so I won't go on.
*** Yes, he did. THANKS TIM for your comments and help. (And you too Chris)
We do have occasional instances where unresolved conflicts raise user visible diagnostics. These are real errors. While I have not explored the reasons why, it appears that at least some of these errors are not logged in event.log but only displayed to the user.
To be pedantic, if you're right about conflict error tracebacks being shown to end users, it's not because they are "unresolved" (in the sense that 'application-level conflict resolution' could have prevented them), it's because a request was issued that resulted in a conflict error, which was retried, and then that retried request raised a conflict error, and then twice more. The only way to figure out what's going on here is to see the traceback. IIRC, Zope logs conflict error tracebacks at the BLATHER log level (as well as a deluge of other ancillary info).
However, even if BLATHER logging mode is not on, if no obvious error is put in the event log when a conflict error is relayed to a user, that's definitely a bug. I'd believe it in a second! ;-)
*** have done that but no helpful results as of yet.
The Zope conflict exception catching code is written in such a complicated way (and without the benefit of any automated tests) that tracking that down could take an entire day which I don't have to burn ATM. So I'm afraid the status quo will prevail until someone gets so indignant about it that they either pay for it to be fixed or fix it themselves. Apologies for that. :-(
*** it may come to that. :-) --
Dennis Allison wrote:
*** you are correct -- this is the easy hack on the event.log. It's much harder to know how many make it out to the user. We have an associated bug in the MySQL interface which generates threading errors, apparently triggered by a conflict error and the subsequent backout. These occur with most conflicts which involve the database--almost every conflict with our system structure.
I'm actually not sure what's logged when a Conflict Error makes it back to the users, offhand I don't see anything in my logs. Can someone confirm or infirm that fact? If nothing is logged, I'll add something at level ERROR. BTW does someone have a handy script to provoke conflict errors on a naked Zope? Florent -- Florent Guillaume, Nuxeo (Paris, France) Director of R&D +33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
Hello, Florent. you wrote 22 íîÿáðÿ 2005 ã., 2:13:16:
BTW does someone have a handy script to provoke conflict errors on a naked Zope? Using http://www.zope.org/Members/kvas/ZAK ( it need some patch for zope 2.8) ( you need to read some manual about ZAK )
in 1 browser window try the following code import time context._p_changed = 1 time.sleep(2) in second browser window context._p_changed = 1 and submit both windows. request to zope from first browser window catchs a ConflictError after 2 seconds. -- Best regards, VS
Florent Guillaume wrote at 2005-11-22 00:13 +0100:
... I'm actually not sure what's logged when a Conflict Error makes it back to the users, offhand I don't see anything in my logs. Can someone confirm or infirm that fact?
If nothing is logged, I'll add something at level ERROR.
I fear nothing is logged. If you are at it you should make the log entry for ConflictError much more informative by logging the "str" of the "ConflictError" instance. It will tell: the conflicting object, the type of the conflict and the participating versions.
BTW does someone have a handy script to provoke conflict errors on a naked Zope?
They are most easily reproduced with local transactions: conn1 = db.open(); conn1.setLocalTransaction(); r1 = conn1.root() conn2 = db.open(); conn2.setLocalTransaction(); r2 = conn2.root() r1['a'] = 1; r2['a'] = 2 conn1.getTransaction().commit() conn2.getTransaction().commit() Note that "setLocalTransaction" is deprecated in ZODB 3.4. You get similar effects with special "TransactionManager"s (look at how "setLocalTransaction" is implemented). If you only want to check your logging extension, you can just raise a "ConflictError". -- Dieter
On 11/22/05, Florent Guillaume <fg@nuxeo.com> wrote:
Dennis Allison wrote:
*** you are correct -- this is the easy hack on the event.log. It's much harder to know how many make it out to the user. We have an associated bug in the MySQL interface which generates threading errors, apparently triggered by a conflict error and the subsequent backout. These occur with most conflicts which involve the database--almost every conflict with our system structure.
I'm actually not sure what's logged when a Conflict Error makes it back to the users, offhand I don't see anything in my logs. Can someone confirm or infirm that fact?
If nothing is logged, I'll add something at level ERROR.
BTW does someone have a handy script to provoke conflict errors on a naked Zope?
There is a doctest that might be useful here. See the last half of testPublisher() in lib/python/ZPublisher/tests/testPublish.py which tests the behavior of the publisher in the face of one or more ConflictErrors.
Thanks, I'll take a look. On Wed, 23 Nov 2005, Michael Dunstan wrote:
On 11/22/05, Florent Guillaume <fg@nuxeo.com> wrote:
Dennis Allison wrote:
*** you are correct -- this is the easy hack on the event.log. It's much harder to know how many make it out to the user. We have an associated bug in the MySQL interface which generates threading errors, apparently triggered by a conflict error and the subsequent backout. These occur with most conflicts which involve the database--almost every conflict with our system structure.
I'm actually not sure what's logged when a Conflict Error makes it back to the users, offhand I don't see anything in my logs. Can someone confirm or infirm that fact?
If nothing is logged, I'll add something at level ERROR.
BTW does someone have a handy script to provoke conflict errors on a naked Zope?
There is a doctest that might be useful here. See the last half of testPublisher() in lib/python/ZPublisher/tests/testPublish.py which tests the behavior of the publisher in the face of one or more ConflictErrors.
--
Florent Guillaume wrote:
I'm actually not sure what's logged when a Conflict Error makes it back to the users, offhand I don't see anything in my logs. Can someone confirm or infirm that fact?
Well, I can agree based on empirical evidence. I have MailingLogger (sorry, I really feel the need to plug this product more!) on all the clusters where I have ConflictError problems, and it's never once mailed me a log entry containing one :-(
If nothing is logged, I'll add something at level ERROR.
Has this been done? If not, I volunteer to do it as part of my upcoming "make COnflictErrors suck less" work ;-)
BTW does someone have a handy script to provoke conflict errors on a naked Zope?
from ZODB.POSException import ConflictError raise ConflictError ;-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote:
If nothing is logged, I'll add something at level ERROR.
Has this been done? If not, I volunteer to do it as part of my upcoming "make COnflictErrors suck less" work ;-)
I'm working on it at the moment.
BTW does someone have a handy script to provoke conflict errors on a naked Zope?
from ZODB.POSException import ConflictError raise ConflictError
Guess what, I know that. But I want a "real" ConflictError, to see what useful info about involved objects I can log. Florent -- Florent Guillaume, Nuxeo (Paris, France) Director of R&D +33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
On Fri, Nov 25, 2005 at 11:22:32AM +0100, Florent Guillaume wrote: | Guess what, I know that. But I want a "real" ConflictError, to see what | useful info about involved objects I can log. I'm pretty sure there are ZODB tests that exercise the creation and handling of ConflictError. -- Sidnei da Silva Enfold Systems, LLC. http://enfoldsystems.com
Florent Guillaume wrote at 2005-11-25 11:22 +0100:
... Guess what, I know that. But I want a "real" ConflictError, to see what useful info about involved objects I can log.
The only thing you need to log is "str(conflict_error_value)". "ConflictError" does a good job to provide all available information via its "__str__". -- Dieter
Hi All, Chris McDonough wrote:
Is this the number of log messages that indicate a conflict error occurred (e.g. "x conflict errors since DATE" messages in the event log) or the number of conflict errors that are retried more than three times and thus make it "out" to the app user? I'm guessing the former.
I'm interested to see that this causes everyone the same level of confusion ;-) I'll reply to the other message about this though...
The real reason they're called "errors" is only because they're implemented as Python exceptions. They are implemented as exceptions because it was the easiest mechanism to use (exceptions are already built into Python).
Interestingly, you can raise things that don't subclass Exception in python. This was discussed before, and I firmly agree with, that zodb conflicts should _not_ sublcass exception. That way, there's less chance of them being caught by inexperienced programmers putting in try: except Exception: 's. How would we go about making this change?
The Zope conflict exception catching code is written in such a complicated way (and without the benefit of any automated tests) that tracking that down could take an entire day which I don't have to burn ATM. So I'm afraid the status quo will prevail until someone gets so indignant about it that they either pay for it to be fixed or fix it themselves. Apologies for that. :-(
Okay, I'll bite, I might well be able to stump up the paid time on this if you guys can help me justify it as a performance improvement :-) I hope I can count on a bit of moral and intellectual support if I manage to make this happen! cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote at 2005-11-24 19:34 +0000:
... and I firmly agree with, that zodb conflicts should _not_ sublcass exception.
-1
That way, there's less chance of them being caught by inexperienced programmers putting in try: except Exception: 's.
Most people that use an unrestricted "try ... except ..." use "try: ... except: ..." and not "try: ... except Exception: ...", thus your proposal will not help much...
How would we go about making this change?
Do not do it at all. -- Dieter
Chris Withers wrote at 2005-11-21 16:33 +0000:
... here's a line from one of our event logs:
2005-11-17T08:00:27 INFO(0) ZODB conflict error at /some_uri (347 conflicts since startup at 2005-11-08T17:56:20)
What is this telling me?
It is incredibly stupid. The message above only tells you, that (at the given time) a request for "/some_uri" resulted in a "ConflictError" and that since startup (at the given time) 347 conflicts occured. Unfortunately, it does not tell you * what object caused the conflict * whether it is a read or a write conflict (read conflicts are very rare since MVCC introduction, but they may still happen) * for write conflicts: what versions of the object did particate A long time ago, I posted an extension making this additional information available (it is all present in the exception instance. Zope is just too stupid to read (and log) it).
Did the user actually see a ConflictError page?
Usually not.
Or was this error successfully resolved?
It may (or may not) later be resolved. This is still not clear when the message is generated.
What object did this ConflictError occur on and/or how can I modify my our Zope instances to find out where the conflict was occurring?
See above -- search the archive for the extension...
Now, when should the number of ConflictErrors logged in this way start to become worrying?
When you start to see lots on them (per time unit).
I analysed the logs from our cluster and we're getting about 450 conflict errors in our busiest hours when the cluster of 8 ZEO clients is taking about 11,000 hits in that hour.
Is this 'bad'?
I would not be happy: it is about 5 %. This gives quite some chance that your customers see failures caused by the conflicts (when 3 repetitions are not enough).
If so, where should I start to make things better?
You find out which objects cause the conflicts. You analyse what you can do to reduce concurrent writes to these objects (split into separate persistent subobjects) or whether you can provide conflict resolution. -- Dieter
Dieter Maurer wrote:
What is this telling me?
It is incredibly stupid.
So I'm gathering...
The message above only tells you, that (at the given time) a request for "/some_uri" resulted in a "ConflictError" and that since startup (at the given time) 347 conflicts occured.
Unfortunately, it does not tell you
* what object caused the conflict
* whether it is a read or a write conflict (read conflicts are very rare since MVCC introduction, but they may still happen)
* for write conflicts: what versions of the object did particate
...and most importantly, whether or not this conflict error was resolved! (ie: did the user see this as an error, or just poor performance?)
A long time ago, I posted an extension making this additional information available (it is all present in the exception instance. Zope is just too stupid to read (and log) it).
I would really appreciate it if you could dig this out and give me a direct url. If you do, I promise I will see it gets properly merged into the Zope core...
Did the user actually see a ConflictError page?
Usually not.
...which, as I suspect you agree with me, simple isn't a good enough answer ;-)
I analysed the logs from our cluster and we're getting about 450 conflict errors in our busiest hours when the cluster of 8 ZEO clients is taking about 11,000 hits in that hour.
Is this 'bad'?
I would not be happy: it is about 5 %.
Yeah, me neither, but it would be great to quantify all of this :-)
If so, where should I start to make things better?
You find out which objects cause the conflicts.
You analyse what you can do to reduce concurrent writes to these objects (split into separate persistent subobjects) or whether you can provide conflict resolution.
...and I think the best way to get cracking with this is to improve logging and tidy up the retrying code for ConflictError. I've volunteered to do that, and will get to it asap, but I'm gonna need as much help as possible, particularly in reviewing code. OK, I'm guessing I'll cut a branch when I start, I really want this to land ASAP. Where should I start from? Will there be more 2.8 releases or should I go for 2.9 and then merge to the trunk? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote at 2005-11-24 19:42 +0000:
... I would really appreciate it if you could dig this out and give me a direct url. If you do, I promise I will see it gets properly merged into the Zope core...
You know that I *never* search for others (I hate searching). But I can show you the code, we currently use in our Zope version (which has the improved logging and some other fixes): if issubclass(t, ConflictError): # First, we need to close the current connection. We'll # do this by releasing the hold on it. There should be # some sane protocol for this, but for now we'll use # brute force: global conflict_errors conflict_errors = conflict_errors + 1 method_name = REQUEST.get('PATH_INFO', '') err = ('ZODB conflict error at %s: %s' '(%s conflicts since startup at %s)') LOG(err % (method_name, str(v), conflict_errors, startup_time), INFO, '') LOG('Conflict traceback', BLATHER, '', error=sys.exc_info()) raise ZPublisher.Retry(t, v, traceback) # DM 2005-06-28: work around "lost retry exception logs" problem # (reported by "Malcolm Cleaton" "mailto:malcolm@jamkit.com" # in the mailing list) #if t is ZPublisher.Retry: v.reraise() if t is ZPublisher.Retry: try: v.reraise() except: t, v, traceback = sys.exc_info() This code sits in "Zope2.App.startup.zpublisher_exception_hook". The improved logging is achieved by including "str(v)" ("v" contains the exception value) into the log message (it is that easy). The last part (marked with "DM 2005-06-28") fixes a problem that your "error_log/standard_error_message" will not see any "ConflictError". While it is correct that the "error_log/standard_error_message" should not see "ConflictError"s that are retried; it should see (and if configured correspondingly log) the "ConflictError"s that go out to the user. With this fix, you do not need special handling (e.g. logging) of "ConflictError"s that go out to the user. They can (and should) just be handled as any other exception.
Did the user actually see a ConflictError page?
Usually not.
...which, as I suspect you agree with me, simple isn't a good enough answer ;-)
Add the fix shown above -- and the really problematic "ConflictError"s (those seen by your customer) are handled the same way as other exceptions seen by your customer (a good thing!).
...
I would not be happy: it is about 5 %.
Yeah, me neither, but it would be great to quantify all of this :-)
But, you can't. It is very difficult to quantify statistics precisely... -- Dieter
participants (9)
-
Chris McDonough -
Chris Withers -
Dennis Allison -
Dieter Maurer -
Florent Guillaume -
Michael Dunstan -
Sidnei da Silva -
Tim Peters -
Victor Safronovich