resolving conflict errors
Zope 2.7.6
I am a bit confused. I have a Zope DTML method that is generating ZODB conflict errors. The DTML method identified as producing the conflicts is a list of calls to other methods, conditionally executed.
Most conflicts don't cause problems because the backoff and restart of the initial transaction will not have changed global state. In our particular case, the conflicting transaction has changed global state in our RDBMS, so when it gets rerun, some RDBMS transactions are duplicated. And that's a problem. The solution, of course, is to resolve the conflicts properly.
The first question: what data is generating the conflict? The DTML code and all the method references are static and unchanged. What data does Zope store in the ZODB when an object is evaluated?
Presumably conflicts can be resolved programmatically by setting a method on the object
_p_resolveConflict( self, old, saved, new )
and returning one or another of the states (old, saved, new). It's not real clear how to do it.
On Fri, 2005-10-14 at 09:27 -0700, Dennis Allison wrote:
> Zope 2.7.6
> I am a bit confused.
> I have a Zope DTML method that is generating ZODB conflict errors.
> The DTML method identified as producing the conflicts is a list of calls to other methods, conditionally executed.
> Most conflicts don't cause problems because the backoff and restart of the initial transaction will not have changed global state. In our particular case, the conflicting transaction has changed global state in our RDBMS so when it gets rerun, some RDBMS transactions are duplicated. And that's a problem. The solution, of course, is to resolve the conflicts properly.
Another solution is to use an RDBMS that fully supports transactions. This is almost always the easiest solution.
ConflictErrors are a fact of life if you use ZODB to store data; they are impossible to eliminate entirely. So no matter what, if you need to communicate with external data stores that aren't transactional (MyISAM tables, LDAP, sendmail, etc.), you need to anticipate duplicated requests in the code that communicates with these systems. For example, Jens Vagelpohl has a replacement for Zope's MailHost that prevents retried requests due to conflict errors from causing a mail to be resent. I've written payment systems that anticipate the fact that the request may be retried, so instead of submitting the payment request twice, the code keeps around a little cache about what it did "last" so it doesn't do it again. And so on.
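The "little cache about what it did last" idea can be sketched roughly as follows. This is only an illustration, not code from any of the systems mentioned: `PaymentGateway`, `IdempotentSubmitter`, and the `request_id` argument are hypothetical names, and the sketch assumes each logical request carries an id that stays the same when the request is retried.

```python
class PaymentGateway:
    """Hypothetical stand-in for a non-transactional external system."""

    def __init__(self):
        self.charges = []

    def charge(self, account, amount):
        self.charges.append((account, amount))
        return len(self.charges)  # pretend receipt number


class IdempotentSubmitter:
    """Remembers which request ids were already sent to the gateway."""

    def __init__(self, gateway):
        self.gateway = gateway
        self._done = {}  # request_id -> receipt (assumed to survive retries)

    def submit(self, request_id, account, amount):
        # A retried request carries the same request_id, so the second
        # attempt returns the cached receipt instead of charging again.
        if request_id in self._done:
            return self._done[request_id]
        receipt = self.gateway.charge(account, amount)
        self._done[request_id] = receipt
        return receipt


gateway = PaymentGateway()
submitter = IdempotentSubmitter(gateway)
first = submitter.submit("req-1", "acct-42", 10.0)
retry = submitter.submit("req-1", "acct-42", 10.0)  # simulated retry
```

For this to work the cache has to live somewhere that is not rolled back along with the aborted transaction; a cache stored in the ZODB object being retried would be discarded together with the rest of the failed attempt.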
> The first question: what data is generating the conflict?
I believe that if you run Zope event logging at BLATHER level, the traceback of every ConflictError exception is logged, which can give you an idea of what is causing the errors.
> The DTML code and all the method references are static and unchanged. What data does Zope store in the ZODB when an object is evaluated?
None that you don't tell it to. Typically conflict errors are a result of two threads calling code which changes the same object at the same time, but nothing that Zope does "under the hood" causes it; it is always caused by application code.
One "exception" to this rule is conflict errors raised when using Zope sessions. It's not actually an exception to the rule, but programmers are shielded from the fact that sessions store data in ZODB when you use the session API (e.g. REQUEST.SESSION). The sessioning machinery needs to manage housekeeping info whenever the API is used to expire old sessions and create new ones, so although it may not "look" like you are writing to the ZODB when you use sessions (even to read data out of them), you potentially are.
Zope 2.8 has a ZODB that supports multiversion concurrency control, which eliminates a certain class of conflict errors (read conflict errors), so if you are getting a lot of these, and you can get away with using 2.8, I'd suggest doing so.
> Presumably conflicts can be resolved programmatically by setting a method on the object
> _p_resolveConflict( self, old, saved, new )
> and returning one or another of the states (old, saved, new). It's not real clear how to do it.
There are examples of "real-world" conflict resolution using this mechanism in the Transience product included in Zope's lib/python/Products.
- C
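As a rough illustration of the mechanism (not the Transience code itself), here is a minimal sketch of a counter whose conflicts can be merged. The class name `HitCounter` is hypothetical, and the sketch assumes the three arguments are plain attribute dictionaries, which is how ZODB passes object state for ordinary persistent objects. Note that the method is not limited to returning one of the three states unchanged; it returns whatever state should be committed, which may be a merge.

```python
class HitCounter:
    """In a real application this would subclass persistent.Persistent."""

    def __init__(self):
        self.count = 0

    def hit(self):
        self.count += 1

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   state both transactions started from
        # saved_state: state already committed by the other transaction
        # new_state:   state this transaction is trying to commit
        resolved = dict(new_state)
        # Apply both increments on top of the common starting point.
        resolved["count"] = (
            saved_state["count"] + new_state["count"] - old_state["count"]
        )
        return resolved


old = {"count": 5}
saved = {"count": 7}   # the other transaction added 2
new = {"count": 6}     # this transaction added 1
resolved = HitCounter()._p_resolveConflict(old, saved, new)
```

Here the resolved count is 8, because both transactions' increments are kept. When no sensible merge exists, the method can raise a conflict error instead and let the request be retried.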
Thanks Chris. On point as usual!
I was unaware that the session mechanism used the ZODB, although a bit of thought says it has to. We don't store any data into the ZODB in these methods, but we do use the session mechanism heavily.
I suppose that moving to a fully transactional database system would be the simplest solution, but I'm a bit wary of doing so on a live system.
On Fri, 14 Oct 2005, Chris McDonough wrote:
> Dennis Allison asked:
>> What data does Zope store in the ZODB when an object is evaluated?
> None that you don't tell it to. Typically conflict errors are a result of two threads calling code which changes the same object at the same time, but nothing that Zope does "under the hood" causes it; it is always caused by application code.
> One "exception" to this rule is conflict errors raised when using Zope sessions. It's not actually an exception to the rule, but programmers are shielded from the fact that sessions store data in ZODB when you use the session API (e.g. REQUEST.SESSION). The sessioning machinery needs to manage housekeeping info whenever the API is used to expire old sessions and create new ones, so although it may not "look" like you are writing to the ZODB when you use sessions (even to read data out of them), you potentially are.
> Zope 2.8 has a ZODB that supports multiversion concurrency control, which eliminates a certain class of conflict errors (read conflict errors), so if you are getting a lot of these, and you can get away with using 2.8, I'd suggest doing so.
The problem I am trying to resolve appears to be load related. The observed symptom is that (some) session variables spontaneously disappear. There appears to be some connection to conflicts, but the exact mechanism and the relationship are not yet clear.
BTW, when I first began trying to resolve this problem, we were running Zope 2.7.6. We moved to Zope 2.8.4 to take advantage of the later ZODB. That move was a good one, but it has caused some migration pain. The session problem has been with us forever, at least since Zope 2.5.0. These are Heisenbugs and appear apparently at random. Detailed logs should help some, but so far have not brought joy.
So, I've been looking through the code trying to find places where some infrequent event could cause the problem.
Chris pointed out that session variables can cause conflict errors (both read-read and read-write) when the session API is used. I've been trying to explore that interface and have not yet found all the pieces. Some hints to the reader would help.
I see where HTTPRequest manages the Zope user interface to the session variables, which includes a mechanism for lazy access--the dictionary _lazies provides a list of callables. When a variable is accessed, it is promoted to Request by executing the callable and storing the value. The _lazies entry corresponding to the session variable is deleted.
What I have not been able to find is where this is maintained in a persistent fashion. Can someone provide a pointer?
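The lazy-access behaviour described in this message can be sketched as a toy model. This is not the ZPublisher source: `LazyRequest`, `make_session`, and the `other` dictionary are illustrative names, and only `set_lazy` and `_lazies` come from the discussion. The sketch models just the in-memory, per-request promotion step and says nothing about where the session data itself is stored.

```python
_marker = object()


class LazyRequest:
    """Toy model of a request with lazily computed variables."""

    def __init__(self):
        self.other = {}     # values that have already been computed
        self._lazies = {}   # name -> callable that computes the value

    def set_lazy(self, key, factory):
        self._lazies[key] = factory

    def get(self, key, default=None):
        if key in self.other:
            return self.other[key]
        # Remove the lazy entry so the callable runs at most once.
        factory = self._lazies.pop(key, _marker)
        if factory is not _marker:
            value = factory()          # computed on first access only
            self.other[key] = value    # promoted to a regular value
            return value
        return default


calls = []


def make_session():
    # Hypothetical factory standing in for the session lookup.
    calls.append(1)
    return {"user_id": "dennis"}


request = LazyRequest()
request.set_lazy("SESSION", make_session)
s1 = request.get("SESSION")
s2 = request.get("SESSION")
```

After the first access the callable is gone from `_lazies` and both lookups return the same object, so the factory is invoked once per request at most.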
On Dec 8, 2005, at 9:29 PM, Dennis Allison wrote:
> On Fri, 14 Oct 2005, Chris McDonough wrote:
> The problem I am trying to resolve appears to be load related. The observed symptom is that (some) session variables spontaneously disappear. There appears to be some connection to conflicts, but the exact mechanism and the relationship is not yet clear.
It's hard to know what's happening here, obviously.
> So, I've been looking through the code trying to find places where some infrequent event could cause the problem.
> Chris pointed out that session variables can cause conflict errors (both read-read and read-write) when the session API is used. I've been trying to explore that interface and have not yet found all the pieces. Some hints to the reader would help.
If you mean the session API, that's defined in modules within Products/Sessions and Products/Transience.
> I see where HTTPRequest manages the Zope user interface to the session variables which includes a mechanism for lazy access--the dictionary _lazies provides a list of callables. When a variable is accessed, it is promoted to Request by executing the callable and storing the value. The _lazies entry corresponding to the session variable is deleted.
Sorry, I'm not sure what this means. Are you describing what happens in the REQUEST.set_lazy code or are you describing a symptom of a problem?
> What I have not been able to find is where this is maintained in a persistent fashion. Can someone provide a pointer?
I'm not sure what this means either, sorry! Where what is maintained in a persistent fashion?
Sorry to not be more helpful on this go-around. ;-)
- C
Dennis Allison wrote at 2005-12-8 18:29 -0800:
> ... The problem I am trying to resolve appears to be load related. The observed symptom is that (some) session variables spontaneously disappear.
It is very surprising that some (!) session variables should spontaneously disappear -- in fact it is unbelievable. The session machinery has no preference at all for any particular variable stored in the session. They are all treated in the same way. It looks almost impossible that some variables vanish spontaneously while others remain.
The following is quite normal (for buggy applications): The value of session variables seems to be reset spontaneously. This happens when the value is a mutable object and the mutable object is mutated without notifying the session. Then the modified value is available only in the cache where the mutation happened (until the session is flushed from it). Other caches see the old value. Because it is apparently non-deterministic which cache is used for a request, the observed session value seems to switch between different values.
It might be possible (though I have never seen a hint towards this) that due to some bug the session is reset to an earlier state (which did not yet have the session variables you now miss). In this case, you should see not only some variables missing but also the others having (potentially) outdated values.
--
Dieter
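The "mutated without notice" case can be illustrated with a small stand-in. `TrackingSession` below is a hypothetical class, not the Zope session object; it only assumes that the session learns about a change when a key is assigned, which is the behaviour the paragraph above describes.

```python
class TrackingSession(dict):
    """Toy session that only notices changes made through __setitem__."""

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.changed = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed = True


session = TrackingSession(cart=[])

# In-place mutation: the list changes, but the session is never told.
session["cart"].append("book")
silent = session.changed

# Safer pattern: copy the value out, update the copy, assign it back.
cart = list(session["cart"])
cart.append("pen")
session["cart"] = cart
notified = session.changed
```

In the first case `silent` stays false even though the list now holds an item, which is the situation where different caches can end up showing different values. In the second case the assignment is what signals the change.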
On 12/9/05, Dennis Allison <allison@shasta.stanford.edu> wrote:
> The problem I am trying to resolve appears to be load related. The observed symptom is that (some) session variables spontaneously disappear. There appears to be some connection to conflicts, but the exact mechanism and the relationship is not yet clear.
A small possibility is that you are being bitten by the DWIM'ly nature of TransientObject's conflict resolution, where the last modified state is chosen over the others. If you think this is biting you, then try commenting out _p_resolveConflict of TransientObject. That's bound to increase the rate of conflict errors but should provide you with a consistent session state. Perhaps useful as a debugging step.
michael
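To see why a "last modified state wins" policy could drop data, here is a simplified sketch. It is not the Transience implementation: the function `last_modified_wins` and the state keys `_last_modified` and `_container` are assumptions made for the illustration.

```python
def last_modified_wins(old, saved, new):
    """Return whichever whole state was modified most recently."""
    return max((old, saved, new), key=lambda s: s.get("_last_modified", 0))


# Common starting point for two concurrent requests.
old = {"_last_modified": 100, "_container": {}}
# Request A stored user_id and committed first.
saved = {"_last_modified": 101, "_container": {"user_id": "u17"}}
# Request B started from the same old state and stored something else later.
new = {"_last_modified": 102, "_container": {"theme": "dark"}}

resolved = last_modified_wins(old, saved, new)
lost = "user_id" not in resolved["_container"]
```

Because a whole state is chosen rather than merged, the key written by request A is absent from the resolved state, and a later lookup of it would raise a KeyError. Disabling resolution, as suggested above, trades this silent loss for more retried requests.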
Good idea, but it is hard to do in a production environment with a "never lose data" model. I have suspected the _p_resolveConflict, which is clearly wrong for our model, and am in the process of trying to rewrite it to take advantage of the semantics of sessions as we use them.
The problem I am tracking manifests itself as KeyErrors in the session data structure. The session structure is used pretty much out of the box. Reading is done any which way (e.g., REQUEST['SESSION'][key]), but writing always uses a copy out of the session object, an update of the session object, and then a replacement of the session object back into REQUEST.
The session data we lose are generally strings, for example, a user_id. For example, we can set the user_id into the session to a value, and then later, when we reference the session variable, we get a KeyError. While there is no direct causal tie, we suspect this is related to an intervening conflict error.
Occasionally the entire SESSION data container disappears. At other times, we get KeyError exceptions for one or more session variables.
I have been trying to understand in detail the management of the session variables. I can see how accesses are managed in ZPublisher/HTTPRequest, but I am still unsure of how that session data is kept persistent and how session data can generate read-read conflicts.
I'll try your suggestion. I also plan to monitor session variable access to determine whether a KeyError signals that all session variables are missing or that only a few are missing.
On 12/11/05, Dennis Allison <allison@shasta.stanford.edu> wrote:
> Good idea, but it is hard to do in a production environment with a "never lose data" model.
Have a go at recreating the problems you are seeing on a development host. SessionRig can be used to mount a brute force attack of the session machinery. You'll need to tune that somewhat to your particular application.
http://cvs.zope.org/Packages/SessionRig
michael
Thanks, I'll take a look. I don't have much faith in getting to do it with the live system, but maybe I can find a way to get some sort of testbed.
participants (4): Chris McDonough, Dennis Allison, Dieter Maurer, Michael Dunstan