On Thu, 2003-05-29 at 01:08, Jeffrey P Shell wrote:
Thanks for the information. Is it safe at all to try to catch a ConflictError during the critical part of the code, log some information, and then reraise the error to let the system do what it needs?
Sure, but I'm not sure what that buys you in your case. The system will still retry the request if you reraise a conflict error. And it would be spotty coverage at best; it's almost impossible to know where a ConflictError might be raised. The only reasonable "solution" would be to change ZPublisher's default behavior to not retry requests on conflict errors, which is probably not what you want either.
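For what it's worth, a log-and-reraise wrapper might look roughly like the sketch below. The helper name run_logged and the logger name are made up for illustration. The import fallback is there only so the snippet runs where ZODB isn't installed. The bare raise is the important part: the publisher still sees the ConflictError and retries the request as usual.

```python
import logging

try:
    # The real exception class, when ZODB is installed.
    from ZODB.POSException import ConflictError
except ImportError:
    # Stand-in so this sketch runs without ZODB; not the real class.
    class ConflictError(Exception):
        pass

# Hypothetical logger name, chosen for this example only.
log = logging.getLogger("critical_section")


def run_logged(step, *args, **kwargs):
    """Call step(); log any ConflictError, then re-raise it unchanged."""
    try:
        return step(*args, **kwargs)
    except ConflictError:
        # Record where we were, but do not swallow the error: the
        # publisher must still see it in order to retry the request.
        log.exception("ConflictError in %s", getattr(step, "__name__", step))
        raise
```

This only covers conflicts raised inside the wrapped call. A conflict detected later, at commit time, would not pass through it.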
I guess you're right though - it's hard to know when it will occur.
In the production system, in this particular method, there are only two known persistent object interactions. At the end of the entire method, after a notification email has been sent, I have something like:
session['pieces'] = {}
(session['pieces'] was a dictionary of {item_id:integer} bits. It never gets large for an individual user). I think that the one recent case of desync'd data happened when we got to this point. Since it's at the very end of the script (no more writes are expected beyond this point), I imagine that a get_transaction().commit() might be OK to precede this statement, just so that even if any conflicts happen when trying to write back to the session, we at least have synchronized data between the two systems. Although, prior to this, there are a few reads of this session data. Might it be safer to do something like this at the top of the method?:
pieces = session['pieces'].copy()
pieces = session.get('pieces', {})
...at the top of the method might be better, particularly because you'll need to explicitly re-save the dictionary into the session at the end of the method anyway, like so:
session['pieces'] = pieces
(standard persistence rules apply to session data as well, so you need to store basic types back into the session after you mutate them if you want the changes to persist). We've also found that accessing session data early in the request can help reduce the number of conflicts that happen later in the request. See http://mail.zope.org/pipermail/zope-dev/2003-March/019081.html for more information.
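Put together, the read-early, copy, and store-back pattern could be sketched like this. It is an illustration only: a plain dict stands in for the real session object, and the function name process_pieces and the key 'done-item' are invented for the example.

```python
def process_pieces(session):
    # Read the session data early and work on a private copy, so later
    # mutations never touch the dict that is stored in the session.
    pieces = dict(session.get('pieces', {}))

    # ... the critical work would go here, using only the local copy ...
    pieces.pop('done-item', None)  # 'done-item' is a made-up key

    # Assign back through the session's __setitem__ so the session itself
    # registers the change; mutating the stored dict in place would not
    # be noticed by the persistence machinery.
    session['pieces'] = pieces
    return pieces
```

Calling process_pieces({'pieces': {'done-item': 1, 'other': 2}}) leaves the session holding {'other': 2}, and the dictionary that was originally stored is never modified in place.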
I apologize if this post is making little sense (or stupid sense) - dealing with threads, locks, conflicts, etc, has been the part of Zope I've understood the least. I like that for the most part I don't have to think about it, but I don't know where to go for [fairly] current documentation on how to deal with it for those rare times I do.
FWIW, the Zope Book 2.6 edition session chapter speaks a bit to what conflict errors are. The ZDG persistence chapter talks a bit about threading and concurrency.
The other persistent data write occurs earlier in the method: an object that generates serial numbers based off of some simple data in a PersistentMapping gets updated. I think that PersistentMapping has become fairly large by now. It maps the item_id referenced above to a regular dictionary containing three key/value pairs each. I make sure to follow the rules of persistence when dealing with these dictionaries-within-a-PersistentMapping, but I'm guessing that an OOBTree might be better instead. I still don't understand the potential pitfalls of Zope/ZODB BTrees (I keep reading about 'bucket splits' causing conflicts, and I don't know if that would be better or worse than any pitfalls a PersistentMapping gives).
Know that any change to a PersistentMapping needs to load and repersist the entire data set in the mapping when a key or value is updated or added. It is very likely that this will cause a conflict, particularly when two threads try to do this at once. OTOH, a BTree is made up of many other persistent subobjects, and there is less of a chance (but still a good chance) that two concurrent accesses to a BTree will cause a conflict error.
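A toy model of the difference, in plain Python with no ZODB involved, might look like this. It assumes integer keys and a fixed, arbitrary bucket size of 4, and it ignores bucket splits and the BTrees' own conflict resolution, so treat it as an illustration of granularity only, not as how the BTrees package actually lays out data.

```python
def conflicts(keys_a, keys_b, object_of):
    """True if two concurrent writers touch any common persistent object."""
    touched_a = {object_of(k) for k in keys_a}
    touched_b = {object_of(k) for k in keys_b}
    return bool(touched_a & touched_b)


def single_object(key):
    # One PersistentMapping: every key lives in the same persistent object.
    return "mapping"


def bucketed(key, bucket_size=4):
    # Bucketed layout: keys are spread over several persistent subobjects.
    # The bucket size is an arbitrary choice for this sketch.
    return key // bucket_size
```

With two writers updating keys 1 and 9, conflicts([1], [9], single_object) is True while conflicts([1], [9], bucketed) is False. Writers updating keys 1 and 2 land in the same bucket, so they conflict either way.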
Finally, the system in question has a few (three? four?) public Zope sites using the same session storage. Is there any documentation, notes, etc, about fine tuning the default session storage setup to handle large sites (or groups of sites) with fewer conflicts?
The best source of docs for sessions is the 2.6 Zope Book sessions chapter. The maillist thread that I mentioned above gives some information from Toby Dickenson about accessing session data early in a transaction to reduce the possibility of read conflicts.
Thanks again for the help. I'll take a look at MailDropHost. Maybe I'll have to wrap another gateway around the gateway to the external system to try to catch these conflict situations. Fortunately, the critical area only occurs once in the current copy of the code. Hopefully that will make it easier to protect.
Good luck! - C
Thanks again, Jeffrey