Re: sessions in the presence of conflicts
[Using zope-dev@ instead of zope@] Dennis Allison wrote:
A more session-friendly conflict resolution might use:
1. if any of the states is invalid (that is, has a key '_invalid'), return the invalid state.
2. if any of the state attributes ['token','id','_created'] differ, then there is a conflict; raise the conflict exception.
3. order the newState and savedState by modification time (or if that cannot be computed, by access time).
4. any key appearing in oldState's dictionary but not appearing in both savedState and newState should be removed from all. This corresponds to a key-value pair being deleted in one of the transactions. Insertions will be managed automatically by the updates.
5. beginning with the oldest, update oldState dictionary of key-value pairs using the dictionary part of newState and savedState. Return oldState.
This does several things. First, it captures independent key-value changes made in both potentially conflicting transactions. Second, it provides a reasonable ordering for multiple (potentially conflicting) key-value pair updates. Third, it manages insertions and deletions to the session variable set in the presence of conflicts.
Does this make sense? I have yet to figure out how to map a TransientObject "state" back to the object it represents, but it clearly is possible.
It certainly makes sense from a high-level description, but the devil is in the details. I'd be interested in looking at it if you code something.
Florent
--
Florent Guillaume, Nuxeo (Paris, France) CTO, Director of R&D
+33 1 40 33 71 59 http://nuxeo.com fg@nuxeo.com
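As a rough illustration of the five steps in Dennis's proposal, here is a minimal sketch. It assumes each state is a plain dict whose session variables live under a `_container` key and whose timestamps are stored as `_last_modified` and `_last_accessed`. Those key names, the function name `resolve_session_conflict`, and the local `SessionConflict` exception (a stand-in for ZODB's ConflictError) are assumptions made for illustration, not code taken from Products.Transience.

```python
IDENTITY_ATTRS = ('token', 'id', '_created')


class SessionConflict(Exception):
    """Stand-in for ZODB's ConflictError so the sketch runs on its own."""


def _stamp(state):
    # Prefer modification time; fall back to access time, then to 0.
    return state.get('_last_modified') or state.get('_last_accessed') or 0


def resolve_session_conflict(old_state, saved_state, new_state):
    states = (old_state, saved_state, new_state)

    # Step 1: an invalidated session wins outright.
    for state in states:
        if '_invalid' in state:
            return state

    # Step 2: identity attributes must agree across all three states.
    for attr in IDENTITY_ATTRS:
        values = [state.get(attr) for state in states]
        if not (values[0] == values[1] == values[2]):
            raise SessionConflict('states disagree on %r' % attr)

    # Step 3: order the two competing states, oldest first.
    contenders = sorted((saved_state, new_state), key=_stamp)
    merged = dict(old_state.get('_container', {}))
    updates = [dict(s.get('_container', {})) for s in contenders]

    # Step 4: a key present in old but missing from either contender
    # was deleted by one transaction, so drop it everywhere.
    deleted = [k for k in merged if any(k not in u for u in updates)]
    for key in deleted:
        del merged[key]
        for u in updates:
            u.pop(key, None)

    # Step 5: apply the updates oldest first, so the newest write wins.
    for u in updates:
        merged.update(u)

    resolved = dict(old_state)
    resolved['_container'] = merged
    return resolved
```

One tradeoff is visible in this sketch: step 5 does not diff against the old state, so an unchanged (stale) value carried by the newer state overwrites a change to the same key made by the older one. Whether that is acceptable is exactly the kind of detail the replies in this thread warn about.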
Code would be good.
Note that changing the transientobject conflict resolution algorithm won't get rid of all write conflict errors, because the BTree-based indexes in the transient object container will still conflict during a "bucket split" and other situations that I can't exactly recall (they're documented in the BTrees source code). In fact, before you spend a lot of time tuning the TO conflict resolution algorithm, you should make sure that the majority of conflicts you're seeing do indeed come out of attempting to resolve conflicting transientobject states (as per the conflict error traceback).
Conflict resolution algorithms are difficult and any algorithm will have DWIM-y tradeoffs, so it's useful to keep it as simple as possible.
Note also that if you store your session data in a ZEO server in order to do *any* transience write conflict resolution, the ZEO server process needs to have Products.Transience on its PYTHONPATH (as it needs access to the resolution code).
You also still haven't told us if you've tuned any of the knobs that I recommended you tune, so if you haven't, do that first. ;-)
- C
(In reply to Florent Guillaume's message of Dec 15, 2005, 5:35 AM, quoted above.)
[Chris McDonough]
Note that changing the transientobject conflict resolution algorithm won't get rid of all write conflict errors, because the BTree-based indexes in the transient object container will still conflict during a "bucket split" and other situations that I can't exactly recall (they're documented in the BTrees source code).
A more readable account is here:
http://www.zope.org/Wikis/ZODB/BTreeConflictResolution
BTrees are mappings too, and it looks like Dennis is trying to apply "similar" conflict-resolution rules to session mapping objects.
Conflict resolution algorithms are difficult and any algorithm will have DWIM-y tradeoffs, so it's useful to keep it as simple as possible.
Or no more complex than is actually helpful ;-)
...
[Dennis Allison]
I have yet to figure out how to map a TransientObject "state" back to the object it represents, but it clearly is possible.
I didn't see a response to that bit yet, so: "the state" of an object P is whatever P.__getstate__() returns. Given such a return value `state`, and some object Q of the same type as P, Q.__setstate__(state) gives Q the same state P had. What state "means" is entirely up to the type's __setstate__() and __getstate__() implementations (if any). Objects deriving from Persistent inherit (by default) implementations that retrieve and update an instance's __dict__. BTrees.Length is a good example of a class that overrides these methods, using an integer as "the state".
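To make the `__getstate__()` / `__setstate__()` description concrete, here is a small stand-alone sketch. The classes `DictState` and `IntState` are hypothetical and do not derive from Persistent; `IntState` only imitates the idea, described above for BTrees.Length, of using an integer as "the state".

```python
class DictState:
    """State is a copy of the instance __dict__ (the default-style behaviour)."""

    def __getstate__(self):
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.clear()
        self.__dict__.update(state)


class IntState:
    """State is a bare integer rather than a dictionary."""

    def __init__(self, value=0):
        self.value = value

    def __getstate__(self):
        return self.value

    def __setstate__(self, state):
        self.value = state


p = DictState()
p.token = 'abc'
q = DictState()
q.__setstate__(p.__getstate__())   # q now has the state p had

n = IntState(7)
m = IntState()
m.__setstate__(n.__getstate__())   # the "state" here is just the int
```

A conflict resolver only ever sees values like these returned states, which is why it works on dictionaries (or whatever the type chose) rather than on live objects.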
Zope 2.8.4, ZODB 3.4.2
Chris,
I'm pretty sure that I mentioned having done that in one of my postings. I have followed your recommendations, but the problem remains. (um... persists <grin>)
The systems are running a Zope/ZEO combination with a store configuration of:
    #
    <zodb_db temporary>
        # Temporary storage database (for sessions)
        <temporarystorage>
          name temporary storage for sessioning
        </temporarystorage>
        mount-point /temp_folder
        container-class Products.TemporaryFolder.TemporaryContainer
    </zodb_db>
    #
    # ZEO client storage:
    #
    <zodb_db main>
        mount-point /
        # ZODB cache, in number of objects
        cache-size 5000
        <zeoclient>
          server 192.168.0.92:8301
          storage 1
          var $INSTANCE/var
          # ZEO client cache, in bytes
          cache-size 20MB
          # Uncomment to have a persistent disk cache
          client group1-zeo
        </zeoclient>
    </zodb_db>
    #
Although the connection to ZEO is via a network port, it runs on the same physical hardware.
TemporaryStorage is not transactional. Does it need to be under MVCC?
TemporaryStorage does provide a conflict cache to do "rudimentary conflict resolution". There are several timing and scaling parameters that need to be considered:
    CONFLICT_CACHE_MAXAGE = 60 (seconds)
    CONFLICT_CACHE_GCEVERY = 60 (seconds)
    RECENTLY_GC_OIDS_LEN = 200
Entries in the recently gc'ed oids list are those which may be resolvable by a retry.
These numbers may be too small given the loads we see and the number of accesses made to the session variables. I plan to increase them to see if there is any impact.
MAYBE CONFLICTS AND THEIR RESOLUTION ARE NOT THE ROOT CAUSE OF THE SESSION VARIABLE PROBLEM.
The observed problem is that session variables suddenly disappear. At the point of failure due to a KeyError, inspecting the SESSION object shows two failure modes: either all the session variables are gone and only the container remains, or most of the session variables are gone and a few remain.
    74769573A2H7SIH2AKo=id: 11343269231636975299, token: 74769573A2H7SIH2AKo, contents keys: ['currentTab', 'calendarPage', 'currentCourse', 'currentTextbook']
and
    77307574A2HTTdXCYYg=id: 11343267811075063138, token: 77307574A2HTTdXCYYg, contents keys: []
Access to the session variables is almost always through a pair of Scripts (Python). Occasionally a session variable is read with an expression of the form REQUEST['SESSION']['key'].
    ## Script (Python) "getSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=varname
    ##title=
    ##
    request=container.REQUEST
    session=request['SESSION']
    return session[varname]

    # Script (Python) "setSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=var, val
    ##title=
    ##
    request = container.REQUEST
    RESPONSE = request.RESPONSE
    session=request['SESSION']
    session[var]=val
    request.set( 'SESSION', session )
This all seems right to me. Any suggestions as to how to localize when the session variables get lost? That might help localize the root cause.
Trimmed zodb-dev off the cc list. On Dec 15, 2005, at 2:24 PM, Dennis Allison wrote:
The systems are running a Zope/ZEO combination with a store configuration of:
    #
    <zodb_db temporary>
        # Temporary storage database (for sessions)
        <temporarystorage>
          name temporary storage for sessioning
        </temporarystorage>
        mount-point /temp_folder
        container-class Products.TemporaryFolder.TemporaryContainer
    </zodb_db>
OK, that's good. Having a nonzeo session database is a good thing.
Although the connection to ZEO is via a network port, it runs on the same physical hardware.
TemporaryStorage is not transactional. Does it need to be under MVCC?
I thought we sewed this one up. ;-) It is indeed transactional. See our email conversations around 11/15/2005 and beyond.
TemporaryStorage does provide a conflict cache to do "rudimentary conflict resolution". There are several timing and scaling parameters that need to be considered:
    CONFLICT_CACHE_MAXAGE = 60 (seconds)
    CONFLICT_CACHE_GCEVERY = 60 (seconds)
    RECENTLY_GC_OIDS_LEN = 200
Entries in the recently gc'ed oids list are those which may be resolvable by a retry.
These numbers may be too small given the loads we see and the number of accesses made to the session variables. I plan to increase them to see if there is any impact.
That's fine.
MAYBE CONFLICTS AND THEIR RESOLUTION ARE NOT THE ROOT CAUSE OF THE SESSION VARIABLE PROBLEM.
That's possible.
either all the session variables are gone and only the container remains or most of the session variables are gone and a few remain.
The former sounds like the session may have expired. The second sounds like it could be due to conflict resolution. You don't say where the KeyError is raised from, that's important.
74769573A2H7SIH2AKo=id: 11343269231636975299, token: 74769573A2H7SIH2AKo, contents keys: ['currentTab', 'calendarPage', 'currentCourse', 'currentTextbook']
and
77307574A2HTTdXCYYg=id: 11343267811075063138, token: 77307574A2HTTdXCYYg, contents keys: []
Note that these are actually two different session objects (see the "token"). I don't know if this is meaningful.
Access to the session variables is almost always through a pair of Scripts (Python). Occasionally a session variable is read with an expression of the form REQUEST['SESSION']['key'].
    ## Script (Python) "getSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=varname
    ##title=
    ##
    request=container.REQUEST
    session=request['SESSION']
    return session[varname]
How about session.get('varname') instead? What happens when the session expires? Do you deal with that in your code in some way?
    # Script (Python) "setSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=var, val
    ##title=
    ##
    request = container.REQUEST
    RESPONSE = request.RESPONSE
    session=request['SESSION']
    session[var]=val
    request.set( 'SESSION', session )
This all seems right to me. Any suggestions as to how to localize when the session variables get lost? That might help localize the root cause.
What is request.set('SESSION', session) supposed to do? Whatever it's supposed to be doing is probably not what you think it's doing. I actually don't know quite what the impact of doing that would be; it would depend on your other code. Remove that, it can't do anything good. ;-)
If you're trying to make sure the session is "re-stored" in the ZODB, you needn't. You only need to do this for mutable variables *in* the session, e.g.:
    request.SESSION['a'] = {}
    a = request.SESSION['a']
    a['b'] = 'c'
    # here it comes
    request.SESSION['a'] = a
I also enumerated other knobs to you in previous emails that could help reduce sessioning conflicts (like turning off inband housekeeping -- see the Transience.py code for that -- and bumping up transient object container timeout resolution via session-timeout-resolution in zope.conf in your current config). Those are the things I meant when I said that you hadn't told us whether you had taken steps based on recommendations. I won't enumerate them again here, as it should be reasonably easy to go look in the archives or your mail history for those recommendations. I'd suggest trying these recommendations before playing around with conflict resolution code.
Dunny also suggested trying to mess with the WRITEGRANULARITY in TransientObject.py, FWIW.
- C
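The reason the write-back matters only for mutable values can be illustrated without Zope. The `TrackedMapping` class below is a hypothetical stand-in that flips a `changed` flag on item assignment, loosely imitating how a persistent object notices writes made to itself. It is a sketch of the idea, not the TransientObject implementation.

```python
class TrackedMapping:
    """Toy mapping that records whether an item was assigned."""

    def __init__(self):
        self._data = {}
        self.changed = False

    def __setitem__(self, key, value):
        self._data[key] = value
        self.changed = True

    def __getitem__(self, key):
        return self._data[key]


session = TrackedMapping()
session['a'] = {}
session.changed = False          # pretend the transaction committed

a = session['a']
a['b'] = 'c'                     # in-place mutation: the mapping is not told
mutation_noticed = session.changed

session['a'] = a                 # write-back: the mapping sees an assignment
writeback_noticed = session.changed
```

In this toy model the nested dictionary changes either way; what differs is whether the enclosing object learns that it needs to be saved.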
On Thu, 15 Dec 2005, Chris McDonough wrote:
TemporaryStorage is not transactional. Does it need to be under MVCC?
I thought we sewed this one up. ;-) It is indeed transactional. See our email conversations around 11/15/2005 and beyond.
**** That's what I thought as well, but I did not see any transaction rollback mechanism. loadBefore() gives access to earlier transactions (as long as they are not garbage collected) as needed for MVCC. Ooops, it inherits from BaseStorage, which has the transaction management stuff. I see it now.
either all the session variables are gone and only the container remains or most of the session variables are gone and a few remain.
The former sounds like the session may have expired. The second sounds like it could be due to conflict resolution. You don't say where the KeyError is raised from, that's important.
**** KeyErrors are raised by getSessionVariable(), code below.
    ## Script (Python) "getSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=varname
    ##title=
    ##
    request=container.REQUEST
    session=request['SESSION']
    return session[varname]
How about session.get('varname') instead? What happens when the session expires? Do you deal with that in your code in some way?
**** session.get('varname') would not work as the variable being accessed is a parameter. Are you questioning the syntax or the semantics?
**** Sessions do not expire, or rather have lifetimes that can be ignored. We support and expect multi-hour sessions. Other constraints ensure that users' sessions terminate before they time out.
    # Script (Python) "setSessionVariable"
    ##bind container=container
    ##bind context=context
    ##bind namespace=
    ##bind script=script
    ##bind subpath=traverse_subpath
    ##parameters=var, val
    ##title=
    ##
    request = container.REQUEST
    RESPONSE = request.RESPONSE
    session=request['SESSION']
    session[var]=val
    request.set( 'SESSION', session )
What is request.set('SESSION', session) supposed to do? Whatever it's supposed to be doing is probably not what you think it's doing. I actually don't know quite what the impact of doing that would be; it would depend on your other code. Remove that, it can't do anything good. ;-) If you're trying to make sure the session is "re-stored" in the ZODB, you needn't. You only need to do this for mutable variables *in* the session, e.g.:
    request.SESSION['a'] = {}
    a = request.SESSION['a']
    a['b'] = 'c'
    # here it comes
    request.SESSION['a'] = a
**** Hmmm... you may have hit upon the problem. The setSessionVariable code's intent was to copy out the entire session object, mutate it, and then write it back into REQUEST, the approved way blessed in some of the ancient documentation for the initial implementation of sessions in Zope 2.5.1. You are saying that this is not necessary at the level of the SessionObject and only really needed when the object being stored in the session is itself mutable (e.g., a list or a dictionary). And the problems I have been seeing are likely to be due to an interaction between the persistence mechanism and the way I was managing persistence. I'll rewrite the setSessionVariable() routine and see if that resolves the lost session variable problem.
I also enumerated other knobs to you in previous emails that could help reduce sessioning conflicts (like turning off inband housekeeping -- see the Transience.py code for that -- and bumping up transient object container timeout resolution via session-timeout-resolution in zope.conf in your current config). Those are the things I meant when I said that you hadn't told us whether you had taken steps based on recommendations. I won't enumerate them again here, as it should be reasonably easy to go look in the archives or your mail history for those recommendations. I'd suggest trying these recommendations before playing around with conflict resolution code. Dunny also suggested trying to mess with the WRITEGRANULARITY in TransientObject.py, FWIW.
I have, with little effect on the primary problem.
Thanks for the perceptive comments and sharp eyes.
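One possible shape for the rewritten accessors, following the two suggestions made above (use `.get` so that a missing key does not raise KeyError, and drop the `request.set` call). They are written as plain functions taking the request, rather than as Script (Python) objects, so that the sketch runs on its own. The function names, the `default` parameter, and the plain dict standing in for the request are assumptions made for illustration.

```python
def get_session_variable(request, varname, default=None):
    session = request['SESSION']
    # Pass the parameter itself, not the literal string 'varname'.
    return session.get(varname, default)


def set_session_variable(request, var, val):
    session = request['SESSION']
    # Assigning into the session object is enough to record the change;
    # there is no request.set('SESSION', session) afterwards.
    session[var] = val


# A plain dict stands in for REQUEST and for the session object here.
request = {'SESSION': {}}
set_session_variable(request, 'currentTab', 'calendar')
```

Whether a missing variable should quietly return a default or be treated as a sign of an expired session is an application decision that this sketch leaves open.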
Chris McDonough identified a persistence problem with the routine(s) that manage session variables. (Thanks Chris) I have put the correction in place, which resolved some (but not all) of the problems. There are still problems which are apparently due to conflicts in accessing the session variables. To minimize the frequency of conflicts, I am rewriting several routines using Dieter's rules of thumb (Thanks Dieter).
One routine being modified is a Script (Python) that initializes a number of session variables. I am collecting the session values in a dictionary and then using update to set their values, for example:
    s = {}
    s['alpha'] = 'a'
    s['beta'] = 'b'
    request['SESSION'].update(s)
Is the persistence machinery smart enough to detect this as a change? I suspect that it has to be flagged since the assignment won't be seen. Usually this means setting the _p_changed=1 attribute, but it is not clear to me where to set it in this particular context.
On 12/16/05, Dennis Allison <allison@shasta.stanford.edu> wrote:
MAYBE CONFLICTS AND THEIR RESOLUTION ARE NOT THE ROOT CAUSE OF THE SESSION VARIABLE PROBLEM. The observed problem is that session variables suddenly disappear.
Perhaps your app is tripping over some bug in conflict handling. But I'd say it is worth entertaining other ideas too.
For now, just comment out all of TransientObject._p_resolveConflict, and if you still get errors then you know you have to look elsewhere. (Sure, some of that elsewhere may well include the conflict resolution code above _p_resolveConflict.)
Your application and sessions should cope just fine in the face of any ConflictError. ConflictErrors are an essential part of the machinery that keeps data state consistent. As Chris mentions, look at how you're using sessions and some of the assumptions you might be making.
Might be useful to try with sessions that don't time out: set session-timeout-minutes to 0. And try maximum-number-of-session-objects of 0. Also try turning those knobs the other way: session-timeout-minutes of 1 and maximum-number-of-session-objects of 2.
For now, stay focused on making sure you maintain a consistent state. Only once you have a consistent state is it worth trying to improve the rate of ConflictErrors. (In your case of sessions lasting for many hours, I think the numbers you quote elsewhere are too small. And, yeah, an alternative implementation for _p_resolveConflict might help there. Personally I prefer the simple approach of just commenting that out completely and living with a slightly higher conflict count.)
Might be worth trying without ZEO in the mix.
Definitely worth the effort, if you have not already, to recreate the whole system on a separate host that you can feel comfortable making changes to. Then you can happily tune the various knobs downwards, which may help with trying to observe the problem. For example, session-timeout-minutes of 1.
cheers
michael
participants (5)
- Chris McDonough
- Dennis Allison
- Florent Guillaume
- Michael Dunstan
- Tim Peters