Re: [ZODB-Dev] Re: BTrees strangeness (was [Zope-dev] Zope 2.X BIG Session problems - blocker - our site dies - need help of experience Zope developer, please)
On Wed, 2004-03-03 at 22:20, Casey Duncan wrote:
    for key in list(self._data.keys(None, max_ts)):
        assert(key <= max_ts)
        STRICT and _assert(self._data.has_key(key))
        for v in self._data[key].values():
            to_notify.append(v)
        del self._data[key]
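The `list(...)` around `keys(None, max_ts)` is what makes deleting inside that loop safe: it copies the matching keys before any `del` runs. A minimal sketch, using a plain `dict` as a stand-in for the BTree (an assumption; BTrees have their own iteration rules), shows the failure the snapshot avoids. The function names are hypothetical.

```python
def expire_live(data, max_ts):
    # Deleting while iterating the live key view: the dict notices the
    # size change and raises RuntimeError on the next iteration step.
    for key in data.keys():
        if key <= max_ts:
            del data[key]


def expire_snapshot(data, max_ts):
    # Copy the qualifying keys first, then mutate; this mirrors the
    # list(self._data.keys(None, max_ts)) call in the quoted loop.
    for key in [k for k in data if k <= max_ts]:
        del data[key]


buckets = {10: "a", 20: "b", 30: "c"}
try:
    expire_live(dict(buckets), 20)
except RuntimeError as exc:
    print("live iteration failed:", exc)

snap = dict(buckets)
expire_snapshot(snap, 20)
print(snap)  # {30: 'c'}
```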
Maybe you could use items() and two loops instead:
    to_rm = []
    for key, val in self._data.items(None, max_ts):
        for v in val.values():
            to_notify.append(v)
        to_rm.append(key)
    for key in to_rm:
        try:
            del self._data[key]
        except KeyError:
            pass  # Somebody else deleted it first
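The two-pass version can be exercised outside Zope with a small stand-in. This sketch assumes a hypothetical `SortedMap` class that imitates the `items(min, max)` range search (inclusive bounds, as BTrees range searches use by default, with `None` for an open end) using a dict and the standard `bisect` module; it is not the BTrees API, only enough of its shape to run the pattern. It deliberately leaves out the `try/except`, so an unexpected KeyError would surface instead of being swallowed.

```python
import bisect


class SortedMap:
    """Hypothetical stand-in for a BTree keyed by timestamp.

    items(lo, hi) imitates an inclusive range search, where None means
    "unbounded" on that side.
    """

    def __init__(self, pairs=None):
        self._d = dict(pairs or {})
        self._keys = sorted(self._d)

    def items(self, lo=None, hi=None):
        start = 0 if lo is None else bisect.bisect_left(self._keys, lo)
        stop = len(self._keys) if hi is None else bisect.bisect_right(self._keys, hi)
        # A list, not a lazy view, so later deletions cannot disturb it.
        return [(k, self._d[k]) for k in self._keys[start:stop]]

    def __delitem__(self, key):
        del self._d[key]  # raises KeyError if the key is absent
        self._keys.remove(key)

    def __len__(self):
        return len(self._d)


def gc_buckets(data, max_ts):
    """Two-pass expiry: collect everything first, delete afterwards."""
    to_notify, to_rm = [], []
    for key, bucket in data.items(None, max_ts):
        to_notify.extend(bucket.values())
        to_rm.append(key)
    for key in to_rm:
        # No try/except: in this single-threaded sketch every key in to_rm
        # was just read from data, so a KeyError here would point at a bug.
        del data[key]
    return to_notify


data = SortedMap({100: {"s1": "a"}, 200: {"s2": "b", "s3": "c"}, 300: {"s4": "d"}})
print(sorted(gc_buckets(data, 200)))  # ['a', 'b', 'c']
print(len(data))  # 1
```

Because the first loop finishes before any deletion, the range search never sees a mutated structure; whether the production BTree behaves as this single-threaded model predicts is the open question in this thread.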
I don't think that could raise a KeyError...
Well, the real bit of magic there is the "try... except KeyError: pass" stanza. Believe me, I'm tempted to stick that in, but this is the kind of voodoo that got me into a lot of trouble in the older version of this code (there were reams upon reams of voodoo in the old code), so I'd really rather just figure out why the code is failing in the first place. I'd rather not mask the problem until I understand the cause. That may never happen, of course, but a man can dream.
- C
Chris McDonough wrote:
Well, the real bit of magic there is the "try... except KeyError: pass" stanza. Believe me, I'm tempted to stick that in, but this is the kind of voodoo that got me into a lot of trouble in the older version of this code (there were reams upon reams of voodoo in the old code), so I'd really rather just figure out why the code is failing in the first place. I'd rather not mask the problem until I understand the cause. That may never happen, of course, but a man can dream.
If I'm following this thread correctly, isn't the code failing because the BTree is corrupted (that is, BTrees.check.check chokes)? If that's the case, then you're certainly right to avoid masking the problem.
-John
On Wed, 2004-03-03 at 22:53, John Belmonte wrote:
If I'm following this thread correctly, isn't the code failing because the BTree is corrupted (that is, BTrees.check.check chokes)? If that's the case then you're certainly right to avoid masking the problem.
We don't know for sure that it's corrupted yet, because the problem has not been reproduced in isolation and has only appeared in one production setup (Alex's). Hopefully Alex will instrument his code to run the check when it chokes, and we'll know more then. I've also asked him to switch the storage that holds this data to a FileStorage. I have some concerns about the storage he is currently using for this data (TemporaryStorage): it is not typically exercised by unit tests for simultaneous access, and it does some hairy in-place garbage collection that other storages don't do.
- C
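One way to "do the check when it chokes" without masking anything: catch the KeyError only long enough to run a structural check and log the outcome, then re-raise it. This sketch assumes the `check()` function from the `BTrees.check` module, which reports broken internal invariants by raising AssertionError. The logger name, the function names, and the `checker` parameter are hypothetical; the parameter exists only so the logic can be tried without a real BTree.

```python
import logging

log = logging.getLogger("session.expiry")  # hypothetical logger name


def btree_checker(tree):
    """Run BTrees' own invariant checker if the package is available."""
    try:
        from BTrees.check import check
    except ImportError:
        log.warning("BTrees not installed; structural check skipped")
        return
    check(tree)  # raises AssertionError if the tree's invariants are broken


def delete_or_diagnose(data, key, checker=btree_checker):
    """Delete data[key]; if that raises KeyError, log diagnostics and re-raise."""
    try:
        del data[key]
    except KeyError:
        try:
            checker(data)
        except AssertionError as exc:
            log.error("KeyError on %r and structure check failed: %s", key, exc)
        else:
            log.error("KeyError on %r; structure check reported no problem", key)
        raise  # re-raises the original KeyError: diagnose, don't mask
```

Used in the second loop of the expiry code in place of a bare `del`, this would leave a log entry saying whether the tree itself looked damaged at the moment the KeyError occurred.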
[Chris McDonough]
... I'd really rather just figure out why the code is failing in the first place. I'd just rather not mask the problem until I understand the cause. That may never happen, of course, but a man can dream.
I definitely want to know if there's still a way to provoke conflict resolution into creating insane BTrees, although I only care if it's one of the most recent versions of ZODB (3.1.5, 3.2.1, or HEAD); earlier versions have known, relevant bugs that have since been fixed. BTrees appear sensitive to tiny timing holes just because they're complicated data structures and are involved in conflict resolution a lot. But apart from the bugs in the BTree implementation fixed a loooong time ago, no other "corruption bug" we've squashed since then actually had anything to do with BTrees -- they were general timing holes that could corrupt anything at all involved in conflict resolution (generally, hard-to-provoke failures of invalidation to keep caches consistent).
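For readers less familiar with conflict resolution: BTree buckets take part in it through ZODB's `_p_resolveConflict(oldState, savedState, newState)` hook, which receives the common ancestor state, the state already committed by the other transaction, and the state the current transaction tried to write. The sketch below is a simplified, pure-Python illustration of that three-way idea over plain dicts; the real BTrees code works on bucket state tuples and applies additional rules, so treat it as an explanation of the concept, not a model of the implementation. `ConflictError` here is a local stand-in, and `merge_bucket_states` is a hypothetical name.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB's ConflictError."""


_MISSING = object()  # marks "key absent" in a state


def merge_bucket_states(old, committed, mine):
    """Three-way merge of key->value snapshots (simplified illustration).

    old:       state both transactions started from
    committed: state already written by the other transaction
    mine:      state this transaction wants to write
    """
    merged = dict(committed)
    for key in set(old) | set(mine):
        base = old.get(key, _MISSING)
        ours = mine.get(key, _MISSING)
        if ours == base:
            continue  # we did not touch this key; keep the committed result
        theirs = committed.get(key, _MISSING)
        if theirs != base and theirs != ours:
            raise ConflictError(f"both transactions changed key {key!r}")
        if ours is _MISSING:
            merged.pop(key, None)  # our deletion wins; theirs left it alone
        else:
            merged[key] = ours
    return merged


print(merge_bucket_states({"a": 1}, {"a": 1, "b": 2}, {"a": 3}))  # {'a': 3, 'b': 2}
```

Even this toy version shows why timing matters: the merge is only correct if `old` really is the state both writers started from, which is the kind of guarantee a cache-invalidation hole can silently break.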
participants (3)
- Chris McDonough
- John Belmonte
- Tim Peters