Difficulties with MVCC implementation -- now fully explained
was: [ZODB-Dev] Potential BTrees splitting bug
Dieter Maurer
dieter at handshake.de
Sat Sep 27 01:18:17 EDT 2003
Dieter Maurer wrote at 2003-9-26 02:41 +0200:
> > Tim Peters wrote:
> > ...
> > If so, you could probably
> > provoke problems much more easily on a Windows box, where time.time()
> > updates only 18.2 times per second (this is in some sense the opposite of
> > what happens on Linux: on Windows time.time() has only ~= 0.055 second
> > resolution while on Linux time.time() typically has microsecond resolution;
> > but time.clock() on Windows typically has sub-microsecond resolution while
> > time.clock on Linux typically has 0.01 second resolution).
>
> The resolution is much better than I thought (order of us instead of ms).
> This reduces the likelihood that I already found the true reason
> for the failure.
I did not...
I now have a deterministically failing test case.
It shows the identical failure symptoms as the non-deterministically
failing test. Thus, I expect that the causes are very similar, too.
I have fully analysed the deterministic case: I can explain what
causes the inconsistency in my MVCC implementation and why, instead, a
ReadConflict is raised with MVCC disabled.
The MVCC implementation proposed by Jeremy will not suffer the same
problem.
The next version of my "No more ReadConflicts" patch will
implement (part of) Jeremy's proposal and use transaction ids
rather than timestamps for synchronization. This will require
ZODB3 3.2 as the 3.1 invalidation protocol does not report
transaction ids. Unlike Jeremy, I do not (yet?) think that
a revamp of the ZEO Client cache will be necessary (to serve
historical versions of an object). We will see...
For experts and very curious people only,
I attach details for the deterministic test.
Dieter
----------------------------------------------------------------------------
Details:
The deterministic test tries to deterministically emulate
the behaviour of "ZEO.tests.InvalidationTests.checkConcurrentUpdates2Storages".
This test has two threads each inserting a sequence of keys into
a common OOBTree. One thread inserts 2, 4, 6, ...
the other 1, 3, 5, ...
The deterministic emulation replaces the (asynchronous) threads
by (synchronous) tasks of which the insert and commit operations
can be controlled externally.
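
To illustrate the idea of externally driven tasks, here is a minimal
sketch. It is not the actual test code: the "Task" class and the "run"
helper are hypothetical names, and a plain dict stands in for the
shared OOBTree (no ZODB, no ZEO, no conflicts).

```python
class Task:
    """Hypothetical stand-in for one worker thread, driven step by step."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared   # committed state visible to every task
        self.pending = []      # inserts made but not yet committed

    def insert(self, key):
        # Only records the insert; nothing is visible to others yet.
        self.pending.append(key)

    def commit(self):
        # Publishes all pending inserts at once.
        for key in self.pending:
            self.shared[key] = self.name
        self.pending.clear()


def run(groups):
    """Replay commit groups; even keys go to one task, odd keys to the other."""
    shared = {}
    tasks = {0: Task("even", shared), 1: Task("odd", shared)}
    for group in groups:
        for key in group:
            tasks[key % 2].insert(key)
        for task in tasks.values():
            task.commit()
    return shared


state = run([[2, 1], [4, 3], [7]])
```

Because the caller decides when each insert and each commit happens,
the same interleaving can be replayed on every run.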
The failing sequence of events can be represented by
2 1 : 4 3 : 6 5 : 7 : 9 : 11 8 : 13 : 15 10 : 12 17 : 19 14 : 21 : 23 : 25 : 27 : 29 16 : 31 18 : 33 : 35 20 : 22 : 24 : 26 37 : 39 28 : 30 : 32 41 : 43 : 45 34 : 36 : 38 47 : 49 : 51 : 53 40 : 55 42 : 44 57 : <bang>
Each number represents an insert step for the respective number,
each ':' represents the commit for the tasks preceding the ':' (up to
the previous ':'). Problems begin with "53 40 :".
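
The notation can be read mechanically. The following sketch parses the
sequence into commit groups; "parse_schedule" and "task_of" are
hypothetical helper names, not part of the test, and the mapping of
even and odd keys to tasks follows the description of the two threads
above.

```python
SCHEDULE = (
    "2 1 : 4 3 : 6 5 : 7 : 9 : 11 8 : 13 : 15 10 : 12 17 : 19 14 : 21 : "
    "23 : 25 : 27 : 29 16 : 31 18 : 33 : 35 20 : 22 : 24 : 26 37 : 39 28 : "
    "30 : 32 41 : 43 : 45 34 : 36 : 38 47 : 49 : 51 : 53 40 : 55 42 : "
    "44 57 : <bang>"
)


def parse_schedule(text):
    """Return (commit groups, inserts still uncommitted at the end)."""
    groups, current = [], []
    for token in text.split():
        if token == ":":
            groups.append(current)   # a ':' commits everything since the last ':'
            current = []
        elif token == "<bang>":
            break                    # the failure marker ends the schedule
        else:
            current.append(int(token))
    return groups, current


def task_of(key):
    """Even keys belong to one task, odd keys to the other."""
    return "even" if key % 2 == 0 else "odd"


groups, uncommitted = parse_schedule(SCHEDULE)
```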
These two transactions filled the second bucket with 31 elements.
Usually, this would have caused a split, but because the 31st
element was added during conflict resolution, the split is delayed.
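
The size arithmetic can be sketched with a toy bucket. This is not the
BTrees code: a sorted list stands in for a bucket, "insert" is a
hypothetical helper, and the limit of 30 keys is an assumption inferred
from the statement that 31 elements would usually cause a split.

```python
import bisect

MAX_BUCKET_SIZE = 30  # assumed limit; a bucket above it gets split


def insert(bucket, key):
    """Insert key into a sorted bucket; split in the middle when too large.

    Returns (left, right); right is None when no split happened.
    """
    bisect.insort(bucket, key)
    if len(bucket) > MAX_BUCKET_SIZE:
        middle = len(bucket) // 2
        return bucket[:middle], bucket[middle:]
    return bucket, None


# A bucket that reached 31 keys without being split (the delayed split).
overfull = list(range(100, 131))
left, right = insert(overfull, 200)
```

In this toy model the next insert into the 31-element bucket brings it
to 32 keys and the split yields two halves of equal size.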
Insertion of "55" works on old data (it does not yet know about
the insertion of "40" as the invalidation message has not yet
arrived). It tries to split the node. The consistency check on
commit fails (old data) and conflict resolution is unable
to reconcile split operations. The transaction fails and
flushes both the root and the split bucket from the ZODB cache.
Insertion of "42" works on new data and splits the node successfully into
two 16 element nodes.
Insertion of "44" is successful.
Insertion of "57" starts. The invalidation messages for "53 40" (in
fact it is a single one) have meanwhile arrived but not the ones
for "55 42". Therefore, the bucket was flushed from the ZEO
Client cache but not yet the root -- both are flushed from the ZODB cache.
"57" reads the root and gets the old (not yet split) copy from
the ZEO cache. It reads the bucket and gets the new (split) state
from the ZEO server (as it is not in ZEO cache).
My MVCC control does not reject this state as it was committed before
this transaction has started. *BANG*: "57" is inserted into the wrong
bucket.
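
The effect of combining an old root with a new bucket can be shown
with a toy model. Everything here is illustrative: the root is a
sorted list of (lowest key, bucket id) pairs, the bucket ids and key
ranges are made up, and "locate" and "insert_via" are hypothetical
helpers, not BTrees functions.

```python
import bisect


def locate(root, key):
    """Return the id of the bucket that root considers responsible for key."""
    lows = [low for low, _ in root]
    index = bisect.bisect_right(lows, key) - 1
    return root[max(index, 0)][1]


def insert_via(root, buckets, key):
    """Route key through root and insert it into the bucket found there."""
    bucket_id = locate(root, key)
    bisect.insort(buckets[bucket_id], key)
    return bucket_id


old_root = [(1, "b1"), (31, "b2")]               # stale copy, before the split
new_root = [(1, "b1"), (31, "b2"), (47, "b3")]   # committed root, after the split

# Bucket states after the split, as the server would deliver them.
buckets = {
    "b1": list(range(1, 31)),
    "b2": list(range(31, 47)),
    "b3": list(range(47, 57)),
}

# The reader combines the stale root with the new (split) bucket "b2".
wrong = insert_via(old_root, buckets, 57)
```

The stale root still routes the key to the bucket that was split, so
the key ends up in a bucket whose range no longer covers it, while the
committed root would have routed it to the new sibling.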
Without my MVCC, loading the state from the server also
delivers the invalidation message for this node (and the root).
A "ReadConflict" is the consequence.
With Jeremy's MVCC proposal, the arrival of the invalidation message
will define the transaction boundary to lie before the invalidating
transaction. The bucket state will be rejected and an old (unsplit)
state reloaded instead. "57" will try to split the node
and fail on commit with a "ConflictError" (old state, unresolvable conflict).
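
A rough sketch of this boundary rule, under my own assumptions about
how it could look: the "Reader" class, its methods and the in-memory
revision store are hypothetical and do not describe the ZODB or ZEO
API. Each object has a list of (transaction id, state) revisions in
ascending order.

```python
class Reader:
    """Hypothetical reading transaction with an invalidation boundary."""

    def __init__(self, store):
        self.store = store      # oid -> [(tid, state), ...], ascending tid
        self.boundary = None    # tid of the earliest invalidating transaction

    def invalidate(self, tid):
        # The first invalidation fixes the boundary before that transaction.
        if self.boundary is None or tid < self.boundary:
            self.boundary = tid

    def load(self, oid):
        revisions = self.store[oid]
        if self.boundary is None:
            return revisions[-1]            # no invalidation seen: newest state
        older = [rev for rev in revisions if rev[0] < self.boundary]
        if not older:
            raise KeyError(oid)             # no revision before the boundary
        return older[-1]                    # newest state before the boundary


store = {
    "root": [(1, "root, unsplit"), (3, "root, split")],
    "bucket": [(2, "bucket, 31 keys"), (3, "bucket, split")],
}
reader = Reader(store)
reader.invalidate(3)            # invalidation for the splitting transaction
root_state = reader.load("root")
bucket_state = reader.load("bucket")
```

With the boundary in place, root and bucket are both read in their
unsplit state, so the reader sees a consistent (if old) tree and the
conflict surfaces at commit time.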