[ZODB-Dev] [ZEO] rather long periods of unresponsiveness

Sat Dec 18 12:27:54 EST 2004

In our installation, ZEO (from ZODB 3.2) runs on a high-availability cluster.
The cluster periodically probes ZEO for responsiveness and
restarts it when it becomes unresponsive.

When a large transaction is committed (I checked with a
transaction of 35 MB affecting 250,000 objects),
ZEO is unresponsive for about a minute.

The reason for this long period of unresponsiveness lies in ZEO's special
(fascinating) implementation of the two-phase commit:

  In the first phase, ZEO does not write the changed object data to the
  storage directly; instead, it puts the data into a "CommitLog"
  (essentially a temporary file).

  Only at "vote" (the end of the first commit phase) does
  ZEO call the storage's "tpc_begin" (thereby acquiring
  the storage's commit lock) and then transfer the changed
  data from the "CommitLog" to the storage.
  Depending on the size of the transaction and the number
  of affected objects, this can take a long time, and
  ZEO is unresponsive during this time.
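A minimal Python sketch of the mechanism described above, purely illustrative: `CommitLog`, `ToyStorage` and `ToyTransaction` are hypothetical stand-ins, not ZEO's actual classes, and a plain `threading.Lock` plays the role of the commit lock that "tpc_begin" acquires. It shows that the first phase never touches the storage, while everything buffered is copied in one go, under the lock, at vote time.

```python
import pickle
import tempfile
import threading


class CommitLog:
    """Hypothetical stand-in for ZEO's CommitLog: buffers (oid, data) pairs in a temp file."""

    def __init__(self):
        self.file = tempfile.TemporaryFile()  # binary read/write mode by default
        self.count = 0

    def store(self, oid, data):
        pickle.dump((oid, data), self.file)
        self.count += 1

    def __iter__(self):
        # Replay the buffered records from the start of the file.
        self.file.seek(0)
        for _ in range(self.count):
            yield pickle.load(self.file)


class ToyStorage:
    """Hypothetical storage guarded by a single commit lock."""

    def __init__(self):
        self.commit_lock = threading.Lock()
        self.data = {}


class ToyTransaction:
    def __init__(self, storage):
        self.storage = storage
        self.log = CommitLog()

    def store(self, oid, data):
        # Phase 1: the storage is not touched; data only goes to the log.
        self.log.store(oid, data)

    def vote(self):
        # End of phase 1: take the commit lock (standing in for tpc_begin),
        # then copy every logged record into the storage.
        self.storage.commit_lock.acquire()
        for oid, data in self.log:
            self.storage.data[oid] = data
        # The lock stays held until finish() (phase 2).

    def finish(self):
        self.storage.commit_lock.release()
```

Because the copy loop in `vote` runs while the lock is held, its duration grows with the number of buffered records; in a single-threaded server, no other request is handled in the meantime.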

What do you think about executing "vote" (more precisely
"_vote") in a separate thread when the transaction is sufficiently
large (in size or in number of modified objects)?
This would allow the main ZEO thread to
process other, unrelated requests in parallel.
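A rough sketch of the proposal, again with hypothetical names: `ToyServer`, `_vote` and the `LARGE_TXN_RECORDS` threshold are assumptions for illustration, not ZEO code or configuration. Small transactions are voted inline; large ones are handed to a worker thread, and a callback signals completion so the main thread can keep answering other requests meanwhile.

```python
import threading

# Hypothetical threshold (number of modified objects); not a ZEO setting.
LARGE_TXN_RECORDS = 10_000


def _vote(storage, records, lock):
    # Stand-in for the server-side _vote: copy buffered records under the
    # commit lock. Simplification: a real server would keep the lock until
    # tpc_finish instead of releasing it here.
    with lock:
        for oid, data in records:
            storage[oid] = data


class ToyServer:
    def __init__(self):
        self.storage = {}
        self.commit_lock = threading.Lock()

    def vote(self, records, on_done):
        """Vote inline for small transactions, in a worker thread for large ones.

        on_done() is called once the vote has completed. Returns the worker
        thread, or None if the vote ran inline.
        """
        if len(records) < LARGE_TXN_RECORDS:
            _vote(self.storage, records, self.commit_lock)
            on_done()
            return None

        def worker():
            _vote(self.storage, records, self.commit_lock)
            on_done()

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def ping(self):
        # An unrelated request that does not need the commit lock.
        return "pong"
```

In a real asyncore-based server, the completion callback would likely have to hand its result back to the main loop in a thread-safe way rather than writing to the client connection directly from the worker thread.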

-- 
Dieter