[ZODB-Dev] Re: Can __setstate__ trigger an RCE?

Casey Duncan casey at zope.com
Tue Jul 6 16:27:30 EDT 2004


On Tue, 6 Jul 2004 15:07:38 -0300
Christian Robottom Reis <kiko at async.com.br> wrote:

> 
> I'm still trying to track down a problem that occurs with read
> conflicts in IndexedCatalog when using ZODB without MVCC enabled. The
> issue is that occasionally, I find in my client logs a conflict error
> on a OOBucket instance stored as _change_buffer in Catalog persistent
> instances. The interesting part is that the Catalog has the following
> code as part of its __setstate__ method:
> 
>     def __setstate__(self, state):
>         # Load the _change_buffer to ensure we have its state and
>         # avoid a ReadConflictError when handling it
>         list(self._change_buffer.items())

This code is probably more likely to cause an RCE (read conflict error)
than to prevent one, because it tries to load *all* of the buckets of the
BTree whenever
the BTree's container is loaded. If somebody else somewhere has modified
the btree in any way since this transaction started, then you're toast
(even if this transaction would never actually read the changed buckets
of the tree).
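
To make that concrete, here is a toy model of the pre-MVCC rule (loading
an object that somebody else changed after your transaction started
raises a conflict). Every name in it is a hypothetical stand-in I made up
for illustration; it is not the ZODB or BTrees API and it simplifies how
invalidations really work.

```python
# Toy model only: hypothetical stand-ins, not ZODB or BTrees classes.

class ReadConflictError(Exception):
    """Stand-in for the error raised on a stale load without MVCC."""


class ToyStorage:
    """Tracks the commit counter at which each bucket last changed."""

    def __init__(self, bucket_ids):
        self.serial = 0
        self.last_changed = {bid: 0 for bid in bucket_ids}

    def commit_change(self, bucket_id):
        # Another client commits a change to a single bucket.
        self.serial += 1
        self.last_changed[bucket_id] = self.serial


class ToyTransaction:
    def __init__(self, storage):
        self.storage = storage
        self.start_serial = storage.serial

    def load(self, bucket_id):
        # Simplified rule: a bucket changed after we started conflicts.
        if self.storage.last_changed[bucket_id] > self.start_serial:
            raise ReadConflictError(bucket_id)
        return bucket_id

    def load_all(self):
        # Same spirit as list(tree.items()): touches every bucket.
        return [self.load(bid) for bid in self.storage.last_changed]


storage = ToyStorage(["b0", "b1", "b2"])
txn = ToyTransaction(storage)
storage.commit_change("b2")       # somebody else modifies one bucket

# Reading only an untouched bucket is fine.
lazy_ok = txn.load("b0") == "b0"

# Loading everything trips over the one changed bucket.
try:
    txn.load_all()
    eager_conflicted = False
except ReadConflictError:
    eager_conflicted = True
```

In this model the transaction that only needs "b0" survives, while the
one that loads everything up front conflicts on a bucket it never needed.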

This eliminates many of the BTree's advantages and will scale poorly as
the tree grows (each load scans the whole tree, builds an arbitrarily
large list in memory, and then discards it). Also it
is impossible to predict when the container object will be loaded, so
this code will execute at arbitrary times, causing random spikes in
load.
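
A rough way to see the scaling cost is to count how many buckets get
touched. The sketch below is a made-up, pure-Python bucket structure
(ToyBucketTree and its methods are my own illustrative names, not how
BTrees is implemented); it only shows that a keyed lookup touches one
bucket while list(tree.items()) touches all of them.

```python
from bisect import bisect_right


class ToyBucketTree:
    """Hypothetical bucketed mapping that records which buckets were loaded."""

    def __init__(self, items, bucket_size):
        items = sorted(items)
        # Split the sorted items into fixed-size buckets.
        self._stored = [items[i:i + bucket_size]
                        for i in range(0, len(items), bucket_size)]
        self._first_keys = [bucket[0][0] for bucket in self._stored]
        self.loaded = set()

    def _load(self, index):
        # Pretend this is where the bucket is fetched from the database.
        self.loaded.add(index)
        return self._stored[index]

    def get(self, key):
        # Find the one bucket that could hold the key and load only that.
        index = max(bisect_right(self._first_keys, key) - 1, 0)
        return dict(self._load(index)).get(key)

    def items(self):
        # A full scan has to load every bucket in turn.
        for index in range(len(self._stored)):
            yield from self._load(index)


data = [(n, str(n)) for n in range(1000)]

lazy = ToyBucketTree(data, bucket_size=50)
value = lazy.get(420)
lazy_loads = len(lazy.loaded)      # one bucket touched

eager = ToyBucketTree(data, bucket_size=50)
snapshot = list(eager.items())     # builds the full list, as __setstate__ does
eager_loads = len(eager.loaded)    # every bucket touched
```

With 1000 items in buckets of 50, the lookup touches 1 bucket and the
full scan touches all 20, and that gap keeps growing with the tree.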

In the world without MVCC, we usually survive by assuming we can segment
the data structures to minimize the chance of concurrent updates causing
conflicts. This code effectively undoes that segmentation when the
btree's container object is loaded. Since this code could also take a
long time to run (because it will likely need to read the disk), it
affords an even greater opportunity for read conflicts.

There are some basic strategies that I know of to minimize RCEs:
- Break up containers so that concurrent access can be minimized.
- Keep transactions simple and short-lived so the chance of overlap is
  lessened or at least the retry expense is less.
- Read objects early and store them locally so that you effectively
  "cheat" the system and the database never sees the dirty reads that
  would have happened later.
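
As a minimal sketch of the second point, assuming the application
retries a conflicted unit of work: the smaller the unit, the less is
thrown away on each retry. ConflictError, run_with_retry and
short_unit_of_work below are hypothetical names for illustration; in a
real ZODB application the unit of work would end in a commit, and the
except branch would abort the transaction before trying again.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a conflict raised by the database."""


def run_with_retry(work, attempts=3):
    """Run a short unit of work, retrying it when it conflicts."""
    for attempt in range(1, attempts + 1):
        try:
            return work(attempt)
        except ConflictError:
            # Give up only after the last allowed attempt.
            if attempt == attempts:
                raise


calls = []


def short_unit_of_work(attempt):
    # Simulate a conflict on the first attempt only.
    calls.append(attempt)
    if attempt < 2:
        raise ConflictError("simulated conflict")
    return "committed"


result = run_with_retry(short_unit_of_work)
```

Here the work conflicts once and succeeds on the second attempt; only
the small unit is repeated, not a long transaction.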

I think you were shooting for the last of these, unfortunately at the
expense of the first two. For it to be effective, you usually need to
know exactly what objects you will need early in the transaction;
this is not always possible. Also it is potentially a dirty read (which
is not visible to the app), which may be bad if the transaction makes
changes based on the out-of-date state it has stored.

In general, more small objects are better than few big ones and more
short and simple transactions are better than few long and complex ones.

If it were me, I would remove this code and see what happens then. If
you still get read conflicts, perhaps changes to the data structures or
order of work in the transaction could help. 

-Casey



