[ZODB-Dev] Re: Can __setstate__ trigger an RCE?

Christian Robottom Reis kiko at async.com.br
Wed Jul 7 10:10:02 EDT 2004


On Wed, Jul 07, 2004 at 07:37:12AM -0400, Jeremy Hylton wrote:
> > > Sounds like a classic hot-spot. I imagine it gets changed quite often
> > > then?
> > 
> > Oh, you know, just about every time an attribute in an instance (in a
> > system where there are around 500,000 of them) is changed <wink>
> 
> So how often do you see conflict errors?  I don't recall from earlier in
> the thread what percentage of transactions see a conflict.  

Rarely -- most users see one only every two days or so, while thousands
of transactions are processed daily. However, see below:

> It's hard to avoid them entirely when you have concurrency in your
> application; they're the mechanism for concurrency control after all.

Well, indeed. However, because the current architecture attaches real
persistent objects to the UI, conflicts cause pain to the end-user --
usually serious pain, given that "retry" isn't just a matter of sync(),
reapplying the changes and commit()ing again. I've rethought this
significantly to use throwaway adapters when attaching to an interface,
but some parts of this [large] application will require serious changes
-- when this was started I didn't realize RCEs (read conflict errors)
would be a source of trouble.
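(For contrast, the "simple" retry that doesn't suffice in our case would
look roughly like the loop below. This is just my sketch: ConflictError
here is a local stand-in for ZODB.POSException.ConflictError, and the
function and callable names are mine, with the transaction machinery
injected so the snippet stands alone.)

```python
class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError."""

def run_with_retries(apply_changes, commit, abort, attempts=3):
    """The naive retry loop: reapply the changes and commit again.

    apply_changes() is expected to re-read current state and re-apply
    the user's edits; commit()/abort() wrap the transaction machinery.
    """
    for _ in range(attempts):
        try:
            apply_changes()
            commit()
            return True
        except ConflictError:
            abort()  # throw away the doomed transaction and try again
    return False
```

The point of the paragraph above is exactly that our UI-bound persistent
objects can't be rebuilt by a bare apply_changes() like this.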

> > _p_independent, huh? I seem to recall this being evil, but let's have a
> > look at how QueueCatalog implements its conflict resolution to get an
> > idea of just how non-trivial you're implying.
> 
> I sketched an alternative implementation for queue catalog that might be
> worth trying (http://www.python.org/~jeremy/weblog/031031c.html).  The
> idea is to use a set of "buckets," (not necessarily a BTree bucket, just
> a container) where each client gets its own bucket.  The indexing thread
> periodically drains an entire bucket of all its events, which could
> cause a conflict, but you'd try to avoid two clients sharing the same
> bucket.  

(Note that we don't use a separate thread. We drain our buckets each
time a transaction ends -- our two-step commit first commits the change
to the object and the bucket, and then sync()s and drains the buckets.
We do this because we want a consistent view of the indexes over
transaction boundaries -- after a commit() we're guaranteed that query()
will return the changes committed. This goes hand in hand with having
few long-running transactions instead of many short-running ones).
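(In code, that two-step commit might be sketched as below -- all names
are mine, and the transaction machinery and indexer are injected as
plain callables, so treat it as an outline of the ordering rather than
our actual implementation.)

```python
def two_step_commit(apply_change, queue_event, commit, sync,
                    drain_and_index):
    # Step 1: the object change and the queued event go into the same
    # transaction, so either both are stored or neither is.
    apply_change()
    queue_event()
    commit()
    # Step 2: sync to pick up other clients' events, then drain and
    # index, so query() reflects this commit from here on.
    sync()
    drain_and_index()
```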

Hmmm. So this means that each transaction would write to its own bucket
and, in our scheme, drain only *that* bucket at the end. That might work
nicely. Can you reality-check the following?

    - Create a Persistent buffer that has _p_independent set to 1.

    - The buffer holds a dictionary (filled with buckets).

    - Hash each different transaction into a bucket stored in a separate
      dictionary key.

    - Each different transaction would then drain its own bucket, and
      leave others untouched.
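A minimal sketch of those four steps (EventBuffer, NUM_BUCKETS and the
method names are mine; Persistent is stubbed out so the snippet runs
without ZODB installed). One wrinkle: I said "set to 1" above, but as I
understand ZODB 3 the connection calls _p_independent(), so it's
conventionally a method returning a true value -- correct me if I have
that wrong:

```python
try:
    from persistent import Persistent  # real base class when available
except ImportError:
    Persistent = object  # stand-in so the sketch runs standalone

NUM_BUCKETS = 32  # assumption: a fixed pool created up front

class EventBuffer(Persistent):

    def __init__(self):
        # Pre-fill the dictionary so transactions only ever mutate
        # existing buckets and never add or remove keys.
        self.buckets = dict((i, []) for i in range(NUM_BUCKETS))

    def _p_independent(self):
        # Tell the connection our state is usable even if another
        # transaction has committed a change to us.
        return 1

    def _bucket_for(self, txn):
        # Hash the transaction (or connection) object into one bucket.
        return self.buckets[id(txn) % NUM_BUCKETS]

    def add(self, txn, event):
        self._bucket_for(txn).append(event)
        self._p_changed = True  # mutating a plain list won't mark us dirty

    def drain(self, txn):
        # Drain only *this* transaction's bucket; others are untouched.
        bucket = self._bucket_for(txn)
        events = list(bucket)
        del bucket[:]
        self._p_changed = True
        return events
```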

If correct, then the challenge seems to be avoiding write conflicts on
the *dictionary* itself. This isn't trivial, because we need to start
out with the dictionary already filled with buckets, and these buckets
need to be reused across transactions -- but never by overlapping
transactions at the same time. I guess starting with a good number of
buckets and assigning each client one at random (say, by hashing its
transaction or connection object mod the number of buckets in the
dictionary) would already reduce conflicts to a minimum, but let's see
what you think about all this.
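(For a back-of-the-envelope feel -- my own arithmetic, not from the
thread -- "hash mod number of buckets" is the classic birthday problem,
so the chance that some pair of k overlapping transactions share one of
n buckets is easy to compute:)

```python
def collision_probability(n_buckets, k_concurrent):
    """Probability that at least two of k_concurrent transactions
    hash into the same of n_buckets, assuming a uniform hash."""
    p_distinct = 1.0
    for i in range(k_concurrent):
        p_distinct *= (n_buckets - i) / float(n_buckets)
    return 1.0 - p_distinct

# e.g. 8 overlapping transactions over 64 buckets:
# collision_probability(64, 8) is about 0.37
```

So with our few long-running transactions -- only a handful overlapping
at once -- a modest pool of buckets should indeed keep conflicts rare.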

Take care,
--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3361 2331

