[ZODB-Dev] Storm/ZEO deadlocks (was Re: [Zope-dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)
Shane Hathaway
shane at hathawaymix.org
Thu Aug 30 17:19:22 UTC 2012
On 08/30/2012 10:14 AM, Marius Gedminas wrote:
> On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
>> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius at gedmin.as> wrote:
>>> On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
>>>> On Tue, 28 Aug 2012 16:31:20 +0200,
>>>> Martijn Pieters <mj at zopatista.com> wrote :
>>>>> Anything else different? Did you make any performance comparisons
>>>>> between RelStorage and NEO?
>>>>
>>>> I believe the main difference compared to all other ZODB Storage
>>>> implementation is the finer-grained locking scheme: in all storage
>>>> implementations I know, there is a database-level lock during the
>>>> entire second phase of 2PC, whereas in NEO transactions are serialised
>>>> only when they alter a common set of objects.
>>>
>>> This could be a compelling point. I've seen deadlocks in an app that
>>> tried to use both ZEO and PostgreSQL via the Storm ORM. (The thread
>>> holding the ZEO commit lock was blocked waiting for the PostgreSQL
>>> commit to finish, while the PostgreSQL server was waiting for some other
>>> transaction to either commit or abort -- and that other transaction
>>> couldn't proceed because it was waiting for the ZEO lock.)
>>
>> This sounds like an application/transaction configuration problem.
>
> *shrug*
>
> Here's the code to reproduce it: http://pastie.org/4617132
>
>> To avoid this sort of deadlock, you need to always commit in a
>> consistent order. You also need to configure ZEO (or NEO)
>> to time-out transactions that take too long to finish the second phase.
>
> The deadlock happens in tpc_begin() in both threads, which is the first
> phase, AFAIU.
>
> AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
> the ZEO commit lock. Then it enters tpc_begin() for Storm's
> StoreDataManager and blocks waiting for a response from PostgreSQL --
> which is delayed because the PostgreSQL server is waiting to see if
> the other thread, Thread #1, will commit or abort _its_ transaction, which
> is conflicting with the one from Thread #2.
>
> Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the
> ZEO commit lock held by Thread #2.

So thread 1 acquires in this order:

1. PostgreSQL
2. ZEO

Thread 2 acquires in this order:

1. ZEO
2. PostgreSQL
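To make the ordering problem concrete, here is a minimal sketch using plain threading locks as stand-ins for the two resources. The names LOCKS, run_transaction and demo are hypothetical and have nothing to do with the real ZEO or Storm code; the only point is that both threads end up acquiring in the same global order even though they ask for the resources in opposite orders.

```python
import threading

# Hypothetical stand-ins for the two resources discussed in this thread.
LOCKS = {
    "postgresql": threading.Lock(),
    "zeo": threading.Lock(),
}


def run_transaction(wanted, results, label):
    """Acquire every lock named in `wanted`, always in sorted order."""
    # Sorting the names gives every thread the same global acquisition
    # order, no matter which order the caller listed them in.
    ordered = sorted(wanted)
    held = []
    try:
        for name in ordered:
            LOCKS[name].acquire()
            held.append(name)
        results[label] = ordered  # record the order actually used
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(held):
            LOCKS[name].release()


def demo():
    results = {}
    # The two threads request the resources in opposite orders,
    # like thread 1 and thread 2 above.
    t1 = threading.Thread(
        target=run_transaction,
        args=(["postgresql", "zeo"], results, "thread1"))
    t2 = threading.Thread(
        target=run_transaction,
        args=(["zeo", "postgresql"], results, "thread2"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return results
```

If the sorted() call is dropped, each thread can end up holding one lock while waiting for the other, which is the situation described above.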

SQL databases handle deadlocks by detecting them and automatically
rolling back one of the transactions involved, while the "transaction"
package expects all data managers to avoid deadlocks entirely by using
the sortKey method to fix the commit order.

I haven't looked at the code, but I imagine Storm's StoreDataManager
implements IDataManager. I wonder if StoreDataManager provides a
consistent sortKey. The sortKey method must return a string (not an
integer or other object) that is consistent yet different from the keys
of all other participating data managers.

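As a rough illustration of what a consistent sortKey buys you, here is a sketch of a data manager that exposes only sortKey, plus a helper that orders managers the way a transaction coordinator might before starting two-phase commit. SketchDataManager and commit_order are hypothetical names; this is not Storm's StoreDataManager or the actual code in the "transaction" package, and the key format is an assumption chosen for the example.

```python
class SketchDataManager:
    """Hypothetical data manager showing only the sortKey method."""

    def __init__(self, kind, identity):
        self.kind = kind          # e.g. "storm" or "zeo" (assumed labels)
        self.identity = identity  # e.g. a database name or server address

    def sortKey(self):
        # Build the key from stable configuration, not from id(self):
        # it has to be a string, the same in every thread, and
        # different for every distinct resource.
        return "%s:%s" % (self.kind, self.identity)


def commit_order(managers):
    """Return managers sorted by sortKey, rejecting unusable keys."""
    keys = [m.sortKey() for m in managers]
    if not all(isinstance(k, str) for k in keys):
        raise TypeError("sortKey must return a string")
    if len(set(keys)) != len(keys):
        raise ValueError("sortKey must be unique per data manager")
    return sorted(managers, key=lambda m: m.sortKey())
```

With keys like these, two transactions that join the same pair of data managers in opposite orders still begin their commits in the same order:

```python
pg = SketchDataManager("storm", "maindb")
zeo = SketchDataManager("zeo", "localhost:8100")
first = [m.sortKey() for m in commit_order([pg, zeo])]
second = [m.sortKey() for m in commit_order([zeo, pg])]
# first and second are the same list
```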
Shane