[ZODB-Dev] B-Tree Concurrency Issue (_OOBTree.pyd segfaults)
Gfeller Martin
Martin.Gfeller at comit.ch
Fri Apr 15 09:07:35 EDT 2005
Dear all,
We're using Zope 2.7.3 with its default Python, ZEO, and ZODB versions under Windows 2000 Server SP3. The machine has two Xeons, but Python is bound to a single CPU.
One of our (non-Data.fs) ZODBs consists of an OOBTree with about 50,000 well-ordered tuple keys and Persistence.Persistent object values.
In production, we have repeatedly (but so far not reproducibly) hit a memory access fault in _OOBTree.pyd+0x4f93:
eax=00000000 ebx=00000000 ecx=0bffb9c0 edx=00000000 esi=00000000 edi=1667dcb0
eip=01614f93 esp=099cd768 ebp=099cd78c iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
function: <nosymbols>
01614f78 8b55e4 mov edx,[ebp+0xe4] ss:0a44ad5e=????????
01614f7b 8955e0 mov [ebp+0xe0],edx ss:0a44ad5e=????????
01614f7e eb02 jmp 01623a82
01614f80 eb02 jmp 01623a84
01614f82 eba1 jmp 0161db25
01614f84 8b45e4 mov eax,[ebp+0xe4] ss:0a44ad5e=????????
01614f87 8945f0 mov [ebp+0xf0],eax ss:0a44ad5e=????????
01614f8a 8b4d08 mov ecx,[ebp+0x8] ss:0a44ad5e=????????
01614f8d 8b5134 mov edx,[ecx+0x34] ds:0ca78f92=????????
01614f90 8b45f0 mov eax,[ebp+0xf0] ss:0a44ad5e=????????
FAULT ->01614f93 8b4cc204 mov ecx,[edx+eax*8+0x4] ds:00a7d5d3=????????
01614f97 894dec mov [ebp+0xec],ecx ss:0a44ad5e=????????
01614f9a 33d2 xor edx,edx
01614f9c 837d1000 cmp dword ptr [ebp+0x10],0x0 ss:0a44ad5e=????????
01614fa0 0f95c2 setne dl
01614fa3 8b4510 mov eax,[ebp+0x10] ss:0a44ad5e=????????
01614fa6 03c2 add eax,edx
01614fa8 894510 mov [ebp+0x10],eax ss:0a44ad5e=????????
01614fab 8b4d08 mov ecx,[ebp+0x8] ss:0a44ad5e=????????
01614fae 8b55ec mov edx,[ebp+0xec] ss:0a44ad5e=????????
01614fb1 8b4104 mov eax,[ecx+0x4] ds:0ca78f92=????????
01614fb4 3b4204 cmp eax,[edx+0x4] ds:00a7d5d2=????????
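As a rough aid for reading the dump: the register block shows eax=00000000 and edx=00000000 at the moment of the fault, so the address read by the faulting instruction mov ecx,[edx+eax*8+0x4] can be computed directly. The ss:/ds: annotations look unreliable (every ebp-relative operand shows the same ss:0a44ad5e), and operands such as [ebp+0xe4] are really signed 8-bit displacements, i.e. ebp-0x1c. The sketch below only does this arithmetic; reading an address near zero as a NULL base pointer (edx, loaded from [ecx+0x34]) is an assumption, not something the dump proves.

```python
# Registers copied from the dump (values at the moment of the fault).
regs = {"eax": 0x00000000, "edx": 0x00000000, "ecx": 0x0BFFB9C0}


def effective_address(base, index, scale, disp):
    """x86 effective address base + index*scale + disp, truncated to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


def signed_disp8(byte):
    """Decode an 8-bit displacement byte as the signed value the CPU uses."""
    return byte - 0x100 if byte >= 0x80 else byte


# Faulting instruction: 8b4cc204  mov ecx,[edx+eax*8+0x4]
addr = effective_address(regs["edx"], regs["eax"], 8, 0x4)
print(f"faulting read at 0x{addr:08x}")  # faulting read at 0x00000004

# 8b55e4 is printed as [ebp+0xe4], but the displacement byte e4 is signed.
print(hex(signed_disp8(0xE4)))  # -0x1c, i.e. [ebp-0x1c]
```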
To narrow this down (I don't speak C), I tried, on my single-CPU machine, to load the root in one thread with:
    for x in conn.root().keys():
        y = x.somedata
while, at the same time, repeatedly checking the tree in a different thread that uses the same connection (Jim confirms, in the mail cited below, that this should be OK):
conn.root()._check()
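To count how often each failure occurs, the experiment can be wrapped in a small harness. The sketch below is generic and uses only the standard threading module; run_concurrently is a hypothetical helper name. It does not make sharing one connection across threads any safer; it only records what each side raises.

```python
import threading


def run_concurrently(load, check):
    """Run load() once in a worker thread while the calling thread calls
    check() repeatedly until load() has finished. Returns the exceptions
    raised by each side so that intermittent failures can be counted."""
    done = threading.Event()
    load_errors, check_errors = [], []

    def loader():
        try:
            load()
        except Exception as exc:  # e.g. RuntimeError from the iteration
            load_errors.append(exc)
        finally:
            done.set()  # tell the checking loop to stop

    worker = threading.Thread(target=loader)
    worker.start()
    while True:
        try:
            check()  # always runs at least once
        except Exception as exc:  # e.g. AssertionError from _check()
            check_errors.append(exc)
        if done.is_set():
            break
    worker.join()
    return load_errors, check_errors
```

With the setup described here, load would wrap the for loop over conn.root().keys() and check would call conn.root()._check().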
I consistently get either a RuntimeError ('the bucket being iterated changed size') in the for loop, or a 'Bucket length < 1' assertion failure in _check(). After the loop finishes, _check() on the tree succeeds (the tree also passes all tests in BTrees.check.check()). The symptoms are the same whether I run under ZEO or directly against FileStorage.
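The RuntimeError comes from a consistency check made during iteration: the iterator notices that the bucket's size has changed underneath it and refuses to continue. The pure-Python class below only illustrates that kind of guard; it is not the BTrees C implementation, and GuardedBucket and its methods are hypothetical names.

```python
class GuardedBucket:
    """Toy sorted bucket whose iterator refuses to continue if the bucket's
    size changes mid-iteration (illustration only, not the BTrees C code)."""

    def __init__(self, keys=()):
        self._keys = sorted(keys)

    def insert(self, key):
        # Keep keys ordered, as a real bucket would.
        self._keys.append(key)
        self._keys.sort()

    def __len__(self):
        return len(self._keys)

    def __iter__(self):
        expected = len(self._keys)  # size recorded when iteration starts
        for i in range(expected):
            if len(self._keys) != expected:
                raise RuntimeError("the bucket being iterated changed size")
            yield self._keys[i]
```

This presumably also explains why wrapping the loop in list(...) changes nothing: list() drives the same iterator, so the check fires at the same point.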
Replacing conn.root().keys() with list(conn.root().keys()) gives the same behavior: either the RuntimeError or the transient assertion failure.
Based on the ZODB multithreading discussion at http://mail.python.org/pipermail/python-list/2001-February/030675.html, I assume the behavior above is incorrect, since there are no writes to any object, no commits, and no conflict errors.
From the discussion of the RuntimeError in [ZODB-Dev] Re: BTrees q [Fwd: [Zope-dev] More Transience weirdness in 2.7.1b1] (http://mail.zope.org/pipermail/zodb-dev/2004-June/007459.html), I get the impression that the segfault and the symptoms described above might be related: perhaps the segfault occurs in a code path where Tim's "required invariant for sane operation" is not checked.
Of course, it is the Python crash that really bothers us (as I said, it's a bank site using Quantax); RuntimeErrors we can always catch with try/except.
Any help would be enormously appreciated.
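Since a RuntimeError, unlike the segfault, can be caught, one stopgap is to retry the whole read. A minimal sketch, assuming the read is side-effect free and that a fresh attempt may succeed; read_with_retry and its parameters are hypothetical names, and this does nothing about the crash itself.

```python
import time


def read_with_retry(read, attempts=3, delay=0.0):
    """Call read() and return its result, retrying on RuntimeError
    (e.g. 'the bucket being iterated changed size'). Re-raises the
    last RuntimeError if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except RuntimeError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)  # optional back-off between attempts
```

For example, keys = read_with_retry(lambda: list(conn.root().keys())) would retry the materialized key list up to three times.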
Best regards,
Martin Gfeller
________________________
COMIT AG
Risk Management Systems
Pflanzschulstrasse 7
CH-8004 Zürich
Phone +41 44 298 92 84
http://www.comit.ch
http://www.quantax.com - Quantax Trading and Risk System