[Zope-dev] Coroner's toolkit for zope, or how to figure out what went wrong.

Jim Fulton jim@zope.com
Mon, 12 Aug 2002 15:12:16 -0400


Romain Slootmaekers wrote:
> Jim Fulton wrote:
> 
>> Romain Slootmaekers wrote:
>>
>>> Yo,
>>>
>>> we had a nasty crash of our zope server that we use for a b2b web 
>>> application. The Data.fs ZODB lost a significant amount of data.
>>
>>
>>
>> What sort of crash? Was this a hardware failure, or a software failure?
> 
> 
> software.
> basically, the server didn't crash, but our applications couldn't 
> function anymore because some objects that really have to exist
> were gone.
>
> the Data.fs was NOT corrupted,
> but (as far as I can tell) a bug in the conflict resolution code caused 
> our object (the one on which we set self._p_changed=1) to be empty. This 
> object is a container of other objects that are Persistent themselves 
> and at this point, we don't believe the conflict resolution mechanism 
> handles these cases correctly.

I think you are pretty far off here. You said you saw a read conflict.
No conflict resolution is done for a read conflict. Further, from the very
brief description of your DB class, it doesn't appear to use any objects
that actually try to resolve conflicts. I seriously doubt that this has
anything to do with conflict resolution. It is very unlikely that a database
error would cause your data to simply disappear without some sort of error,
like a database corruption error or an error about invalid object ids (dangling
references). Have you considered an application error?
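
For reference, ZODB attempts conflict resolution only on write conflicts, and
only for objects whose class defines a _p_resolveConflict method. Here is a
minimal sketch of what such a class looks like. ResolvingCounter is a
hypothetical name, and it is a plain class rather than a Persistent subclass so
that it runs without ZODB installed:

```python
class ResolvingCounter:
    """Hypothetical counter; a real one would subclass persistent.Persistent."""

    def __init__(self):
        self.value = 0

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   the state both transactions started from
        # saved_state: the state the other transaction already committed
        # new_state:   the state this transaction is trying to commit
        resolved = dict(new_state)
        # Apply both increments on top of the common starting value.
        resolved["value"] = (
            saved_state["value"] + new_state["value"] - old_state["value"]
        )
        return resolved


counter = ResolvingCounter()
merged = counter._p_resolveConflict({"value": 5}, {"value": 7}, {"value": 6})
print(merged)
```

A class without this method gets no resolution at all: a conflicting write
simply raises a conflict error.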

If you still have the data file with the lost data, it should be possible to
analyze it to figure out what went wrong. In particular, it would be helpful
to identify exactly which transaction made the data go away, and then work out
what that transaction was doing.
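
A rough sketch of that kind of analysis: walk the transaction history oldest
first and list every transaction that wrote a record for the object in
question. FakeTxn, Record, and the sample history are hypothetical stand-ins so
the example runs on its own. As far as I know, a FileStorage opened read-only
offers an iterator() whose transactions carry tid, user, and description and
can be iterated for data records with an oid, but check that against your ZODB
version before relying on it:

```python
from collections import namedtuple

# Hypothetical stand-in for a data record in a transaction.
Record = namedtuple("Record", "oid")


class FakeTxn:
    """Hypothetical stand-in for one transaction yielded by a storage iterator."""

    def __init__(self, tid, user, description, records):
        self.tid = tid
        self.user = user
        self.description = description
        self._records = records

    def __iter__(self):
        return iter(self._records)


def transactions_touching(transactions, oid):
    """Return (tid, user, description) for each transaction that wrote oid."""
    hits = []
    for txn in transactions:
        if any(rec.oid == oid for rec in txn):
            hits.append((txn.tid, txn.user, txn.description))
    return hits


# Made-up history, oldest first; the ids and paths are illustrative only.
history = [
    FakeTxn(1, "alice", "/app/addItem", [Record("container"), Record("item1")]),
    FakeTxn(2, "bob", "/app/editProfile", [Record("profile")]),
    FakeTxn(3, "carol", "/app/resetAll", [Record("container")]),
]

suspects = transactions_touching(history, "container")
print(suspects[-1])  # the last transaction that wrote the container
```

The description of the last suspect usually names the request path, which is a
good place to start looking for an application error.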


...

> The stack trace in the follow up mail gives some clue on where the 
> problem is situated in the code. (as well as the exact version of the 
> Zope installation)

No, this is a red herring. A read conflict can't cause loss of data.
It simply causes the transaction with the read conflict to be reexecuted.
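
As a simplified illustration (not the actual publisher code), the retry
behaviour looks roughly like this. ReadConflict, run_with_retries, and the
pending dictionary are hypothetical stand-ins. The point is that an attempt
that hits a read conflict is aborted before anything is committed:

```python
class ReadConflict(Exception):
    """Hypothetical stand-in for the database's read-conflict error."""


def run_with_retries(work, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        pending = {}  # changes staged by this attempt only
        try:
            work(pending, attempt)
        except ReadConflict:
            continue  # abort: pending is discarded, nothing was committed
        return pending  # commit: only a conflict-free attempt is kept
    raise ReadConflict("gave up after %d attempts" % max_attempts)


def work(pending, attempt):
    pending["order"] = "written by attempt %d" % attempt
    if attempt == 1:
        raise ReadConflict()  # simulate a conflict on the first try


committed = run_with_retries(work)
print(committed)
```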

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org