[Zope-dev] Follow up: Coroner's toolkit for zope, or how to figure out what went wrong.

Romain Slootmaekers romain@zzict.com
Mon, 12 Aug 2002 19:17:09 +0200


Toby Dickenson wrote:
> On Monday 12 Aug 2002 4:50 pm, Joachim Werner wrote:
> 
>>Hi!
>>
>>I know of exactly two cases that could really cause a ZODB to lose data: if
>>you reach the 2GB limit with a Python not compiled for larger files and if
>>you reach the physical limit of your storage. That is, if your case doesn't
>>add a third one ...

Well, it isn't the 2GB limit, nor the storage limit...
BTW, I wish I still had your good faith in software :(

> 
> 
> FileStorage is robust and mature, but it's not as good as this statement 
> suggests.  There have been a number of bugs that cause packing to delete more 
> than it should (a few very small holes still remain), bugs that cause 
> FileStorage to overwrite the middle of its log file, and bugs that cause its 
> position index to get muddled.

Ouch. Packing can't be the problem, though (we haven't packed 
recently).

After spending some time looking at the logs, I dug up the 
following traceback:


  File "/home/zope/Zope-2.5.1-linux2-x86/lib/python/ZODB/Connection.py", line 463, in setstate
    raise ReadConflictError(object=object)
ReadConflictError: database read conflict error (oid 000000000000bc8d,

Our code that triggers it basically changes some attributes of a
persistent object and then does

self._p_changed=1

on that object.


The problem is this:
Apparently that object is also touched in some other thread, which causes 
the conflict error.
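
For reference, the usual way to deal with a conflict error is to abort the transaction and run it again (as far as I know, Zope's publisher does this for a whole request). A minimal sketch of that pattern, not Zope's actual code: ConflictError here is a local stand-in class, and run_with_retries, work, abort and attempts are hypothetical names.

```python
class ConflictError(Exception):
    """Local stand-in for a ZODB conflict error (assumption, not the real class)."""


def run_with_retries(work, abort, attempts=3):
    """Call work(); on ConflictError call abort() and try again.

    work and abort are hypothetical callables supplied by the caller.
    The last ConflictError is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except ConflictError:
            abort()  # discard the partial changes before retrying
            if attempt == attempts:
                raise
```

The point of the sketch is that a conflict should only ever cost a retry; it should never cause committed data to disappear.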

So far I understand what's happening, but then it gets blurry.
The handling of the conflict somehow drops everything in 
that object tree.

Basically we have

class DB (Persistent,Implicit):
    .....

    def __init__(self):
        self.__dbItems=[]

class DBItem(Persistent,Implicit):
    ....

    def setSomething(self,...):
        ...
        self._p_changed=1




where DB contains a set of DBItem objects. Touching one of them drops 
the DB object.
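
As background on why the explicit self._p_changed=1 is there at all: a persistent object notices when one of its attributes is rebound, but not when a plain list or dict it holds is mutated in place. A minimal sketch of that behaviour, using a toy stand-in base class rather than the real ZODB Persistent (TrackedBase, ToyDB and the method names are all hypothetical):

```python
class TrackedBase:
    """Toy stand-in for a persistent base class (NOT the real ZODB Persistent)."""

    _p_changed = False

    def __setattr__(self, name, value):
        # Rebinding an attribute passes through here, so it can be flagged.
        object.__setattr__(self, name, value)
        if name != "_p_changed":
            object.__setattr__(self, "_p_changed", True)


class ToyDB(TrackedBase):
    def __init__(self):
        self.items = []          # plain list: mutations bypass __setattr__
        self._p_changed = False  # pretend the object has just been saved

    def add_unflagged(self, item):
        self.items.append(item)  # in-place mutation, the change goes unnoticed

    def add_flagged(self, item):
        self.items.append(item)
        self._p_changed = True   # explicit flag, like self._p_changed=1 above
```

After add_unflagged() the toy object still looks unchanged, so a real storage would have no reason to write it out; add_flagged() marks it as modified.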


> 
> 
> The first thing I would recommend trying today is shutting down, removing 
> data.fs.index, and restarting. In recent versions data.fs.index makes very 
> heavy use of BTrees, and all released versions of the BTree code have small 
> bugs.
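
A cautious variant of this advice is to move the index file aside instead of deleting it, so it can be inspected later. A minimal sketch, assuming Zope has already been shut down and that the index lives next to the storage file with an ".index" suffix; move_index_aside and the ".bak" suffix are hypothetical names, and the path is whatever your installation uses.

```python
from pathlib import Path


def move_index_aside(storage_file, suffix=".bak"):
    """Rename '<storage_file>.index' to '<storage_file>.index<suffix>'.

    Returns the backup path, or None if there was no index file.
    Any previous backup with the same name is overwritten.
    Only run this while the Zope process is stopped.
    """
    index = Path(str(storage_file) + ".index")
    if not index.exists():
        return None
    backup = Path(str(index) + suffix)
    index.replace(backup)
    return backup
```

With the index gone, FileStorage should rebuild it from the data file on the next start, which can take a while on a large storage.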

Hm, isn't there a policy of adding tests that expose such bugs to the set 
of unit tests?

If we (our company, that is) can't resolve the problem, we'll have to 
reconsider our strategy on data storage and perhaps even drop the use of 
the ZODB for anything but scripts and static content. All managed 
content types would then have to be stored in something more robust, like 
a relational database, and we all know how well object trees fit into 
relational databases. :(


Anyway, if we find more, we'll post it here.
Sloot.