mysterious dataloss clues?
Hi there, I'm posting this in the hope someone will be able to help me with some mysterious data lossage we've been experiencing. I basically am completely clueless what could be the cause of this, so any hint would be useful. Some boring numbers first: We have an application running on Zope 2.3.0, connecting to Postgresql 7.0 using ZPoPyDA 1.2. The version of PoPy is 2.0.2. This is on Debian 2.2 (with some upgraded packages such as Postgresql). Now to the meat of the issue: For some reason, at random times, Postgresql seems to magically lose some records that were already there and should've been committed. This seems to happen within a few hours of the time the records were first created. All trace of the records disappears from the database; we can see the holes in the autoincrement primary key fields (and we never do any delete ourselves that could cause this). The lossage seems internally consistent; a bunch of records that's interrelated all disappears. The records disappearing are not all part of a single Zope transaction; instead there are bunch of ZSQL insert operations in a set of pages. It's just as if there's some kind of long running transaction that at some point magically expires. Very weird. We haven't been able to reproduce the problem; it just seems to happen. Vaguely associated with it is another, possibly related problem; occasionally Zope does not see a record with an id that is definitely in the database. After a few more tries, it works again. We haven't been able to confirm a firm temporal relationship between this problem and the one described, but it's possible one somehow affects the other. Is it possible that due to some strange interaction postgresql's transaction machinery ends up in some flakey state, so that something is never committed that really should be? I'm now trying to change all the tires and the engine as well; upgrading this and that to see if the problem will go away. But some clues in which direction I should search would be greatly appreciated. Anyone ever saw something like this? I really wish I could be more specific. :) Regards, Martijn
Martijn Faassen writes:
For some reason, at random times, Postgresql seems to magically lose some records that were already there and should've been committed. This seems to happen within a few hours of the time the records were first created. All trace of the records disappears from the database; we can see the holes in the autoincrement primary key fields (and we never do any delete ourselves that could cause this). Several months ago, someone reported (zope-dev mailing list) a hole in the database adapters transaction handling. Jim confirmed the bug.
When my memory is right, the hole means that database transactions may be hanging, Zope forgets to either commit or abort them. I does not look like your problem, but maybe the two problems are related... Dieter
Dieter Maurer wrote:
Martijn Faassen writes:
For some reason, at random times, Postgresql seems to magically lose some records that were already there and should've been committed. This seems to happen within a few hours of the time the records were first created. All trace of the records disappears from the database; we can see the holes in the autoincrement primary key fields (and we never do any delete ourselves that could cause this). Several months ago, someone reported (zope-dev mailing list) a hole in the database adapters transaction handling. Jim confirmed the bug.
Hm, I'll try to look for that. Is there a fix, and if so, which Zope version? :)
When my memory is right, the hole means that database transactions may be hanging, Zope forgets to either commit or abort them.
I does not look like your problem, but maybe the two problems are related...
Well, a non-commit or abort *could* mean weird behavior like this. I've also found some ConflictErrors generated by ZODB/Connection.py that may be related. Thanks, Martijn
participants (2)
-
Dieter Maurer -
Martijn Faassen