[ZODB-Dev] RelStorage and PosKey errors - is this a risky hotfix?
Shane Hathaway
shane at hathawaymix.org
Thu Jan 27 05:33:39 EST 2011
On 01/24/2011 02:02 PM, Anton Stonor wrote:
> Now, I wonder why these pointers were deleted from the current_object
> table in the first place. My money is on packing -- and it might fit
> with the fact that we recently ran a pack that removed an unusually large
> number of transactions in a single pack (100.000+ transactions).
>
> But I don't know how to investigate the root cause further. Ideas?
I have meditated on this for some time now. I mentioned I had an idea
about packing, but I studied the design and I don't see any way my idea
could work. The design is such that it seems impossible that the pack
code could produce an inconsistency between the object_state and
current_object tables.
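
For anyone who wants to look for this kind of inconsistency directly, here is a minimal sketch of the check. It is not RelStorage code. It assumes a simplified schema in which object_state holds one row per (zoid, tid) and current_object holds one row per zoid pointing at its current tid. It uses an in-memory SQLite database purely so the example is self-contained. The function names and the demo data are hypothetical, and a real history-preserving schema has more columns and tables.

```python
import sqlite3


def find_missing_current_pointers(conn):
    """Return zoids that have stored states but no row in current_object."""
    rows = conn.execute(
        """
        SELECT DISTINCT os.zoid
        FROM object_state AS os
        LEFT JOIN current_object AS co ON co.zoid = os.zoid
        WHERE co.zoid IS NULL
        ORDER BY os.zoid
        """
    ).fetchall()
    return [zoid for (zoid,) in rows]


def find_dangling_current_pointers(conn):
    """Return (zoid, tid) pairs in current_object with no matching state row."""
    return conn.execute(
        """
        SELECT co.zoid, co.tid
        FROM current_object AS co
        LEFT JOIN object_state AS os
          ON os.zoid = co.zoid AND os.tid = co.tid
        WHERE os.zoid IS NULL
        ORDER BY co.zoid
        """
    ).fetchall()


def build_demo():
    """Build a tiny demo database using the simplified (assumed) schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE object_state (
            zoid INTEGER, tid INTEGER, state BLOB,
            PRIMARY KEY (zoid, tid)
        );
        CREATE TABLE current_object (zoid INTEGER PRIMARY KEY, tid INTEGER);
        INSERT INTO object_state VALUES
            (1, 10, x'00'), (2, 10, x'00'), (2, 20, x'00'), (3, 20, x'00');
        -- zoid 3 has a state but no pointer; zoid 4 points at a missing state.
        INSERT INTO current_object VALUES (1, 10), (2, 20), (4, 30);
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo()
    print("missing pointers:", find_missing_current_pointers(demo))
    print("dangling pointers:", find_dangling_current_pointers(demo))
```

Against a real database the same two queries would be run through the MySQL client. Note that an object legitimately removed by a pack or an undo may show up in one of these lists, so a hit is a lead to investigate rather than proof of corruption.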
I have lots of other ideas now, but I don't know which to pursue. I
need a lot more information. It would be helpful if you sent me your
database to analyze. Some possible causes:
- Have you looked for filesystem-level corruption yet? I asked this
before and I am waiting for an answer.
- Although there is a pack lock, that lock unfortunately gets released
automatically if MySQL disconnects prematurely. Therefore, it is
possible to force RelStorage to run multiple pack operations in
parallel, which would have unpredictable effects. Is there any
possibility that you accidentally ran multiple pack operations in
parallel? For example, maybe you have a cron job, or you were setting
up a cron job at the time, and you started a pack while the cron job was
running. (Normally, any attempt to start parallel pack operations will
just generate an error, but if MySQL disconnects in just the right way,
you'll get a mess.)
- Every SQL database has nasty surprises. Oracle, for example, has a
nice "read only" mode, but it turns out that mode works differently in
RAC environments, leading to silent corruption. As a result, we never
use that feature of Oracle anymore. Maybe MySQL has some nasty
surprises I haven't yet discovered; maybe the MySQL-specific "delete
using" statement doesn't work as expected.
- Applications can accidentally cause POSKeyErrors in a variety of ways.
For example, persistent objects cached globally can cause
POSKeyErrors. Maybe Plone 4 or some add-on uses ZODB incorrectly.
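
The pack lock scenario in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not RelStorage or MySQL code. It assumes the pack lock behaves like a session-scoped advisory lock (MySQL's GET_LOCK is released when the session that holds it ends), and the ToyLockServer class and session names are hypothetical.

```python
class ToyLockServer:
    """Toy stand-in for a server that hands out session-scoped named locks."""

    def __init__(self):
        self._owners = {}  # lock name -> id of the session holding it

    def get_lock(self, session, name):
        """Try to take the lock without waiting; return True on success."""
        owner = self._owners.get(name)
        if owner is not None and owner != session:
            return False
        self._owners[name] = session
        return True

    def disconnect(self, session):
        """Ending a session silently releases every lock it held."""
        for name in [n for n, s in self._owners.items() if s == session]:
            del self._owners[name]


def demo():
    server = ToyLockServer()
    events = []
    # Pack A starts and takes the pack lock.
    events.append(("A acquires", server.get_lock("A", "pack")))
    # Pack B (say, a cron job) is refused: this is the normal error case.
    events.append(("B acquires", server.get_lock("B", "pack")))
    # A's connection drops mid-pack, so the server releases A's lock.
    server.disconnect("A")
    # B retries and now succeeds, even if A's pack is still in progress.
    events.append(("B retries", server.get_lock("B", "pack")))
    return events


if __name__ == "__main__":
    for label, ok in demo():
        print(label, "->", ok)
```

The point of the model is only that the lock's lifetime is tied to the connection rather than to the pack operation, so a dropped connection can let a second pack start while the first has not finished.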
Shane