[ZODB-Dev] What's best to do when there is a failure in the second phase of 2-phase commit on a storage server
Jim Fulton
jim at zope.com
Wed Oct 1 13:40:11 EDT 2008
On Oct 1, 2008, at 1:21 PM, Dieter Maurer wrote:
> Jim Fulton wrote at 2008-9-30 18:30 -0400:
>> ...
>>>> c. Close the file storage, causing subsequent reads and writes to
>>>> fail.
>>>
>>> Raise an easily recognizable exception.
>>
>> I raise the original exception.
>
> Sad.
>
> The original exception may have many consequences -- most probably
> harmless. The special exception would express that the consequence was
> very harmful.
The fact that it occurs in this place at all indicates this.
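
To make the behaviour under discussion concrete, here is a rough sketch of option (c) combined with re-raising the original exception. It is not the actual FileStorage code: `ToyStorage` and its in-memory `_data` dict are invented for illustration, and only the names `tpc_finish` and `_finish` come from this thread.

```python
import logging

logger = logging.getLogger("toy_storage")  # hypothetical logger name


class ToyStorage:
    """Hypothetical in-memory stand-in for a storage; not the ZODB API."""

    def __init__(self):
        self._closed = False
        self._data = {}

    def close(self):
        self._closed = True

    def load(self, key):
        if self._closed:
            raise ValueError("storage is closed")
        return self._data[key]

    def _finish(self, pending):
        # Second phase of the commit: make the pending changes visible.
        self._data.update(pending)

    def tpc_finish(self, pending):
        if self._closed:
            raise ValueError("storage is closed")
        try:
            self._finish(pending)
        except Exception:
            # Leave a critical log entry, close the storage so later
            # reads and writes fail, then re-raise the original exception.
            logger.critical("failure in _finish; closing storage",
                            exc_info=True)
            self.close()
            raise
```

With this shape the caller sees the original exception unchanged, and any later `load` or `tpc_finish` call fails until the storage is reopened.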
>>> In our error handling we look out for some nasty exceptions and
>>> enforce
>>> a restart in such cases. The exception above might be such a nasty
>>> exception.
>>
>> The critical log entry should be easy enough to spot.
>
> For humans, but I had in mind that software recognizes the exception
> automatically and forces a restart.
I suppose we could define such an exception. A storage that raises it
is indicating that it will come back in some sort of consistent state
after a restart.
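
Something along these lines, perhaps. This is only a sketch: `StorageRestartRequired`, `run_request`, and `request_restart` are hypothetical names, none of which exist in ZODB or ZEO.

```python
class StorageRestartRequired(Exception):
    """Hypothetical exception: the storage cannot continue, but expects
    to come back in a consistent state after a restart."""


def run_request(handler, request_restart):
    """Run one request; ask for a restart if the storage demands one.

    Both arguments are caller-supplied callables (assumptions made for
    this sketch): `handler` does the work, `request_restart` schedules
    a restart of the process.
    """
    try:
        return handler()
    except StorageRestartRequired:
        request_restart()
        raise  # still report the failure to the caller
```

Error-handling code can then react to the exception type itself instead of scanning the log for a critical entry.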
> Or do you have a logger customization in mind that intercepts the
> log entry and then forces a restart?
No.
...
>>>> - Have a storage server restart when a tpc_finish call fails. This
>>>> would work fine for FileStorage, but might be the wrong thing to do
>>>> for another storage. The server can't know.
>>>
>>> Why do you think that a failing "tpc_finish" is less critical
>>> for some other kind of storage?
>>
>>
>> It's not a question of criticality. It's a question of whether a
>> restart will fix the problem. I happen to know that a file storage
>> would be in a reasonable state after a restart. I don't know this to
>> be the case for some other storage.
>
> But what should an administrator do when this is not the case?
> Either a stop or a restart....
Yes.
> It may well be that a restart *may* not lead into a fully functional
> state (though this would indicate a storage bug)
A failure in tpc_finish already indicates a storage bug.
> but a definitely not
> working system is not much better than one that may potentially not
> be fully functional but usually will be apart from storage bugs.
If the alternative to a non-working system is a system with
inconsistent data, I'll take the former.
I can see some benefit from raising a special error to indicate that a
restart would be beneficial. If I hadn't already done the proposed
work, I might even pursue this idea. :) At this point, I think I've
reduced the probability of a failure in FileStorage._finish enough
that further effort, at least by me, isn't warranted.
Jim
--
Jim Fulton
Zope Corporation