[ZODB-Dev] Persistent ZEO Cache corruption?

Thu Jan 12 11:53:23 EST 2006

I have seen similar problems bringing up a system which has been
brought down with the usual

	zeoctl stop
	zopectl start 

mantra.  The zope/zeo connection is via a network connection.
Next time it happens I'll file a bug report.

On Thu, 12 Jan 2006, Sidnei da Silva wrote:

> On Thu, Jan 12, 2006 at 10:17:54AM -0500, Tim Peters wrote:
> | [Sidnei da Silva]
> | >> Every now and then I face a corruption of the persistent zeo cache, but
> | >> this is the first time I get this variant.
> | 
> | What other variants do you see?
> 
> Can't remember right now, it was quite some time ago and involved
> making changes to one zeo client while the other one was down using
> 'zopectl debug'. Seen it about 6 times in different environments, so
> should be reproducible.
> 
> | >> The cause is very likely to be a forced shutdown of the box this zope
> | >> instance was running on, but I thought it would be nice to report the
> | >> issue.
> | 
> | Yes it is!  Thank you.  It would be better to open a bug report ;-).
> 
> Sure will.
> 
> | >> Here's the traceback::
> | >>
> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/ClientStorage.py", line 314, in __init__
> | >>   self._cache.open()
> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 112, in open
> | >>   self.fc.scan(self.install)
> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 835, in scan
> | >>   install(self.f, ent)
> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 121, in install
> | >>   o = Object.fromFile(f, ent.key, skip_data=True)
> | >> File "/home/sidnei/src/zope/28five/lib/python/ZEO/cache.py", line 630, in fromFile
> | >>   raise ValueError("corrupted record, oid")
> | >> ValueError: corrupted record, oid
> | >>
> | >> I have a copy of the zeo cache file if anyone is interested.
> | 
> | Attaching a compressed copy to the bug report would be best (if it's too big
> | for that, or it's proprietary, let me know how to get it and I'll put it on
> | an internal ZC machine).  Can't tell in advance whether that will reveal
> | something useful, though (see below).
> 
> Don't think there might be anything sensitive in there, maybe my blog
> password in the worst case *wink*. Here's the files (zeo1-1.zec is
> probably the one you're after):
> 
> http://awkly.org/files/zeo-cache.tar.bz2
> 
> | > It seems as though persistent caches haven't been a very successful
> | > feature. Perhaps we should abandon them.
> | 
> | They do seem to be implicated in more than their share of problems, both
> | before and after MVCC.
> | 
> | The post-MVCC ZEO persistent cache _intends_ to call flush() after each file
> | change.  If it's missing one of those, and depending on what "forced
> | shutdown" means exactly, that could be a systematic cause for corruption.
> | It doesn't call fsync() unless it's explicitly closed cleanly, but it's
> | unclear what good fsync() actually does across platforms when flush() is
> | called routinely and the power stays on.
> 
> Oh, I really meant to say "accidental shutdown". I wasn't around when
> the box restarted, but it looks like it was a power failure.
> 
> 
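
For reference, the flush()/fsync() distinction discussed in the quoted
text can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not ZEO's cache code:
append_record and demo are hypothetical names, and the temporary file
merely borrows the .zec suffix. flush() hands Python's userspace buffer
to the operating system, which is enough if only the process dies;
os.fsync() additionally asks the OS to write its own cache to the
device, which is the part that matters when the power goes out.

```python
import os
import tempfile


def append_record(f, payload, durable=False):
    """Append payload to an open binary file (hypothetical helper).

    flush() moves the bytes from Python's buffer to the OS.
    os.fsync() asks the OS to push them on to the storage device.
    """
    f.write(payload)
    f.flush()
    if durable:
        os.fsync(f.fileno())


def demo():
    # Scratch file standing in for a cache file; not a real ZEO cache.
    fd, path = tempfile.mkstemp(suffix=".zec")
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            append_record(f, b"record-1")
            # After flush() the bytes are visible to other readers,
            # even though nothing has been forced to disk yet.
            with open(path, "rb") as reader:
                seen_after_flush = reader.read()
            append_record(f, b"record-2", durable=True)
        with open(path, "rb") as reader:
            seen_after_fsync = reader.read()
        return seen_after_flush, seen_after_fsync
    finally:
        os.remove(path)
```

Calling demo() shows that a flushed record is already readable through
a second file handle. What the sketch cannot show is the failure case:
data that was flushed but never fsync'ed may still be lost, or only
partly written, if the machine loses power.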
