[Zope-dev] ERROR(200) ZODB Couldn't load state for... Memory problem?

Chris McDonough chrism at plope.com
Fri Jul 23 12:00:01 EDT 2004


Thanks a lot for the confirmation!

I haven't seen this problem crop up in several days myself, but that may
just be due to the fact that the ZEO cache hasn't flipped.  (Tim, yes,
when it does happen, it only happens once; then the server is restarted
by my customer to get the site going again, so it's a bit difficult to
tell what's happening, although I am trying to "catch" it in real time).

I'd like to chalk this up to improper permission settings; I just need
to figure out how to do that in my case because I start Zope from an rc
script as the "zope" user and my customer and myself only log in and
manage Zope-related processes as the zope user.  In fact, to avoid this
sort of problem, I actually have a Zope startup regiment that prevents
it from possibly even beginning to start as the root user or another
user via a wrapper script.  Unless someone is doing something behind my
back, which is of course possible.

Here's a demonstrative traceback:

2004-07-13T12:01:29 INFO(0) ZEC:1-None-1 flipping cache files.  new
current = 1
------
2004-07-13T12:01:29 PANIC(300) ZODB A storage error occurred in the last
phase of a two-phase commit.  This shouldn't happen. The application
will not be allowed to commit until the site/storage is reset by a
restart.
Traceback (most recent call last):
  File
"/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZODB/Transaction.py",
line 372, in _finish_many
    jar.tpc_finish(self)
  File
"/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZODB/Connection.py", line
763, in tpc_finish
    self._storage.tpc_finish(transaction, callback)
  File
"/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientStorage.py",
line 927, in tpc_finish
    self._update_cache()
  File
"/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientStorage.py",
line 953, in _update_cache
    self._cache.checkSize(self._tbuf.get_size())
  File
"/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientCache.py", line
581, in checkSize
    self._f[current] = open(self._p[current],'w+b')
IOError: [Errno 13] Permission denied: '1-None-1'

One minor thing I noticed: even when a ZEO client is started with
settings that should prevent it from using persistent cache files
(namely, failing to set a 'zeo-client-name' in zope.conf), the logfile
still reports:

2004-07-23T04:02:18 INFO(0) ZEC:1-None-0 ClientCache: storage='1',
size=100000000; file[0]='1-None-0'

This message is a bit puzzling when you're running without persistent
cache files; it makes you think that a file named "1-None-0" is
literally trying to be written.  This is not the case (it actually goes
into a tempfile somewhere).  I can imagine a case where some code... 
Aha.  Wait, I think I see what the problem is.  It's related to this.

The ZEO ClientCache.checkSize code assumes that if there is a non-None
value in self._f[current] that you're running with persistent cache
files.  However, this can also be the case if you're running without
persistent cache files; the __init__ method of ClientCache does this in
that circumstance:

      self._f = f = [tempfile.TemporaryFile(suffix='.zec'), None]
      # If they are temp files just use illustrative names so
      # that the log messages describe the cache.
      self._p = p = ["%s-%s-0" % (storage, client),
                     "%s-%s-1" % (storage, client)]
      f[0].write(magic + '\0' * (headersize - len(magic)))
      current = 0

(and self._current is set to local current later)

So the code in checkSize that tries this when a cache flip is necessary:

      current = not self._current

      ...

      if self._p[current] is not None:
          # Persistent cache file: remove the old file
          # before opening the new one, because the old file
          # may be owned by root (created before setuid()).

          self._f[current].close()
          try:
              os.remove(self._p[current])
          except:
              pass
          self._f[current] = open(self._p[current],'w+b')

... will be likely to fail at the last line if you're using
nonpersistent cache files, because self._p[current] is (bogus)
'1-None-0' (relative bogus filename).

There should probably be a _using_persistent_cache flag attr rather than
trying to inspect self._p to find out if we're using persistent caches.

I may try to work up a patch + test for this later.

On Fri, 2004-07-23 at 10:44, Russ Ferriday wrote:
> Thanks Tim and Chris! - You were both dead on in so many respects.
> 
> Chris, There was a correlation with flip in the logs
> 
> Tim: looks like once a None occurs it sticks.
> 
> Source of the problem was an old cache file (1-None-1) that could
> notbe overwritten by the current uid running Zope. (That's the
> secondstupid permissions problem in as many days. My advice - if you
> haveany kind of cache problems occurring after an initial period,
> lookhard at permissions. Secondly - don't be sloppy and run zope first
> asone user then as another. Thirdly grep the event logs for
> IOErrorfirst. That's the easiest thing to fix.) 
> 
> more comments below...
> 
> On 23 Jul 2004, at 02:14, Tim Peters wrote:
> 
>         [Chris McDonough]
>                 ...
>                 I think the problem is related to ZEO client storage
>                 "cache flips".
>         
>         ..
>         anyway, pointing to None.  But this code gives me a headache,
>         and I'm
>         not sure that can actually happen (despite that I hear you
>         guys saying
>         it is <wink>).
> I was slow interpreting all of this due to separate eventfiles.
> This piece of bash script was helpful to get a grip of time
> orderbetween Clients:
>     
>      grep ZEC ev*.log | sed 's/:/ /' | sort -k 2
> >interleaved.ZEC.events.log
> 
> a multi-line merge script would be helpful. I'll bet someone's
> createdone.
> 
>                 But then I turned off persistent ZEO client cachefile
>                 storage(but omitting
>                 the "zeo-client-name" parameter from zope.conf),
>                 believing this wouldbe a
>                 workaround, but it hasn't been.  I gave up at that
>                 point and that'swhere I am now.
>         
>         Did you continue to get errors in the log near cache-flip
>         times?  I
>         don't see a way for checkSize() to screw up unless an
>         unexpected
>         exception is raised.
> Bingo!
> 2004-07-21T11:35:07 INFO(0) ZEC:1-None-1 flipping cache files. 
> newcurrent = 1
> ------
> 2004-07-21T11:35:07 ERROR(200) ZODB Couldn't load state
> for0000000000012323
> Traceback (most recent call last):
>   File "/usr/local/zope/zope271/lib/python/ZODB/Connection.py",
> line559, in setstate
>     p, serial = self._storage.load(oid, self._version)
>   File "/usr/local/zope/zope271/lib/python/ZEO/ClientStorage.py",
> line754, in load
>     self._cache.checkSize(0)
>   File "/usr/local/zope/zope271/lib/python/ZEO/ClientCache.py",
> line581, in checkSize
>     self._f[current] = open(self._p[current],'w+b')
> IOError: [Errno 13] Permission denied: '1-None-1'
> 
>                 ...
>                 My theory is that it will happen as often as a Zope
>                 client's ZEOclient storage
>                 needs to flip its cache file.  The cache file is only
>                 flipped when itexceeds a
>                 certain size and it only exceeds a certain size after
>                 a certainpattern of usage
>                 causes it to do so (lots of loads from the database of
>                 new items,typically).
> Yup!
> 
>         It appears that once self._f[self._current] is None, all
>         future
>         attempts by ZEO to store into its client cache will fail the
>         same way.
>         So I'd be even more surprised if you saw just one of these
>         occur.
> Yup!
> 
>                 It would be nice if you could confirm this.  Reading
>                 the Zopeevent log
>                 file of the client that generated the error would be a
>                 good start.
>         
>         The log is everything here.  The ZEO client cache logs most
>         relevant
>         messages at info level, producing msgs starting with "ZEC".
>         
> 
> Thanks again for the insights.
> 
> --r.
> —————————————————————
> Russ Ferriday
> Solution Workshops for Plone
> (+44) (0) 7789 338868
> http://www.solutionworkshops.com



More information about the Zope-Dev mailing list