Thanks a lot for the confirmation! I haven't seen this problem crop up in several days myself, but that may just be due to the fact that the ZEO cache hasn't flipped. (Tim, yes, when it does happen, it only happens once; then the server is restarted by my customer to get the site going again, so it's a bit difficult to tell what's happening, although I am trying to "catch" it in real time). I'd like to chalk this up to improper permission settings; I just need to figure out how to do that in my case because I start Zope from an rc script as the "zope" user and my customer and myself only log in and manage Zope-related processes as the zope user. In fact, to avoid this sort of problem, I actually have a Zope startup regiment that prevents it from possibly even beginning to start as the root user or another user via a wrapper script. Unless someone is doing something behind my back, which is of course possible. Here's a demonstrative traceback: 2004-07-13T12:01:29 INFO(0) ZEC:1-None-1 flipping cache files. new current = 1 ------ 2004-07-13T12:01:29 PANIC(300) ZODB A storage error occurred in the last phase of a two-phase commit. This shouldn't happen. The application will not be allowed to commit until the site/storage is reset by a restart. Traceback (most recent call last): File "/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZODB/Transaction.py", line 372, in _finish_many jar.tpc_finish(self) File "/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZODB/Connection.py", line 763, in tpc_finish self._storage.tpc_finish(transaction, callback) File "/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientStorage.py", line 927, in tpc_finish self._update_cache() File "/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientStorage.py", line 953, in _update_cache self._cache.checkSize(self._tbuf.get_size()) File "/home/zope/dhportal/opt/Zope-2.7.1/lib/python/ZEO/ClientCache.py", line 581, in checkSize self._f[current] = open(self._p[current],'w+b') IOError: [Errno 13] Permission denied: '1-None-1' One minor thing I noticed: even when a ZEO client is started with settings that should prevent it from using persistent cache files (namely, failing to set a 'zeo-client-name' in zope.conf), the logfile still reports: 2004-07-23T04:02:18 INFO(0) ZEC:1-None-0 ClientCache: storage='1', size=100000000; file[0]='1-None-0' This message is a bit puzzling when you're running without persistent cache files; it makes you think that a file named "1-None-0" is literally trying to be written. This is not the case (it actually goes into a tempfile somewhere). I can imagine a case where some code... Aha. Wait, I think I see what the problem is. It's related to this. The ZEO ClientCache.checkSize code assumes that if there is a non-None value in self._f[current] that you're running with persistent cache files. However, this can also be the case if you're running without persistent cache files; the __init__ method of ClientCache does this in that circumstance: self._f = f = [tempfile.TemporaryFile(suffix='.zec'), None] # If they are temp files just use illustrative names so # that the log messages describe the cache. self._p = p = ["%s-%s-0" % (storage, client), "%s-%s-1" % (storage, client)] f[0].write(magic + '\0' * (headersize - len(magic))) current = 0 (and self._current is set to local current later) So the code in checkSize that tries this when a cache flip is necessary: current = not self._current ... if self._p[current] is not None: # Persistent cache file: remove the old file # before opening the new one, because the old file # may be owned by root (created before setuid()). self._f[current].close() try: os.remove(self._p[current]) except: pass self._f[current] = open(self._p[current],'w+b') ... will be likely to fail at the last line if you're using nonpersistent cache files, because self._p[current] is (bogus) '1-None-0' (relative bogus filename). There should probably be a _using_persistent_cache flag attr rather than trying to inspect self._p to find out if we're using persistent caches. I may try to work up a patch + test for this later. On Fri, 2004-07-23 at 10:44, Russ Ferriday wrote:
Thanks Tim and Chris! - You were both dead on in so many respects.
Chris, There was a correlation with flip in the logs
Tim: looks like once a None occurs it sticks.
Source of the problem was an old cache file (1-None-1) that could notbe overwritten by the current uid running Zope. (That's the secondstupid permissions problem in as many days. My advice - if you haveany kind of cache problems occurring after an initial period, lookhard at permissions. Secondly - don't be sloppy and run zope first asone user then as another. Thirdly grep the event logs for IOErrorfirst. That's the easiest thing to fix.)
more comments below...
On 23 Jul 2004, at 02:14, Tim Peters wrote:
[Chris McDonough] ... I think the problem is related to ZEO client storage "cache flips".
.. anyway, pointing to None. But this code gives me a headache, and I'm not sure that can actually happen (despite that I hear you guys saying it is <wink>). I was slow interpreting all of this due to separate eventfiles. This piece of bash script was helpful to get a grip of time orderbetween Clients:
grep ZEC ev*.log | sed 's/:/ /' | sort -k 2
interleaved.ZEC.events.log
a multi-line merge script would be helpful. I'll bet someone's createdone.
But then I turned off persistent ZEO client cachefile storage(but omitting the "zeo-client-name" parameter from zope.conf), believing this wouldbe a workaround, but it hasn't been. I gave up at that point and that'swhere I am now.
Did you continue to get errors in the log near cache-flip times? I don't see a way for checkSize() to screw up unless an unexpected exception is raised. Bingo! 2004-07-21T11:35:07 INFO(0) ZEC:1-None-1 flipping cache files. newcurrent = 1 ------ 2004-07-21T11:35:07 ERROR(200) ZODB Couldn't load state for0000000000012323 Traceback (most recent call last): File "/usr/local/zope/zope271/lib/python/ZODB/Connection.py", line559, in setstate p, serial = self._storage.load(oid, self._version) File "/usr/local/zope/zope271/lib/python/ZEO/ClientStorage.py", line754, in load self._cache.checkSize(0) File "/usr/local/zope/zope271/lib/python/ZEO/ClientCache.py", line581, in checkSize self._f[current] = open(self._p[current],'w+b') IOError: [Errno 13] Permission denied: '1-None-1'
... My theory is that it will happen as often as a Zope client's ZEOclient storage needs to flip its cache file. The cache file is only flipped when itexceeds a certain size and it only exceeds a certain size after a certainpattern of usage causes it to do so (lots of loads from the database of new items,typically). Yup!
It appears that once self._f[self._current] is None, all future attempts by ZEO to store into its client cache will fail the same way. So I'd be even more surprised if you saw just one of these occur. Yup!
It would be nice if you could confirm this. Reading the Zopeevent log file of the client that generated the error would be a good start.
The log is everything here. The ZEO client cache logs most relevant messages at info level, producing msgs starting with "ZEC".
Thanks again for the insights.
--r. ————————————————————— Russ Ferriday Solution Workshops for Plone (+44) (0) 7789 338868 http://www.solutionworkshops.com