POSKeyError II: Dead By Dawn
They're baaaaaaack.... after a recent thread on Zope "Goblins" I decided to try the fstest.py script and see what it had to tell me. Sadly it found errors in my Data.fs on our development server. Lots of work has been done in that server recently; worried about losing data, I copied the Data.fs (plain old unix cp command) and ran fsrecover.py on the copy to see what it finds. I tee'd the output of fsrecover.py (stdout & stderr) to a file and I'm glad I did, because I've never seen anything this bad. Grepping through the fsrecover output reveals: 12527 total occurrences of "Error" 11397 "bad transaction length" errors 719 "invalid transaction length" errors 193 "POSKeyError" 75 *unique* POSKeyErrors (filtered through uniq) (so it's not just one bad oid) Ow. Ow. Ow. This is very odd considering that we have not noticed any problems at all from this zope installation! Oh yeah - specs: Zope 2.5.1 Python 2.1.3 built from source w/ largefile support ZEO RedHat Linux 2.4.7-10 SMP ZODB is over 2 GB, but the large size is not the problem - it was up to 2.5 at one point, and we haven't changed Python or the kernel since then, so it should be OK. (double-checking ... yes, test_largefile.py runs fine.) The really fun thing about POSKeyErrors in particular is that I know of no way to find out what zope id the broken oid once belonged to. Is there any way? Tried starting up with the recovered Data.fs and discovered that it seems "fine" except that most of our recent work is gone. :( I tried mounting a copy of the backup with another Zope instance, exporting the recent stuff, and importing it into the zope running the "repaired" Data.fs, but every time I tried to do that I got POSKeyErrors. (!) The repaired Data.fs passes fstest.py with no errors, so I think somehow the exports from the backup bring the POSKeyErrors with them. It's like a virus!!! (well, not really. the import just fails... the repaired Data.fs still passes fstest.py.) At this point I think I'm going to go back to the backup, errors and all... it may have tons of errors in it, but it has all of our work and seems to behave OK. :-\ Hmmmmm, I notice DirectoryStorage is now at 1.0.0. Hmmmmmmmmmmmmm. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's FILTHY MIME! (courtesy of isometric.spaceninja.com)
Forgot to mention, the things I'm trying to rescue are all within a CMFDefault instance. The objects I tried exporting / importing were Portal Folders. cmf is from the CVS HEAD around Nov 14th. wonder if this is related to my errors. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's RADIO DARK PATIENT! (courtesy of isometric.spaceninja.com)
Paul Winkler wrote:
They're baaaaaaack....
after a recent thread on Zope "Goblins" I decided to try the fstest.py script and see what it had to tell me. Sadly it found errors in my Data.fs on our development server. Lots of work has been done in that server recently; worried about losing data, I copied the Data.fs (plain old unix cp command) and ran fsrecover.py on the copy to see what it finds.
I tee'd the output of fsrecover.py (stdout & stderr) to a file and I'm glad I did, because I've never seen anything this bad.
I had similar results with a twist: I'm currently using zope 2.6.1b1, so I run the fstest.py that comes with it. It says: 4482765 object serialno 0x0348f97b7e5d184c does not matchtransaction id 0x034924f770139433 So I run 2.6.1b1 fsrecover.py and .... it wipes all the data from Data.fs (original Data.fs 20M, new Data.fs 4 *bytes*). Retested with the corresponding utilities from 2.6.0 and now fsrecover seems to recover all the data (I say "seems" since I didn't test yet the resulting Data.fs). Everything seems to be working fine with the "corrupted" Data.fs. Should I worry? Bye -- Luca Olivetti Wetron Automatización S.A. http://www.wetron.es/ Tel. +34 93 5883004 Fax +34 93 5883007
Luca, Paul... I'd suggest posting your results to zodb-dev@zope.org where the ZODb gods live and may be able to help... cheers, Chris
On Tuesday 21 January 2003 7:58 am, Paul Winkler wrote:
The really fun thing about POSKeyErrors in particular is that I know of no way to find out what zope id the broken oid once belonged to. Is there any way?
It shouldnt be hard to change fsrefs to dump the oid reference chain that leads to the dangling reference. DirectoryStorage 1.0 will do that if it finds a dangling reference during packing - for 1.1 I plan to move that feature into checkds where it belongs. However an oid reference chain wont tell you the application-level id of the object, but you can deduce that by looking at pickles.
I tried mounting a copy of the backup with another Zope instance, exporting the recent stuff, and importing it into the zope running the "repaired" Data.fs, but every time I tried to do that I got POSKeyErrors. (!)
Yes. The export procedure silently ignores any POSKeyErrors. Importing that export will create a nice clean dangling reference. Thats a feature, not a bug.
The repaired Data.fs passes fstest.py with no errors, so I think somehow the exports from the backup bring the POSKeyErrors with them. It's like a virus!!!
You imported some dangling references. -- Toby Dickenson http://www.geminidataloggers.com/people/tdickenson
On Tue, Jan 21, 2003 at 12:06:50PM +0000, Toby Dickenson wrote:
On Tuesday 21 January 2003 7:58 am, Paul Winkler wrote:
The really fun thing about POSKeyErrors in particular is that I know of no way to find out what zope id the broken oid once belonged to. Is there any way?
It shouldnt be hard to change fsrefs to dump the oid reference chain that leads to the dangling reference. DirectoryStorage 1.0 will do that if it finds a dangling reference during packing - for 1.1 I plan to move that feature into checkds where it belongs.
Yes, I am hoping to move to DirectoryStorage "as soon as I have time". For FileStorage, I guess I'm hosed.
However an oid reference chain wont tell you the application-level id of the object, but you can deduce that by looking at pickles.
better than nothing I guess...
I tried mounting a copy of the backup with another Zope instance, exporting the recent stuff, and importing it into the zope running the "repaired" Data.fs, but every time I tried to do that I got POSKeyErrors. (!)
Yes. The export procedure silently ignores any POSKeyErrors. Importing that export will create a nice clean dangling reference. Thats a feature, not a bug.
Ah, OK, thanks for the explanation. Guess I have to hit the zodb-dev archives and see what I can learn about OIDs and dangling references etc... -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's MOLE RAT SEMI- FEDORA! (courtesy of isometric.spaceninja.com)
Problem solved. I ran python2.1 fsrecover.py -P 0 -p Data.fs.borken Data.fs.recovered That way, I lose all Undo data, but very little "real stuff" which AFAICT was all only in old revisions anyway. (At least, various searches by date did not turn up anything unexpected, compared to another zope instance running against the Data.fs.borken.) -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's BARKEEP MEGA WENCH! (random hero from isometric.spaceninja.com)
I took a look through the mailing list and there is a lot of talk of POSKeyError but I have not found one solution. Anyone have something to go on? A fix? A place to start? BZ Mine: Error Type: POSKeyError Error Value: 000000000001efcd Traceback (innermost last): File /usr/local/zope/lib/python/ZPublisher/Publish.py, line 150, in publish_module File /usr/local/zope/lib/python/ZPublisher/Publish.py, line 114, in publish File /usr/local/zope/lib/python/Zope/__init__.py, line 159, in zpublisher_exception_hook (Object: Members) File /usr/local/zope/lib/python/ZPublisher/Publish.py, line 98, in publish File /usr/local/zope/lib/python/ZPublisher/mapply.py, line 88, in mapply (Object: index_html) File /usr/local/zope/lib/python/ZPublisher/Publish.py, line 39, in call_object (Object: index_html) File /usr/local/zope/lib/python/OFS/DTMLMethod.py, line 127, in __call__ (Object: index_html) File /usr/local/zope/lib/python/DocumentTemplate/DT_String.py, line 473, in __call__ (Object: index_html) File /usr/local/zope/lib/python/DocumentTemplate/DT_Return.py, line 31, in render (Object: roster) File /usr/local/zope/lib/python/OFS/DTMLMethod.py, line 120, in __call__ (Object: roster) File /usr/local/zope/lib/python/DocumentTemplate/DT_String.py, line 473, in __call__ (Object: roster) File /usr/local/zope/lib/python/DocumentTemplate/DT_Let.py, line 75, in render (Object: roster="portal_membership.getRoster()" userMgr="portal_membership.checkPermission( 'View', Members )") File /usr/local/zope/lib/python/DocumentTemplate/DT_Util.py, line 159, in eval (Object: portal_membership.getRoster()) (Info: portal_membership) File <string>, line 2, in f File /usr/local/zope/lib/python/Products/CMFDefault/MembershipTool.py, line 141, in getRoster (Object: portal_membership) File /usr/local/zope/lib/python/ZODB/Connection.py, line 447, in setstate File /usr/local/zope/lib/python/ZODB/FileStorage.py, line 582, in load (Object: /usr/local/zope/var/Data.fs) File /usr/local/zope/lib/python/ZODB/FileStorage.py, line 577, in _load (Object: /usr/local/zope/var/Data.fs) File /usr/local/zope/lib/python/ZODB/FileStorage.py, line 2134, in _loadBack POSKeyError: (see above)
BZ wrote:
I took a look through the mailing list and there is a lot of talk of POSKeyError but I have not found one solution.
Anyone have something to go on? A fix? A place to start?
What version of Zope / ZEo are you using? Might be best to ask on zodb-dev@zope.org... cheers, Chris
Zope Version (Zope 2.5.1 (binary release, python 2.1, linux2-x86), python 2.1.3, linux2) Python Version 2.1.3 (#1, Apr 15 2002, 11:13:45) [GCC 2.7.2.3] I got some answers from another thread, but nobody seems to really have a cure. I tried using fsrecover.py but was having problems with that as well. BZ
BZ wrote:
I took a look through the mailing list and there is a lot of talk of POSKeyError but I have not found one solution.
Anyone have something to go on? A fix? A place to start?
What version of Zope / ZEo are you using?
Might be best to ask on zodb-dev@zope.org...
cheers,
Chris
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Hey Chris, I will give 2.6.1 a shot. Are you saying that I should not get any NEW errors or that THIS error would be fixed? I have tried a ton of things and not been able to isolate or delete this problem. BZ
BZ wrote:
Zope Version (Zope 2.5.1 (binary release, python 2.1, linux2-x86), python 2.1.3, linux2) Python Version 2.1.3 (#1, Apr 15 2002, 11:13:45) [GCC 2.7.2.3]
Moving to 2.6.1 will likely solve this...
cheers,
Chris
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
I moved it all over to Zope2.6.1 Here is the error: Exception traceback Time 2003/03/18 07:02:37.499 US/Central User Name (User Id) Request URL http://66.66.66.66./manage_findResult Exception Type POSKeyError Exception Value 000000000001efce Traceback (innermost last): * Module ZPublisher.Publish, line 98, in publish * Module ZPublisher.mapply, line 88, in mapply * Module ZPublisher.Publish, line 39, in call_object * Module Shared.DC.Scripts.Bindings, line 252, in __call__ * Module Shared.DC.Scripts.Bindings, line 283, in _bindAndExec * Module App.special_dtml, line 174, in _exec * Module DocumentTemplate.DT_With, line 76, in render * Module DocumentTemplate.DT_In, line 465, in renderwb * Module DocumentTemplate.DT_In, line 773, in sort_sequence * Module ZODB.Connection, line 509, in setstate * Module ZODB.FileStorage, line 650, in load * Module ZODB.FileStorage, line 624, in _load POSKeyError: 000000000001efce Display traceback as text REQUEST form btn_submit 'Find' search_sub 1 obj_searchterm '' obj_mtime '' obj_ids [] searchtype 'simple' obj_metatypes ['all'] obj_mspec '<' cookies tree-s 'eJyLjlZ3hANPW/XYWAAtiQTP' user_email 'asdf' dtpref_cols '100%' dtpref_rows '40' user_name 'asdf' lazy items SESSION <bound method SessionDataManager.getSessionData of <SessionDataManager instance at 8da00d8>> other BASEPATH1 '' PUBLISHED <DTMLFile instance at 8294b60> obj_roles None obj_mtime '' batch_size 20 a_ 6 BASE2 'http://66.66.66.66/manage_findResult' BASE1 'http://6 66.66.66.66' traverse_subpath [] URL1 'http://66.66.66.66' SERVER_URL 'http://66.66.66.66' skey 'id' AUTHENTICATED_USER TraversalRequestNameStack [] obj_metatypes ['all'] AUTHENTICATION_PATH '' obj_expr None obj_permission None management_page_charset 'iso-8859-1' obj_ids [] rkey '' obj_searchterm '' search_sub 1 URL 'http://66.66.66.66/manage_findResult' searchtype 'simple' n_ 6 obj_mspec '<' btn_submit 'Find' URL0 http://66.66.66.66/manage_findResult URL1 http://6 66.66.66.66 BASE0 http://6 66.66.66.66 BASE1 http://66.66.66.66 BASE2 http://6 66.66.66.66 0/manage_findResult environ SCRIPT_NAME '' HTTP_ACCEPT_ENCODING 'gzip,deflate,compress;q=0.9' SERVER_PORT '8080' PATH_TRANSLATED '/manage_findResult' HTTP_ACCEPT 'text/xml,application/xml,application/xhtml+xml,text/ html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/ gif;q=0.2,*/*;q=0.1' GATEWAY_INTERFACE 'CGI/1.1' HTTP_COOKIE 'dtpref_rows="40"; dtpref_cols="100%"; user_email="asdf"; user_name="asdf"; tree-s="eJyLjlZ3hANPW/XYWAAtiQTP"' HTTP_ACCEPT_LANGUAGE 'en-us,en;q=0.5' REMOTE_ADDR '209.244.242.2' SERVER_NAME '66.216.105.50' HTTP_USER_AGENT 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.3) Gecko/20030312' HTTP_ACCEPT_CHARSET 'ISO-8859-1,utf-8;q=0.7,*;q=0.7' CONNECTION_TYPE 'keep-alive' channel.creation_time 1047991717 QUERY_STRING 'searchtype=simple&obj_metatypes%3Alist=all&obj_ids%3Atokens=&obj_search term=&obj_mspec=%3C&obj_mtime=&search_sub%3Aint=1&btn_submit=Find' SERVER_PROTOCOL 'HTTP/1.1' HTTP_KEEP_ALIVE '300' HTTP_HOST '66.216.105.50:8080' REQUEST_METHOD 'GET' PATH_INFO '/manage_findResult' SERVER_SOFTWARE 'Zope/(Zope 2.6.1 (binary release, python 2.1, linux2-x86), python 2.1.3, linux2) ZServer/1.1b1' HTTP_REFERER 'http://66.66.66.66/manage_findForm' On Monday, March 17, 2003, at 04:00 AM, Chris Withers wrote:
BZ wrote:
Zope Version (Zope 2.5.1 (binary release, python 2.1, linux2-x86), python 2.1.3, linux2) Python Version 2.1.3 (#1, Apr 15 2002, 11:13:45) [GCC 2.7.2.3]
Moving to 2.6.1 will likely solve this...
cheers,
Chris
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
BZ wrote:
I moved it all over to Zope2.6.1
Here is the error:
Exception traceback Time 2003/03/18 07:02:37.499 US/Central User Name (User Id) Request URL http://66.66.66.66./manage_findResult Exception Type POSKeyError Exception Value 000000000001efce
Sounds like you have soem genuine ZODB corruption. Mindyou, I see it happens when you're searching a ZCatalog. Try re-indexing everythign and see if it's still there... cheers, Chris
participants (5)
-
BZ -
Chris Withers -
Luca Olivetti -
Paul Winkler -
Toby Dickenson