POSKeyErrors (was Re: [Zope] Zope leaking memory?)
On Tue, 2004-09-14 at 19:18, Richard Jones wrote:
On Wed, 15 Sep 2004 08:51 am, you wrote:
POSKeyErrors? Do you know which objects are going missing?
BTrees of some sort in every case.
Unfortunately, we make use of Transience outside of the regular SESSION stuff (for wizards which include file uploads).
Am still working through my daily morning POSKeyError cleanup :(
I haven't seen any reports in the collector of POSKeyErrors resulting from Transience usage. I presume the BTrees that go missing are Transience-related? Even if so, I'm not sure that indicates anything is wrong with Transience itself.

Lately, Tim has made some fixes to ZODB 3.2 (the ZODB shipped with Zope 2.7) that cause an exception to be raised if a connection is closed while there are pending modifications in that connection. Another bug has also been fixed so that "begin()" also does an "abort()" even if the only pending modifications are in subtransactions within a transaction (see http://zope.org/Collectors/Zope/789 and http://article.gmane.org/gmane.comp.web.zope.zodb/5364). Additionally, Zope's publisher has been changed to be more explicit about transaction management. This was required by the first ZODB fix, although with the publisher's old behavior the second fix alone would have solved it ;-).

Apparently the "abort subtransactions when beginning the main transaction" bug can be a source of POSKeyErrors due to connection-cache/database desynchronization. I have no idea whether this is what is biting you, but I suppose it's worth a shot to find out.

All of these fixes happened recently enough that they haven't yet made it into any released Zope version. They are only available in CVS (on the Zope-2_7-branch).

Unrelatedly, new work has also been happening in Transience that will be merged into the 2.7 branch soon as well. This is currently in good shape on the 'chrism-pre273-branch' of Transience in CVS. See http://www.plope.com/Members/chrism/sessioning_redux for the list of features and bugfixes that have been happening to Transience and friends recently.

- C
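To make the begin()/abort() fix a little more concrete, here is a toy model in Python. It is not ZODB code. The class and method names (ToyTransaction, commit_sub, begin_old, begin_fixed) are hypothetical. The model covers only the rule described above: begin() should discard outstanding work even when the only pending changes live in a subtransaction.

```python
class ToyTransaction:
    """Hypothetical model of transaction bookkeeping; not ZODB's implementation."""

    def __init__(self):
        self.pending = []     # changes registered since the last (sub)commit
        self.in_subtxn = []   # changes already committed to a subtransaction
        self.discarded = []   # everything abort() has thrown away so far

    def register(self, change):
        self.pending.append(change)

    def commit_sub(self):
        # Stand-in for a subtransaction commit: pending work moves into
        # subtransaction state but is not yet part of a top-level commit.
        self.in_subtxn.extend(self.pending)
        self.pending = []

    def abort(self):
        self.discarded.extend(self.in_subtxn + self.pending)
        self.in_subtxn = []
        self.pending = []

    def begin_old(self):
        # Models the bug: only top-level pending changes trigger an abort,
        # so subtransaction-only changes survive into the next transaction.
        if self.pending:
            self.abort()

    def begin_fixed(self):
        # Models the fix: any outstanding work, including subtransaction
        # state, is aborted before the new transaction starts.
        if self.pending or self.in_subtxn:
            self.abort()


old = ToyTransaction()
old.register("bucket-1")
old.commit_sub()
old.begin_old()
print(old.in_subtxn)                  # -> ['bucket-1']

new = ToyTransaction()
new.register("bucket-1")
new.commit_sub()
new.begin_fixed()
print(new.in_subtxn, new.discarded)   # -> [] ['bucket-1']
```

In the first case, the leftover subtransaction state is loosely analogous to the stale connection state that, per the explanation above, can leave a connection's cache out of step with the database.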
On Wed, 15 Sep 2004 11:09 am, Chris McDonough wrote:
On Tue, 2004-09-14 at 19:18, Richard Jones wrote:
On Wed, 15 Sep 2004 08:51 am, you wrote:
POSKeyErrors? Do you know which objects are going missing?
BTrees of some sort in every case.
Unfortunately, we make use of Transience outside of the regular SESSION stuff (for wizards which include file uploads).
Am still working through my daily morning POSKeyError cleanup :(
I haven't seen any reports in the collector of POSKeyError resulting from Transience usage. I presume the BTrees that go missing are transience-related? Even if so I'm not sure if that indicates anything is wrong with Transience itself.
My POSKeyErrors all pop up in Transience. Also, my reading of the collector reports and related mailing list discussions seemed to indicate that sessions, and their Transience stores, were the culprits. I'm very much a newbie when it comes to this stuff - I've not had the time (nor will I) to really dig into it or fully understand what's going wrong. Happy to be told I've got it all wrong, just as long as I can eventually (soon would be nice) figure out how to *fix* it all :)
Lately, Tim has made some fixes to ZODB 3.2 (aka Zope 2.7) that cause an exception to be raised if a connection is closed while there are pending modifications in that connection.
Yep, aware of that.
Also, another bug has been fixed to make "begin()" also do an "abort()" even if there are only modifications pending in subtransactions within a transaction (see http://zope.org/Collectors/Zope/789 and http://article.gmane.org/gmane.comp.web.zope.zodb/5364).
And that.
Additionally, Zope's publisher has been changed to be more explicit about transaction management (required by the first ZODB fix, although its old behavior would have left it solved by the second ;-).
And that too ;)
Apparently the "abort subtransactions when beginning the main transaction" bug can be a source of POSKeyErrors due to connection-cache/database desynchronization. I have no idea if this is what is biting you, but it's worth a shot to find out, I suppose.
All of these fixes have happened recently enough that they haven't yet made it to any released Zope version and are only available in CVS (on the Zope-2_7-branch).
I've just upgraded my development system to 2-7 branch but a separate issue relating to permissions has popped up. Have mailed zope-dev, and hopefully that can be resolved ASAP.
Unrelatedly, new work has also been happening in Transience that will be merged into the 2.7 branch soon as well. This is currently in good shape on the 'chrism-pre273-branch' of Transience in CVS.
Yes, been watching those checkins, though I understand none of them. I'm not quite desperate enough to run that branch though ;)
See http://www.plope.com/Members/chrism/sessioning_redux for the list of features and bugfixes that have been happening to Transience and friends recently.
I'll read up on them, yes. Thanks for the pointer.
Richard
On Tue, 2004-09-14 at 21:20, Richard Jones wrote:
I haven't seen any reports in the collector of POSKeyError resulting from Transience usage. I presume the BTrees that go missing are transience-related? Even if so I'm not sure if that indicates anything is wrong with Transience itself.
My POSKeyErrors all pop up in Transience. Also, my reading of the collector reports and related mailing list discussions seemed to indicate that sessions, and their Transience stores, were the culprits.
I don't see a POSKeyError mentioned in any of the collector reports in conjunction with Transience. Plenty of other errors, yes, but not POSKeyError, so you win the "first with new symptom" prize. ;-)
I'm very much a newbie when it comes to this stuff - I've not had the time (nor will I) to really dig into it or fully understand what's going wrong. Happy to be told I've got it all wrong, just as long as I can eventually (soon would be nice) figure out how to *fix* it all :)
I haven't seen Transience run up against any POSKeyErrors, and no one has provided a way to cause that situation, so it's hard for me to take any action on it. A collector issue explaining the symptom would be a place to start, though. <a laundry list of ZODB and publisher fixes mentioned>
All of these fixes have happened recently enough that they haven't yet made it to any released Zope version and are only available in CVS (on the Zope-2_7-branch).
I've just upgraded my development system to 2-7 branch but a separate issue relating to permissions has popped up. Have mailed zope-dev, and hopefully that can be resolved ASAP.
Does that mean that you have or haven't experienced these POSKeyErrors with a site running recent Zope 2.7 branch code?
Unrelatedly, new work has also been happening in Transience that will be merged into the 2.7 branch soon as well. This is currently in good shape on the 'chrism-pre273-branch' of Transience in CVS.
Yes, been watching those checkins, though I understand none of them. I'm not quite desperate enough to run that branch though ;)
You won't have much of a choice soon, it's getting merged tout de suite. ;-) - C
On Wed, 15 Sep 2004 11:30 am, Chris McDonough wrote:
On Tue, 2004-09-14 at 21:20, Richard Jones wrote:
I've just upgraded my development system to 2-7 branch but a separate issue relating to permissions has popped up. Have mailed zope-dev, and hopefully that can be resolved ASAP.
Does that mean that you have or haven't experienced these POSKeyErrors with a site running recent Zope 2.7 branch code?
No, it means I can't upgrade the production server to the 2.7 branch until the permissions stuff is resolved.
Unrelatedly, new work has also been happening in Transience that will be merged into the 2.7 branch soon as well. This is currently in good shape on the 'chrism-pre273-branch' of Transience in CVS.
Yes, been watching those checkins, though I understand none of them. I'm not quite desperate enough to run that branch though ;)
You won't have much of a choice soon, it's getting merged tout de suite. ;-)
Excellent!
Richard
On Tue, 2004-09-14 at 22:09, Richard Jones wrote:
On Wed, 15 Sep 2004 11:30 am, Chris McDonough wrote:
On Tue, 2004-09-14 at 21:20, Richard Jones wrote:
I've just upgraded my development system to 2-7 branch but a separate issue relating to permissions has popped up. Have mailed zope-dev, and hopefully that can be resolved ASAP.
Does that mean that you have or haven't experienced these POSKeyErrors with a site running recent Zope 2.7 branch code?
No, it means I can't upgrade the production server to the 2.7 branch until the permissions stuff is resolved.
I'm afraid I still don't know whether that means you have or haven't seen the errors under the latest 2.7 branch. I'm going to assume it means the encouraging "I haven't yet seen the errors under the latest 2.7 branch", only softened by "I haven't yet really used the latest 2.7 branch". ;-) - C
On Wed, 15 Sep 2004 12:22 pm, Chris McDonough wrote:
I'm going to assume it means the encouraging "I haven't yet seen the errors under the latest 2.7 branch", only softened by "I haven't yet really used the latest 2.7 branch". ;-)
That is correct.
Richard
On Wed, 15 Sep 2004 12:22 pm, Chris McDonough wrote:
I'm afraid I still don't know whether that means you have or haven't seen the errors under the latest 2.7 branch.
Well, that settles it. The errors are alive and kickin' in the 2.7 branch too. Within *moments* of the server going back up, I got the first POSKeyError report.
Richard
On Thu, 16 Sep 2004 01:20 pm, Richard Jones wrote:
On Wed, 15 Sep 2004 12:22 pm, Chris McDonough wrote:
I'm afraid I still don't know whether that means you have or haven't seen the errors under the latest 2.7 branch.
Well, that settles it. The errors are alive and kickin' in the 2.7 branch too. Within *moments* of the server going back up, I got the first POSKeyError report.
Make that the first of *many*. I've now backed-out the 2.7 branch and am running 2.7.2 again.
Richard, weeping in the corner
Disappointing. Can you please send a traceback for the PKE (POSKeyError) and any earlier errors that look like they may be related? Out of curiosity, how do you know the POSKeyErrors didn't happen before you switched to the new branch?
- C

On Wed, 2004-09-15 at 23:39, Richard Jones wrote:
On Thu, 16 Sep 2004 01:20 pm, Richard Jones wrote:
On Wed, 15 Sep 2004 12:22 pm, Chris McDonough wrote:
I'm afraid I still don't know whether that means you have or haven't seen the errors under the latest 2.7 branch.
Well, that settles it. The errors are alive and kickin' in the 2.7 branch too. Within *moments* of the server going back up, I got the first POSKeyError report.
Make that the first of *many*. I've now backed-out the 2.7 branch and am running 2.7.2 again.
Richard, weeping in the corner
On Thu, 16 Sep 2004 03:05 pm, Chris McDonough wrote:
Out of curiosity, how do you know the POSKeyErrors didn't happen before you switched to the new branch?
I need to do some more poking around. I *had* cleaned out the ZODB of all errors that I could find, but when I reverted back to 2.7.2 and a Data.fs backup, I had a lot of very similar errors again. I do know that I got a *lot* more error reports through from the Zopes. I panicked a little (ok, a lot). I'm tempted to give the upgrade another whirl to make sure my reaction was the appropriate one.
Richard
Did you (can you?) run the ZODB thru fsrefs.py by any chance before putting it into prod on the new branch... that should find all the dangling references.
- C

On Thu, 2004-09-16 at 01:19, Richard Jones wrote:
On Thu, 16 Sep 2004 03:05 pm, Chris McDonough wrote:
Out of curiosity, how do you know the POSKeyErrors didn't happen before you switched to the new branch?
I need to do some more poking around. I *had* cleaned out the ZODB of all errors that I could find, but when I reverted back to 2.7.2 and a Data.fs backup, I had a lot of very similar errors again.
I do know that I got a *lot* more error reports through from the Zopes. I panicked a little (ok, a lot). I'm tempted to give the upgrade another whirl to make sure my reaction was the appropriate one.
Richard
[Chris McDonough]
Did you (can you?) run the ZODB thru fsrefs.py by any chance before putting it into prod on the new branch... that should find all the dangling references.
And use fsrefs.py from the 2.7 branch. Many improvements were made to fsrefs.py, and, in fact, some of them were due to frustrations I had trying to use the 2.7.2 fsrefs.py to analyze one of Richard's .fs files! In particular, "fsrefs.py -v" in 2.7.2 produced spurious tracebacks for objects whose creation had merely been undone. That's repaired on the 2.7 branch. The newer fsrefs.py also makes two passes, so you get a complete report of dangling references.
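The idea behind a dangling-reference report can be sketched as two passes over the stored records:

1. Collect every oid that exists.
2. Check each recorded reference against that complete set, so no reference is judged before all oids are known.

This is a hypothetical model, not fsrefs.py's code. Records here are plain (oid, class name, referenced oids) tuples rather than FileStorage data records, and the hex formatting is a simplification.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, str, List[int]]  # (oid, class name, oids it references)


def dangling_references(records: Iterable[Record]) -> List[Tuple[str, str, str]]:
    """Return (referrer oid, referrer class, missing oid) for every bad reference."""
    records = list(records)
    # Pass 1: every oid that has a stored record.
    present = {oid for oid, _cls, _refs in records}
    # Pass 2: report each reference whose target never appeared in pass 1.
    report = []
    for oid, cls, refs in records:
        for ref in refs:
            if ref not in present:
                report.append((f"0x{oid:06x}", cls, f"0x{ref:06x}"))
    return report


records = [
    (0x0265BE, "BTrees.IOBTree.IOBucket", [0x02B6C2]),  # 0x02b6c2 has no record
    (0x000010, "BTrees.IOBTree.IOBTree", [0x000011]),   # forward reference, fine
    (0x000011, "BTrees.IOBTree.IOBucket", []),
]
for row in dangling_references(records):
    print(row)
```

A single pass that only checked against oids seen so far would wrongly flag the forward reference from 0x000010 to 0x000011. Collecting the full oid set first avoids that.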
On Thu, 16 Sep 2004 03:19 pm, Richard Jones wrote:
On Thu, 16 Sep 2004 03:05 pm, Chris McDonough wrote:
Out of curiosity, how do you know the POSKeyErrors didn't happen before you switched to the new branch?
I need to do some more poking around. I *had* cleaned out the ZODB of all errors that I could find, but when I reverted back to 2.7.2 and a Data.fs backup, I had a lot of very similar errors again.
I do know that I got a *lot* more error reports through from the Zopes. I panicked a little (ok, a lot). I'm tempted to give the upgrade another whirl to make sure my reaction was the appropriate one.
Well, there are reference errors in the ZODB that I can't fix(*), but I'm running the 2.7 CVS now and it *seems* stable. Not sure what happened before.

* fsrefs still reports errors on some objects, like:

    oid 0x0265be BTrees.IOBTree.IOBucket
    last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
    refers to invalid object:
        oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'

but when I dig in there, the IOBucket appears to just have strings as the values. And they're all present. Not sure why / how fsrefs thinks stuff is missing.

Richard
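For a daily cleanup like the one described earlier in the thread, it can help to turn fsrefs output into (referrer, missing oid) pairs. Below is a minimal sketch. It assumes the output looks like the single sample quoted above: an "oid ... last updated:" header for the referring object, followed by "oid ... missing: '...'" lines. The function name parse_fsrefs is hypothetical, and the format assumption rests only on that one sample.

```python
import re
from typing import List, Tuple

# Either a referring-object header or a "missing" line, matched in text order.
_PATTERN = re.compile(
    r"oid (?P<ref>0x[0-9a-fA-F]+) (?P<cls>[\w.]+)\s+last updated:"
    r"|oid (?P<miss>0x[0-9a-fA-F]+) missing: '(?P<mcls>[^']*)'"
)


def parse_fsrefs(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (referrer oid, referrer class, missing oid, missing class) tuples."""
    results = []
    current = None  # (oid, class) of the most recent referring object
    for m in _PATTERN.finditer(text):
        if m.group("ref"):
            current = (m.group("ref"), m.group("cls"))
        elif current is not None:
            results.append(current + (m.group("miss"), m.group("mcls")))
    return results


sample = """oid 0x0265be BTrees.IOBTree.IOBucket
last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
refers to invalid object:
    oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'
"""
print(parse_fsrefs(sample))
```

Because the pattern treats any whitespace run as a separator, the same function also accepts the flattened one-line form of the output.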
[Richard Jones]
... * fsrefs still reports errors on some objects, like:
    oid 0x0265be BTrees.IOBTree.IOBucket
    last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
    refers to invalid object:
        oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'
but when I dig in there, the IOBucket appears to just have strings as the values. And they're all present. Not sure why / how fsrefs thinks stuff is missing.
fsrefs says an oid is "missing" if and only if the oid doesn't appear in the .fs.index file. It could be a good idea to delete your .fs.index file, in case it got out of sync with your .fs file with all the switching back and forth. ZODB will then recreate the .fs.index file from the .fs file. It's not a good idea to have mismatching .fs and .fs.index files.
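Tim's rule ("missing" means "not in the index") and his suggestion to let ZODB rebuild the index can be modeled in a few lines. This is a hypothetical sketch, not FileStorage's code. The data file is reduced to a list of (oid, file offset) records in append order. The index is reduced to a dict mapping each oid to the offset of its latest record. The function names are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def rebuild_index(records: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Scan (oid, offset) records in file order; later records win."""
    index: Dict[int, int] = {}
    for oid, offset in records:
        index[oid] = offset
    return index


def missing(oids: Iterable[int], index: Dict[int, int]) -> List[int]:
    """An oid counts as missing exactly when the index has no entry for it."""
    return [oid for oid in oids if oid not in index]


data_file = [(0x0265BE, 4), (0x02B6C2, 120), (0x0265BE, 300)]
stale_index = {0x0265BE: 4}   # out of sync: predates the later records

print([hex(o) for o in missing([0x02B6C2], stale_index)])               # spurious "missing"
print([hex(o) for o in missing([0x02B6C2], rebuild_index(data_file))])  # gone after a rebuild
```

In this model, if rebuilding the index changes nothing, the oid really has no record in the data file. In the real database, loading such an oid raises POSKeyError.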
On Thu, 16 Sep 2004 04:10 pm, Tim Peters wrote:
fsrefs says an oid is "missing" if and only if the oid doesn't appear in the .fs.index file. It could be a good idea to delete your .fs.index file, in case it got out of sync with your .fs file with all the switching back and forth.
Deleting the index had no impact.
Richard
[Tim Peters]
fsrefs says an oid is "missing" if and only if the oid doesn't appear in the .fs.index file. It could be a good idea to delete your .fs.index file, in case it got out of sync with your .fs file with all the switching back and forth.
[Richard Jones]
Deleting the index had no impact.
Then, in your example:

    oid 0x0265be BTrees.IOBTree.IOBucket
    last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
    refers to invalid object:
        oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'

oid 0x02b6c2 is not in your .fs.index, and so an attempt to load oid 0x02b6c2 should cause a POSKeyError. When you said:

    but when I dig in there, the IOBucket appears to just have strings as the values. And they're all present

it wasn't clear what "when I dig in there" meant. What specifically did you do to inspect oid 0x02b6c2? Or were you looking at oid 0x0265be? ("the IOBucket" was ambiguous, since two distinct IOBuckets are mentioned in the output.)
On Thu, 16 Sep 2004 04:31 pm, Tim Peters wrote:
[Richard Jones]
Deleting the index had no impact.
Then, in your example:
    oid 0x0265be BTrees.IOBTree.IOBucket
    last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
    refers to invalid object:
        oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'
oid 0x02b6c2 is not in your .fs.index, and so an attempt to load oid 0x02b6c2 should cause a POSKeyError. When you said:
but when I dig in there, the IOBucket appears to just have strings as the values. And they're all present
it wasn't clear what "when I dig in there" meant. What specifically did you do to inspect oid 0x02b6c2? Or were you looking at oid 0x0265be? ("the IOBucket" was ambiguous, since two distinct IOBuckets are mentioned in the output.)
Sorry, by "dig in there" I meant that I loaded up the object with oid 0x0265be using:
    >>> from Zope.Startup.run import configure; configure('zope-19100/zope.conf')
    >>> from Zope import app; root = app()
    >>> from ZODB.utils import p64
    >>> o = root._p_jar[p64(0x0265be)]
and then I had a poke at that:
    >>> for k,v in o.items():
    ...     print k, type(o[k]), o[k]
    ...
    1531753053 <type 'str'> /CGPublisher/publishers/12/messages/17
    1610364516 <type 'str'> /CGPublisher/works/171/messages/1
    1610364517 <type 'str'> /CGPublisher/publishers/11/messages/31
    1610364518 <type 'str'> /CGPublisher/publishers/11/messages/32
    1610364519 <type 'str'> /CGPublisher/works/173/messages
    1610364520 <type 'str'> /CGPublisher/publishers/11/messages/33
    1610364521 <type 'str'> /CGPublisher/publishers/11/messages/34
    1637779823 <type 'str'> /CGPublisher/publishers/11/messages/30
    1655774688 <type 'str'> /CGPublisher/works/163/messages/4
    1660892580 <type 'str'> /CGPublisher/publishers/11/messages/75
    1660892581 <type 'str'> /CGPublisher/publishers/11/messages/76
    1660892582 <type 'str'> /CGPublisher/publishers/11/messages/77
    [snip many similar lines]
    1701534533 <type 'str'> /CGPublisher/publishers/13/messages/63
    1701534534 <type 'str'> /CGPublisher/publishers/13/messages/64
    1701534535 <type 'str'> /CGPublisher/publishers/13/messages/65
    1701534536 <type 'str'> /CGPublisher/publishers/13/messages/66
    1701534537 <type 'str'> /CGPublisher/publishers/13/messages/67
    1701534538 <type 'str'> /CGPublisher/publishers/13/messages/68
    1701534539 <type 'str'> /CGPublisher/publishers/13/messages/69
    1708905051 <type 'str'> /CGPublisher/works/170/messages
    1716432762 <type 'str'> /CGPublisher/publishers/13/messages/68/2
    1716432763 <type 'str'> /CGPublisher/works/183/messages
and just to confirm I'm not going mad:
    >>> root._p_jar[p64(0x02b6c2)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZODB/Connection.py", line 170, in __getitem__
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/ClientStorage.py", line 749, in load
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/ServerStub.py", line 82, in zeoLoad
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/zrpc/connection.py", line 372, in call
    ZODB.POSException.POSKeyError: 0x02b6c2
I guess one issue here is that I'm poking fsrefs.py directly at the Data.fs, whereas the above session is done through a ZEO connection. Not sure how ZEO could "hide" the erroneous data from me, but then I don't know the inner workings of ZEO and its caches...
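The p64 call in the session above converts a Python integer into the 8-byte string that ZODB uses as an oid. ZODB.utils.p64 is essentially struct.pack(">Q", v), an unsigned 64-bit big-endian integer. The standalone functions below carry a _sketch suffix to mark them as hypothetical stand-ins, not ZODB's own functions. They show the conversion in both directions. They also show one way to render an oid in the "0x0265be" style seen in the fsrefs output; that rendering rule is an assumption inferred from the output, not taken from ZODB's source.

```python
import binascii
import struct


def p64_sketch(v: int) -> bytes:
    """Integer oid -> 8-byte big-endian string (what ZODB.utils.p64 computes)."""
    return struct.pack(">Q", v)


def u64_sketch(s: bytes) -> int:
    """8-byte oid string -> integer (the inverse conversion)."""
    return struct.unpack(">Q", s)[0]


def oid_repr_sketch(oid: bytes) -> str:
    """Hex with leading zeros stripped, then padded to an even digit count.

    Assumed to match the 0x0265be / 0x02b6c2 style in the fsrefs output.
    """
    digits = binascii.hexlify(oid).decode("ascii").lstrip("0")
    if len(digits) % 2:
        digits = "0" + digits
    return "0x" + (digits or "00")


oid = p64_sketch(0x0265BE)
print(oid)                                      # b'\x00\x00\x00\x00\x00\x02e\xbe'
print(oid_repr_sketch(oid))                     # 0x0265be
print(hex(u64_sketch(p64_sketch(0x02B6C2))))    # 0x2b6c2 (hex() drops the leading zero)
```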
[Richard Jones]
Deleting the index had no impact.
[Tim Peters]
Then, in your example:
    oid 0x0265be BTrees.IOBTree.IOBucket
    last updated: 2004-09-16 02:32:47.973507, tid=0x357DBF8CCAFDCCCL
    refers to invalid object:
        oid 0x02b6c2 missing: 'BTrees.IOBTree.IOBucket'
oid 0x02b6c2 is not in your .fs.index, and so an attempt to load oid 0x02b6c2 should cause a POSKeyError. When you said:
but when I dig in there, the IOBucket appears to just have strings as the values. And they're all present
it wasn't clear what "when I dig in there" meant. What specifically did you do to inspect oid 0x02b6c2? Or were you looking at oid 0x0265be? ("the IOBucket" was ambiguous, since two distinct IOBuckets are mentioned in the output.)
[Richard]
Sorry, by "dig in there" I meant that I loaded up the object with oid 0x0265be using:
    >>> from Zope.Startup.run import configure; configure('zope-19100/zope.conf')
    >>> from Zope import app; root = app()
    >>> from ZODB.utils import p64
    >>> o = root._p_jar[p64(0x0265be)]
Thanks! That's clear.

To make more sense of what you're seeing, you have to know that btrees are complicated data structures. While you see a single IOBTree B at the Python level, under the covers B is actually a graph made up of any number of IOBTree and IOBucket nodes, each a distinct persistent object. That's why btrees scale well. *Normally* you only see the topmost IOBTree node, but digging into the database by oid exposes the elaborate internal structure. That internal structure includes three distinct kinds of inter-node references that have nothing to do with the keys or values. Those inter-node references are part of the btree's state too, but you're normally not aware of them.

While it would take deeper analysis to be sure (there's not enough info here to nail it), the evidence that is here suggests that oid 0x0265be is a leaf-level bucket that's an internal (normally unexposed) detail of some higher-level IOBTree. All the leaf-level buckets in a BTree are in a singly-linked list, to support efficient traversal from smallest key to largest. Each bucket has a "next bucket" pointer to support this. This isn't exposed in Python -- it's an internal detail of btree construction. So the evidence here suggests that oid 0x0265be has a next-bucket pointer to oid 0x02b6c2, but the latter object doesn't exist in the database.
and then I had a poke at that:
    >>> for k,v in o.items():
    ...     print k, type(o[k]), o[k]
    ...
Probably would have been easier to do print k, type(v), v at that point <wink>.
    1531753053 <type 'str'> /CGPublisher/publishers/12/messages/17
    1610364516 <type 'str'> /CGPublisher/works/171/messages/1
    1610364517 <type 'str'> /CGPublisher/publishers/11/messages/31
    1610364518 <type 'str'> /CGPublisher/publishers/11/messages/32
    1610364519 <type 'str'> /CGPublisher/works/173/messages
    1610364520 <type 'str'> /CGPublisher/publishers/11/messages/33
    1610364521 <type 'str'> /CGPublisher/publishers/11/messages/34
    1637779823 <type 'str'> /CGPublisher/publishers/11/messages/30
    1655774688 <type 'str'> /CGPublisher/works/163/messages/4
    1660892580 <type 'str'> /CGPublisher/publishers/11/messages/75
    1660892581 <type 'str'> /CGPublisher/publishers/11/messages/76
    1660892582 <type 'str'> /CGPublisher/publishers/11/messages/77
    [snip many similar lines]
    1701534533 <type 'str'> /CGPublisher/publishers/13/messages/63
    1701534534 <type 'str'> /CGPublisher/publishers/13/messages/64
    1701534535 <type 'str'> /CGPublisher/publishers/13/messages/65
    1701534536 <type 'str'> /CGPublisher/publishers/13/messages/66
    1701534537 <type 'str'> /CGPublisher/publishers/13/messages/67
    1701534538 <type 'str'> /CGPublisher/publishers/13/messages/68
    1701534539 <type 'str'> /CGPublisher/publishers/13/messages/69
    1708905051 <type 'str'> /CGPublisher/works/170/messages
    1716432762 <type 'str'> /CGPublisher/publishers/13/messages/68/2
    1716432763 <type 'str'> /CGPublisher/works/183/messages
So the keys and values are fine. Traversing a bucket object makes no use of the next-bucket pointer, so a missing next-bucket object wouldn't cause any problems here.
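Tim's point about leaf buckets can be illustrated with a toy linked list of buckets stored by oid. This is a hypothetical model, not the BTrees package. Bucket, load and walk_leaves are made-up names, and a KeyError subclass stands in for POSKeyError. Iterating one bucket's items never follows its next pointer. Walking the leaf chain fails exactly when it reaches the missing oid.

```python
from typing import Dict, Iterator, Optional, Tuple


class MissingOid(KeyError):
    """Stand-in for ZODB's POSKeyError in this toy model."""


class Bucket:
    def __init__(self, items: Dict[int, str], next_oid: Optional[int] = None):
        self._items = dict(items)
        self.next_oid = next_oid  # internal "next bucket" link, not a key or value

    def items(self):
        # Only this bucket's own pairs; next_oid is never dereferenced here.
        return sorted(self._items.items())


def load(storage: Dict[int, Bucket], oid: int) -> Bucket:
    if oid not in storage:
        raise MissingOid(f"0x{oid:06x}")
    return storage[oid]


def walk_leaves(storage: Dict[int, Bucket], first_oid: int) -> Iterator[Tuple[int, str]]:
    """Smallest-to-largest traversal by following next-bucket links."""
    oid: Optional[int] = first_oid
    while oid is not None:
        bucket = load(storage, oid)
        yield from bucket.items()
        oid = bucket.next_oid


storage = {
    0x0265BE: Bucket({1531753053: "/CGPublisher/publishers/12/messages/17"},
                     next_oid=0x02B6C2),  # 0x02b6c2 was never stored
}

print(load(storage, 0x0265BE).items())    # fine: only this bucket's own pairs
seen = []
try:
    for pair in walk_leaves(storage, 0x0265BE):
        seen.append(pair)
except MissingOid as exc:
    print("chain broken at", exc, "after", len(seen), "pair(s)")
```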
and just to confirm I'm not going mad:
    >>> root._p_jar[p64(0x02b6c2)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZODB/Connection.py", line 170, in __getitem__
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/ClientStorage.py", line 749, in load
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/ServerStub.py", line 82, in zeoLoad
      File "/opt/zope/cgpublisher-prod/Zope/lib/python/ZEO/zrpc/connection.py", line 372, in call
    ZODB.POSException.POSKeyError: 0x02b6c2
Which is consistent with fsrefs.py saying that oid 0x02b6c2 is "missing" -- it's not in the index, so trying to load it raises POSKeyError.
I guess one issue here is that I'm poking fsrefs.py directly at the Data.fs, whereas the above session is done through a ZEO connection. Not sure how ZEO could "hide" the erroneous data from me, but then I don't know the inner workings of ZEO and its caches...
The info so far is self-consistent, so let's assume ZEO isn't a factor. There's no way to tell from what we have here what the "top level" btree may be. Trying to traverse the top-level btree would raise POSKeyError, when it got to the dangling next-bucket pointer. It's possible that running the checkbtrees.py tool would identify the bad top-level btree in a helpful way.
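Looking for the top-level btree can be pictured as a reachability search. Start from each candidate top-level tree, follow its internal links, and report any oid that cannot be loaded. The sketch below is a hypothetical model, not checkbtrees.py. Interior nodes get child links and a first-bucket link, and buckets get a next-bucket link; this is a simplified take on the internal references Tim describes. The node classes, the registry of named roots, and the function names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class TreeNode:
    children: List[int] = field(default_factory=list)  # child node oids
    first_bucket: Optional[int] = None                 # oid of leftmost leaf bucket


@dataclass
class BucketNode:
    items: Dict[int, str] = field(default_factory=dict)
    next_bucket: Optional[int] = None                  # oid of the next leaf bucket


Node = Union[TreeNode, BucketNode]


def unreachable_from(storage: Dict[int, Node], root: int) -> List[int]:
    """Depth-first walk over internal links; return oids with no stored node."""
    missing, seen, stack = set(), set(), [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        node = storage.get(oid)
        if node is None:
            missing.add(oid)
        elif isinstance(node, TreeNode):
            stack.extend(node.children)
            if node.first_bucket is not None:
                stack.append(node.first_bucket)
        elif node.next_bucket is not None:
            stack.append(node.next_bucket)
    return sorted(missing)


def damaged_trees(storage: Dict[int, Node], roots: Dict[str, int]) -> Dict[str, List[int]]:
    """Map each named top-level tree to the missing oids reachable from it."""
    report = {}
    for name, oid in roots.items():
        bad = unreachable_from(storage, oid)
        if bad:
            report[name] = bad
    return report


storage: Dict[int, Node] = {
    0x10: TreeNode(children=[0x0265BE], first_bucket=0x0265BE),
    0x0265BE: BucketNode({1: "a"}, next_bucket=0x02B6C2),  # dangling link
    0x20: TreeNode(children=[0x21], first_bucket=0x21),
    0x21: BucketNode({2: "b"}),
}
roots = {"hypothetical_tree_a": 0x10, "hypothetical_tree_b": 0x20}
print(damaged_trees(storage, roots))
```

Only the tree whose leaf chain runs through 0x0265be is reported, which mirrors Tim's observation: the damage only shows up when something traverses the top-level btree that owns the broken bucket.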
On Thu, 16 Sep 2004 03:19 pm, Richard Jones wrote:
On Thu, 16 Sep 2004 03:05 pm, Chris McDonough wrote:
Out of curiosity, how do you know the POSKeyErrors didn't happen before you switched to the new branch?
I need to do some more poking around. I *had* cleaned out the ZODB of all errors that I could find, but when I reverted back to 2.7.2 and a Data.fs backup, I had a lot of very similar errors again.
I do know that I got a *lot* more error reports through from the Zopes. I panicked a little (ok, a lot). I'm tempted to give the upgrade another whirl to make sure my reaction was the appropriate one.
Well, when I did re-upgrade, it seems that everything's actually OK. No sign of the rush of POSKeyErrors that I got the first time - and for the first time in a long time, no new POSKeyErrors overnight!!! Yippee, and all that ;)
Richard
participants (3)
- Chris McDonough
- Richard Jones
- Tim Peters