This past weekend I migrated a bunch of ZClasses and the web site that uses them from a scratch Zope instance into my production instance. Both run off the same 2.6 CVS branch checkout, connecting to ZEO servers running from the same ZEO trunk checkout. For a while after starting the server, everything works fine. After an unpredictable time period ranging from minutes to hours, suddenly my ZClass instances become broken, and I start seeing this in my log, over and over: 2002-12-10T11:33:18 ERROR(200) ZODB Couldn't load state for '\x00\x00\x00\x00\x00\x01\x1f\xc9' Traceback (innermost last): Module ZODB.Connection, line 533, in setstate ImportError: No module named */rJ6Q2CNqMJDSN4Okpwiuw== The object that fails to load state is a BTree Bucket belonging to the ZClass registry, and "/rJ6Q2CNqMJDSN4Okpwiuw==" is the GUID of one of my ZClasses. If I delete that ZClass, it complains about a different one. If I restart the server, everything works again, for a while, but I get the following in my log: 2002-12-10T11:36:49 INFO(0) ZCS:2443 Connected to storage ------ 2002-12-10T11:36:49 INFO(0) ZODB Opening database for mounting: '144865536_1012175858.273945' ------ 2002-12-10T11:36:49 INFO(0) ZODB Mounted database '144865536_1012175858.273945' at /temp_folder ------ 2002-12-10T11:36:50 ERROR(200) ZODB Couldn't load state for '\x00\x00\x00\x00\x00\x01\x1f\xc9' Traceback (innermost last): Module ZODB.Connection, line 533, in setstate ImportError: No module named */rJ6Q2CNqMJDSN4Okpwiuw== ------ 2002-12-10T11:36:50 ERROR(200) Zope A problem was found when checking the global product registry. This is probably due to a Product being uninstalled or renamed. The traceback follows. Traceback (innermost last): Module OFS.Application, line 218, in checkGlobalRegistry Module ZODB.Connection, line 533, in setstate ImportError: No module named */rJ6Q2CNqMJDSN4Okpwiuw== ------ 2002-12-10T11:36:50 INFO(0) Zope Beginning attempt to rebuild the global ZClass registry. ------ 2002-12-10T11:36:52 INFO(0) Zope The global ZClass registry has successfully been rebuilt. ------ 2002-12-10T11:36:52 INFO(0) ZServer HTTP server started at Tue Dec 10 11:36:52 2002 Hostname: server1.shackworks.com Port: 8280 Help!
On 12/10/2002 12:53 PM, Evan Simpson wrote:
This past weekend I migrated a bunch of ZClasses and the web site that uses them from a scratch Zope instance into my production instance. Both run off the same 2.6 CVS branch checkout, connecting to ZEO servers running from the same ZEO trunk checkout.
For a while after starting the server, everything works fine. After an unpredictable time period ranging from minutes to hours, suddenly my ZClass instances become broken, and I start seeing this in my log, over and over:
2002-12-10T11:33:18 ERROR(200) ZODB Couldn't load state for '\x00\x00\x00\x00\x00\x01\x1f\xc9' Traceback (innermost last): Module ZODB.Connection, line 533, in setstate ImportError: No module named */rJ6Q2CNqMJDSN4Okpwiuw==
The object that fails to load state is a BTree Bucket belonging to the ZClass registry, and "/rJ6Q2CNqMJDSN4Okpwiuw==" is the GUID of one of my ZClasses. If I delete that ZClass, it complains about a different one.
My first recommendation would be to turn your ZClass registry into an OOBTree. OOBTree has been maintained and updated. Make a backup first :-) then from a debugging session: import Zope from BTrees.OOBTree import OOBTree app = Zope.app() root = app._p_jar.root() new_tree = OOBTree() for k, v in root['ZGlobals'].items(): new_tree[k] = v root['ZGlobals'] = new_tree get_transaction().commit() This won't necessarily solve it, but it might, since no one really knows the state of the old BTree module. I was surprised to discover only a few days ago that we never converted the code to create an OOBTree registry. (If this change solves your problem, I should probably fix OFS/Application.py.) Shane
Shane Hathaway wrote:
My first recommendation would be to turn your ZClass registry into an OOBTree. OOBTree has been maintained and updated.
Good thought, but no dice. Since it is rebuilding the registry on every restart anyway, I went ahead and made the changes to my OFS/Application.py. I confirmed that ZGlobals is now an OOBTree, but subsequent restarts continue to show the same problem :-( I've placed pdb.set_trace all over the DB state-loading code. When I attempt to step into the "state = unpickler.load()" line during the loading of the ZGlobals OOBTree, it immediately throws an ImportError on the ZClass GUID. It doesn't even seem to enter the persistent_load or classLoader methods. Tracing this has gotten me all confused. I understand (I think) that when a ZClass instance is unpickled, persistent_load gets the ZClass GUID and uses that to look up the actual _zclass_ class in ZGlobals. What I can't understand is why it would try to import the GUID when loading ZGlobals. All it has it in is strings and the _zclass_ ExtensionClasses, and they should be handled by the normal unpickling machinery, right? Thanks, Evan
On 12/10/2002 09:58 PM, Evan Simpson wrote:
Shane Hathaway wrote:
My first recommendation would be to turn your ZClass registry into an OOBTree. OOBTree has been maintained and updated.
Good thought, but no dice.
Well, it was worth a shot. :-)
Since it is rebuilding the registry on every restart anyway, I went ahead and made the changes to my OFS/Application.py. I confirmed that ZGlobals is now an OOBTree, but subsequent restarts continue to show the same problem :-(
I've placed pdb.set_trace all over the DB state-loading code. When I attempt to step into the "state = unpickler.load()" line during the loading of the ZGlobals OOBTree, it immediately throws an ImportError on the ZClass GUID. It doesn't even seem to enter the persistent_load or classLoader methods.
Shot in the dark #2: Does it enter the ClassFactory function in Zope.ClassFactory? That's part of the ZClass magic... although there are two ways that the pickler tries to load classes. ZODB only tries to load Persistent instances as ZClasses. If there are things inside the pickles that are not Persistent, yet come from a ZClass, ZODB will not treat them kindly. Shot in the dark #3: Unfortunately, there's one bit of ZODB that I still don't fully understand, and that's setklassstate(). You might be running into this somehow. I'd hazard a guess that the latest cPickleCache does not do the right thing with setklassstate(), since no one has committed to maintain that part. Ugh, we need unit tests for that.
Tracing this has gotten me all confused. I understand (I think) that when a ZClass instance is unpickled, persistent_load gets the ZClass GUID and uses that to look up the actual _zclass_ class in ZGlobals. What I can't understand is why it would try to import the GUID when loading ZGlobals. All it has it in is strings and the _zclass_ ExtensionClasses, and they should be handled by the normal unpickling machinery, right?
Sounds right, although I've forgotten the details. Here's a simple suggestion: after loading the site and browsing a few pages involving ZClasses, flush the ZODB cache and see if the bug appears right away. If so, it's probably occurring as a result of deactivation. Shane
Shane Hathaway wrote:
Shot in the dark #2: Does it enter the ClassFactory function in Zope.ClassFactory? That's part of the ZClass magic... although there are two ways that the pickler tries to load classes. ZODB only tries to load Persistent instances as ZClasses. If there are things inside the pickles that are not Persistent, yet come from a ZClass, ZODB will not treat them kindly.
As far as I can tell, it doesn't enter *any* Python code. Unless there's some other reason for pdb to be unable to step into it, it seems to execute only C code right up to the ImportError. By contrast, I *was* able to trace through ClassFactory when it loaded the BTree -- the trace would then fail when loading the first Bucket. Apparently small OOBTrees store their state differently, because the trace fails on the OOBTree load now, and I can do unpickler.noload() and see the state tuple with all the keys and "None" where the ZClasses should be.
Shot in the dark #3: Unfortunately, there's one bit of ZODB that I still don't fully understand, and that's setklassstate(). You might be running into this somehow. I'd hazard a guess that the latest cPickleCache does not do the right thing with setklassstate(), since no one has committed to maintain that part. Ugh, we need unit tests for that.
I had never heard of it -- I'll have to take a peek.
Here's a simple suggestion: after loading the site and browsing a few pages involving ZClasses, flush the ZODB cache and see if the bug appears right away. If so, it's probably occurring as a result of deactivation.
I had pretty much been assuming that that was the case, but performing this test confirms it. Flushing the cache breaks things immediately.
More data: It may well be loading correctly, but it seems to be storing incorrectly. I managed to grab the ZGlobals pickle string, and it has the _zclass_ stored as a plain class instead of an oid. I copied the commented-out Python persistent_id() in ZODB/Connections.py into a method called test_persistent_id(), and compared its results with coptimizations' new_persistent_id. They matched for everything except _zclass_ objects, for which test_persistent_id returned the _p_oid and new_persistent_id returned None. I have no idea why this is. It would explain why the _zclass_es are stored wrongly, but not why everyone else isn't seeing major breakage. Argh, Evan @ 4-am
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart. Whew, Evan @ 4-am
On Wed, 11 Dec 2002, Evan Simpson wrote:
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart.
Ah-ha, you now have a reproducible test case. The coptimizations code has been undergoing "beautification" lately; apparently this particular ZClass-related functionality was not retained correctly. This ought to be a priority for Zope 2.6, I think. Shane
On Wednesday 11 December 2002 11:13 pm, Shane Hathaway wrote:
On Wed, 11 Dec 2002, Evan Simpson wrote:
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart.
Ah-ha, you now have a reproducible test case. The coptimizations code has been undergoing "beautification" lately; apparently this particular ZClass-related functionality was not retained correctly. This ought to be a priority for Zope 2.6, I think.
I think we should call a memoratorium on code beautification for its own sake, especially C code. Too much pain for too little gain. I'd be willing to bet the code's still ugly anyhow ;^) -Casey
If you feel like it, the change that is biting you is probably in here: http://cvs.zope.org/Zope/lib/python/ZODB/ coptimizations.c.diff?r1=1.17&r2=1.17.60.1&only_with_tag=Zope-2_6-branch I looked and there are some changes to the persistent_id code, but I lack sufficient background with this module to have something leap out at me and say "aha!" On Wednesday, December 11, 2002, at 11:13 PM, Shane Hathaway wrote:
On Wed, 11 Dec 2002, Evan Simpson wrote:
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart.
Ah-ha, you now have a reproducible test case. The coptimizations code has been undergoing "beautification" lately; apparently this particular ZClass-related functionality was not retained correctly. This ought to be a priority for Zope 2.6, I think.
Shane
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
"SH" == Shane Hathaway <shane@zope.com> writes:
SH> On Wed, 11 Dec 2002, Evan Simpson wrote:
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart.
SH> Ah-ha, you now have a reproducible test case. The SH> coptimizations code has been undergoing "beautification" lately; SH> apparently this particular ZClass-related functionality was not SH> retained correctly. This ought to be a priority for Zope 2.6, I SH> think. Evan, Can you give me a recipe from reproducing the problem? Ideally, we'd write a little unit test that verified that the persistent_id code handled ZClass properly. Jeremy
On 12/12/2002 12:30 PM, Jeremy Hylton wrote:
"SH" == Shane Hathaway writes:
SH> On Wed, 11 Dec 2002, Evan Simpson wrote:
I think I've fixed my ZGlobals, but there's definitely a problem in coptimizations. I made Connections.py use the Python persistent_id, restarted twice, and ZGlobals was fine on the second restart.
SH> Ah-ha, you now have a reproducible test case. The SH> coptimizations code has been undergoing "beautification" lately; SH> apparently this particular ZClass-related functionality was not SH> retained correctly. This ought to be a priority for Zope 2.6, I SH> think.
Evan,
Can you give me a recipe from reproducing the problem? Ideally, we'd write a little unit test that verified that the persistent_id code handled ZClass properly.
FYI: I made such a test and fixed the bug. Details here: http://sourceforge.net/tracker/index.php?func=detail&aid=653356&group_id=156... Shane
I'll have to take a look, but may not get to it today. FWIW I thought I fixed the bug on Friday. I could certainly create ZClasses through the web without getting any obvious errors. Jeremy
Jeremy Hylton wrote:
I'll have to take a look, but may not get to it today. FWIW I thought I fixed the bug on Friday. I could certainly create ZClasses through the web without getting any obvious errors.
My little test (attached to the patch submission) passes with my version as well as your version, so it'll work. My patch was a lot simpler, though! :-) We need to add at least one unit test. testBug.py is a start. Do you want to convert it to a unit test or should I? If I do it, which of the myriad of ZODB branches should I add it to? :-) Shane
"SH" == Shane Hathaway <shane@zope.com> writes:
SH> Jeremy Hylton wrote:
I'll have to take a look, but may not get to it today. FWIW I thought I fixed the bug on Friday. I could certainly create ZClasses through the web without getting any obvious errors.
SH> My little test (attached to the patch submission) passes with my SH> version as well as your version, so it'll work. My patch was a SH> lot simpler, though! :-) I fixed more bugs than just the reported one :-). SH> We need to add at least one unit test. testBug.py is a start. SH> Do you want to convert it to a unit test or should I? If I do SH> it, which of the myriad of ZODB branches should I add it to? :-) How about we call it testPersistentId? It should be added to ZODB3 trunk and both the Zope-2_6-branch and the ZODB-3_1-branch. Jeremy
My trunk and 2.6 sandboxes are now behaving themselves. Thanks, guys!
On Monday 16 December 2002 11:55 am, Evan Simpson wrote:
My trunk and 2.6 sandboxes are now behaving themselves. Thanks, guys!
Ok, then it must be time for some more "beautification"... ;^) Idle hands are the work of the devil you know. -Casey
The nice thing about fixing this bug is that the code continues to be improved. The last round of beautification made the code substantially more readable. This was a direct result of trying to track down a reference count problem. The problem proved to be elsewhere, but it was nearly impossible for me to inspect the old coptimizations.c code to figure out if the problem was there. If we make the code easier to read, we reduce the burden we all share more keeping the software working. Inscrutable code hides subtle bugs and encourages work arounds that make the whole system harder to understand. Jeremy
The nice thing about fixing this bug is that the code continues to be improved. The last round of beautification made the code substantially more readable. This was a direct result of trying to track down a reference count problem. The problem proved to be elsewhere, but it was nearly impossible for me to inspect the old coptimizations.c code to figure out if the problem was there.
If we make the code easier to read, we reduce the burden we all share more keeping the software working. Inscrutable code hides subtle bugs and encourages work arounds that make the whole system harder to understand.
+1. Remember the XP dogma: refactor mercilessly. --Guido van Rossum (home page: http://www.python.org/~guido/)
Yes, except I would argue that such work is best left on the HEAD rather than applied to a maintenance branch until necessitated by bug reports on that branch, or at least the knowledge that such a bug definitely exists and can be reproduced on that branch. I'm all for improving the code base. We just need to excercise caution changing code on maintenance branches unnecessarily. The HEAD is another matter, IMO go nuts! -Casey On Monday 16 December 2002 12:29 pm, Jeremy Hylton wrote:
The nice thing about fixing this bug is that the code continues to be improved. The last round of beautification made the code substantially more readable. This was a direct result of trying to track down a reference count problem. The problem proved to be elsewhere, but it was nearly impossible for me to inspect the old coptimizations.c code to figure out if the problem was there.
If we make the code easier to read, we reduce the burden we all share more keeping the software working. Inscrutable code hides subtle bugs and encourages work arounds that make the whole system harder to understand.
Jeremy
"CD" == Casey Duncan <casey@zope.com> writes:
CD> Yes, except I would argue that such work is best left on the CD> HEAD rather than applied to a maintenance branch until CD> necessitated by bug reports on that branch, or at least the CD> knowledge that such a bug definitely exists and can be CD> reproduced on that branch. The work was all done on the trunk, but no one noticed before Zope 2.6 was released. Jeremy
participants (6)
-
Casey Duncan -
Evan Simpson -
Guido van Rossum -
jeremy@zope.com -
Matthew T. Kromer -
Shane Hathaway