Re: Zope 2.4 crashes -- possible fix identified, other solutions also suggested
Hi Matt and other Zopers,

thanks for your help. I've tested other combinations, including the unreleased Zope CVS version at cvs.zope.org, and the test script (province) runs well if it's the only script in the ZODB. But when I insert it in a CMF 1.1 skin, my Zope (2.4.x, 2.5.x, CVS version, with Python 2.1, 2.1.1, 2.2b2, with and without pymalloc and cycle-gc) still auto-restarts; less frequently, but it still auto-restarts.

Can the old zbytecode in 2.3.x PythonScripts help us?

Is it possible to deactivate the RestrictedPython compiler in Zope 2.4.x/2.5.x? The security problem is under control for now (obviously, only for a portal with predefined and secure contents editable by users).

I think deactivation could be a solution for now, while we wait for a definitive, well-running version of the RestrictedPython engine.

What do you think?

Bye,
Stefano
----------------------------------------------------------
Stefano Noferi
n o z e S.r.l. Soluzioni Open-Source
Via Caduti del Lavoro, 32 56122 Pisa (PI) - Italy
Tel: +39 (0)50 533320 Fax: +39 (0)50 526604
Email: stefano@noze.it Web: http://www.noze.it
-= "Whatever you like it to be, it will be" =-
----------------------------------------------------------
Stefano Noferi wrote:
Hi Matt and other zopers,
thanks for your help. I've tested other combinations, including the unreleased Zope CVS version at cvs.zope.org, and the test script (province) runs well if it's the only script in the ZODB. But when I insert it in a CMF 1.1 skin, my Zope (2.4.x, 2.5.x, CVS version, with Python 2.1, 2.1.1, 2.2b2, with and without pymalloc and cycle-gc) still auto-restarts; less frequently, but it still auto-restarts.
Can the old zbytecode in 2.3.x PythonScripts help us?
Is it possible to deactivate the RestrictedPython compiler in Zope 2.4.x/2.5.x? The security problem is under control for now (obviously, only for a portal with predefined and secure contents editable by users).
I think deactivation could be a solution for now, while we wait for a definitive, well-running version of the RestrictedPython engine.
What do you think?
Bye, Stefano
Hi Stefano,

We're still aware of a few potential inaccuracies in the stack size computations for the Python compiler module which may be causing problems; also, the change to ExtensionClass.h tested previously on a branch is not merged into the main development branches.

As much as possible, we're trying to isolate problems by changing one thing at a time, rather than many things all at once.

A side effect of shutting off the garbage collector is that you can have some storage leaks. We're working on being able to re-enable the garbage collector so that you don't exhaust memory over time.
"MTK" == Matthew T Kromer <matt@zope.com> writes:
MTK> A side effect of shutting off the garbage collector is that you
MTK> can have some storage leaks. We're working on being able to
MTK> re-enable the garbage collector so that you don't exhaust
MTK> memory over time.

Do you have any more idea about what shutting the garbage collector off achieves? In practice, the garbage collector's most common effect is to turn latent bugs into manifest bugs; a bug has trashed part of memory and the garbage collector just happens to find it first. If you turn GC off in these cases, you run a little longer, but you're running with corrupted memory.

Jeremy
On Mon, 2001-12-17 at 20:57, Jeremy Hylton wrote:
"MTK" == Matthew T Kromer <matt@zope.com> writes:
MTK> A side effect of shutting off the garbage collector is that you
MTK> can have some storage leaks. We're working on being able to
MTK> re-enable the garbage collector so that you don't exhaust
MTK> memory over time.
Do you have any more idea about what shutting the garbage collector off achieves? In practice, the garbage collector's most common effect is to turn latent bugs into manifest bugs; a bug has trashed part of memory and the garbage collector just happens to find it first. If you turn GC off in these cases, you run a little longer, but you're running with corrupted memory.
Not if the problem you're facing is bad collaboration with the cycle-gc. As can be gleaned from discussions on this list, the nature of the changes in ExtensionClass.h, and the problem report on SourceForge mentioned in another e-mail on this list, collaboration with the cycle-gc is an active duty for Python objects written in C (or their derivatives, such as ExtensionClass or PythonScripts), and if you collaborate wrongly with the cycle-gc you are guaranteed to get a crash and, worst of all, at the most random moments possible, with no clue as to what caused it.
From my mile-high look at the issues, it seems like the cycle-gc asks an object where to look for its references (at least that's what the tp_traverse function looks like it does). So, if your tp_traverse sends the gc somewhere it shouldn't go (or if tp_traverse itself is not a valid C function pointer), you get a crash.
Of course I could be absolutely wrong in this description (you'd need to get a Python guru for a correct one. Oh, wait, you guys got a bunch of those hanging around, right? :-)) but it is the only one I got :-)

BTW, from my tests so far, it seems like:

    import gc
    gc.disable()

also stops the SIG11 without the need for a Python recompile, but I'll only be sure when the server gets office-hours traffic tomorrow morning.

Cheers,
--
Ideas don't stay in some minds very long because they don't like solitary confinement.
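A minimal sketch of the gc.disable() workaround mentioned above, assuming it runs once at process startup (for example near the top of a start script). The helper name disable_cycle_gc is hypothetical and not part of Zope. Reference counting keeps working; only the automatic collection of reference cycles is switched off.

```python
import gc


def disable_cycle_gc():
    """Switch off automatic cycle collection.

    Hypothetical helper for illustration. Returns True if the
    collector was enabled before the call, so the caller can log it.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    return was_enabled


# Run once, early in startup, before the server begins handling requests.
was_enabled = disable_cycle_gc()
```

Calling gc.enable() later restores the default behaviour without restarting the process.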
Jeremy Hylton wrote:
Do you have any more idea about what shutting the garbage collector off achieves? In practice, the garbage collector's most common effect is to turn latent bugs into manifest bugs; a bug has trashed part of memory and the garbage collector just happens to find it first. If you turn GC off in these cases, you run a little longer, but you're running with corrupted memory.
Sorry I haven't been keeping up with the zope-* lists of late - this is what I've found as well. Something, and I strongly suspect it's inside the Zope C code, is playing jumpy-jumpy-stomp-stomp on bits of memory. The garbage collector is hitting this corrupted data and dying. I've posted before about the structure I've found that's corrupted (it's _always_ the same structure) but I've not yet been able to track down what it is.

For us, the "fix" has been to run more ZEO clients behind a load balancer, so that when one crashes out (about every 10-12 hours for us) things keep working, and the zopecontrol script restarts it.

Anthony
Jeremy Hylton wrote:
"MTK" == Matthew T Kromer <matt@zope.com> writes:
MTK> A side effect of shutting off the garbage collector is that you
MTK> can have some storage leaks. We're working on being able to
MTK> re-enable the garbage collector so that you don't exhaust
MTK> memory over time.
Do you have any more idea about what shutting the garbage collector off achieves? In practice, the garbage collector's most common effect is to turn latent bugs into manifest bugs; a bug has trashed part of memory and the garbage collector just happens to find it first. If you turn GC off in these cases, you run a little longer, but you're running with corrupted memory.
Jeremy
Well, one suspicion I have is that (aside from memory corruption caused by the compiler stack size bugs and the frame setup bug in 2.1 when handling exceptions) ExtensionClasses are providing bogus data to modules which aren't checking the flags to see if the GC fields are populated.

Some of the people who have tried the modified ExtensionClass.h, which pads out the type object to align it with the 2.1 type object, THINK they have seen a reduction in crashes, but these same folks also have not applied fixes for the two known bugs.

Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
On Tue, 2001-12-18 at 13:44, Matthew T. Kromer wrote:
Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
Now I can really confirm that gc.disable() is enough to avoid the crashes (no need to recompile python --without-gc).

And as far as we could notice, it didn't start leaking like crazy. We haven't been able to identify any noticeable increase in memory consumption. If it is leaking we will need some more time to notice it.

Cheers, Leo
--
Ideas don't stay in some minds very long because they don't like solitary confinement.
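One rough way to watch for the slow leak discussed here is to sample, over time, how many objects the collector is tracking. This is only a sketch, assuming CPython; tracked_object_count is a hypothetical helper name, the loop merely simulates leaking code, and a rising count is a hint rather than proof of a leak.

```python
import gc


def tracked_object_count():
    """Number of container objects currently tracked by the collector."""
    return len(gc.get_objects())


gc.disable()  # same setting as the workaround in this thread

before = tracked_object_count()

# Simulate code that creates reference cycles: each pass leaves behind
# two lists that point at each other once the names are rebound.
cycles = 1000
for _ in range(cycles):
    a = []
    b = [a]
    a.append(b)

after = tracked_object_count()
growth = after - before  # keeps rising while cycles pile up uncollected
```

In a long-running server the same count could be logged periodically and compared against the process's memory use.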
Leonardo Rochael Almeida wrote:
On Tue, 2001-12-18 at 13:44, Matthew T. Kromer wrote:
Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
Now I can really confirm that gc.disable() is enough to avoid the crashes (no need to recompile python --without-gc).
And as far as we could notice, it didn't start leaking like crazy. We haven't been able to identify any noticeable increase in memory consumption. If it is leaking we will need some more time to notice it.
Cheers, Leo
Keep in mind that the leaks you may experience are directly related to what code you run, and whether or not it introduces cycles. Some of the restricted Python compiler code did/does create cycles under the assumption that the GC would break them. But if you don't lean on code that uses RestrictedPython too much, you can live with slow leaks elsewhere.
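To illustrate the point about cycles: with automatic collection off, objects that refer to each other are never reclaimed by reference counting alone. A small sketch, assuming CPython and using a hypothetical Node class (this is not Zope or RestrictedPython code):

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""


def make_cycle():
    """Create two objects that refer to each other, then drop them.

    Returns a weak reference so the caller can check whether the
    objects are still alive without keeping them alive.
    """
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    return weakref.ref(a)


gc.disable()
ref = make_cycle()

# The local names are gone, but each object still holds the other,
# so reference counting cannot free them.
leaked_while_disabled = ref() is not None

# An explicit collection still works while automatic collection is off.
gc.collect()
freed_after_collect = ref() is None

gc.enable()
```

Code that never builds such cycles is freed by reference counting as usual, which is why the size of the leak depends on what code you run.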
On Tue, 2001-12-18 at 14:25, Matthew T. Kromer wrote:
[...] Keep in mind that the leaks you may experience are directly related to what code you run, and whether or not it introduces cycles. Some of the restricted Python compiler code did/does create cycles under the assumption that the GC would break them. But if you don't lean on code that uses RestrictedPython too much, you can live with slow leaks elsewhere.
Yes, I understand that, but we use a lot of PythonScripts, mostly as ZClass methods, and a 1.5GB Data.fs full of ZClass instances. And ZCatalog, boy do we use ZCatalog. Well, we couldn't live with a 1.5GB Data.fs full of ZClass instances if we didn't lean heavily on the Catalog, now could we? :-)

Cheers, Leo
--
Ideas don't stay in some minds very long because they don't like solitary confinement.
Leonardo Rochael Almeida wrote:
On Tue, 2001-12-18 at 13:44, Matthew T. Kromer wrote:
Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
Now I can really confirm that gc.disable() is enough to avoid the crashes (no need to recompile python --without-gc).
I changed z2.py to include this, but just as a data point for Matt, ParsedXML is still crashing. ;)

Regards,
Martijn
Matt et al -

We are using Python 2.1.1 with pymalloc disabled and gc enabled with Zope 2.4.3 on the Solaris platform. We are currently seeing only 1 or 2 restarts a day for the Zope/ZEO clients. Fortunately, we have not experienced any trouble with the ZEO server.

We have applied a) the ExtensionClass bugfix and I'm also planning to apply b) the frame setup bugfix later this week. We feel that patch a) has reduced the number of restarts ... but the site activity has decreased as well, so it is difficult to really confirm.

1) I would like to know if any patches are available yet for c) the compiler stack size bug. When and if this patch is available, would it require a new Python installation, a new Zope installation, or both?

2) Are there any other related patches/comments with respect to Python 2.1.1 and Zope 2.4.3 and this issue?

thanks,
- joe n.

At Tue, 18 Dec 2001 10:44:33 -0500, Matthew T. Kromer wrote:
Well, one suspicion I have is that (aside from memory corruption caused by the compiler stack size bugs and the frame setup bug in 2.1 when handling exceptions) ExtensionClasses are providing bogus data to modules which aren't checking the flags to see if the GC fields are populated.
Some of the people who have tried the modified ExtensionClass.h, which pads out the type object to align it with the 2.1 type object, THINK they have seen a reduction in crashes, but these same folks also have not applied fixes for the two known bugs.
Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
The new compiler fixes that we have to date are actually in the 2.4 branch and the 2.5 branch as well as the trunk; they're in the lib/python/RestrictedPython/compiler directory, so you could go to http://cvs.zope.org and just pull down that directory (optionally pulling down your specific branch too).

Pythonlabs is working on a few more cases which I expect will be done and checked in RSN; then we'll backport those to the branches so you don't have to install a new Python to get the fixes with Zope.

Anthony Baxter is anticipating a Python 2.1.2 beta real soon now (probably this weekend), so I am going to try to get that into Zope 2.5's binary releases, although we may put out a Zope 2.5 beta 3 first. This will include the necessary Python patches to ceval.c to fix the frame bug; after it hits the streets, Python 2.1.2 will become our recommended Python for Zope.

The ExtensionClass.h patch has NOT been merged into the branches and trunk yet pending further review.

----- Original Message -----
From: "Joseph Wayne Norton" <norton@alum.mit.edu>
To: "Matthew T. Kromer" <matt@zope.com>
Cc: <zope-dev@zope.org>
Sent: Tuesday, December 18, 2001 7:59 PM
Subject: Re: [Zope-dev] Re: Zope 2.4 crashes -- possible fix identified, other solutions also suggested
Matt et al -
We are using python 2.1.1 with pymalloc disabled and gc enabled with zope 2.4.3 on the solaris platform. We are currently seeing only 1 or 2 restarts a day for the zope/zeo clients. Fortunately, we have not experienced any trouble with the zeo server.
We have applied a) the ExtensionClass bugfix and I'm also planning to apply b) the frame setup bugfix later this week. We feel that patch a) has reduced the number of restarts ... but the site activity has decreased as well, so it is difficult to really confirm.
1) I would like to know if any patches are available yet for the c) compiler stack size bug. When and if this patch is available, would it require a new python installation, a new zope installation, or both?
2) Are there any other related patches/comments with respect to python 2.1.1 and zope 2.4.3 and this issue?
thanks,
- joe n.
At Tue, 18 Dec 2001 10:44:33 -0500, Matthew T. Kromer wrote:
Well, one suspicion I have is that (aside from memory corruption caused by the compiler stack size bugs and the frame setup bug in 2.1 when handling exceptions) ExtensionClasses are providing bogus data to modules which aren't checking the flags to see if the GC fields are populated.
Some of the people who have tried the modified ExtensionClass.h, which pads out the type object to align it with the 2.1 type object, THINK they have seen a reduction in crashes, but these same folks also have not applied fixes for the two known bugs.
Soo... if shutting off GC extends time between crashes for some folks from every 15 minutes to 3 times a day, my advice is to shut off GC.
Matt -

If possible, I would prefer to use a source Python 2.1.2 release with a source zope 2.4.4 ? bugfix release (or create my own from the 2.4 cvs branch) once the fixes are complete. We do not want to put a 2.5 release in production at this time.

Thanks for the update.

regards,
- j

At Wed, 19 Dec 2001 09:25:08 -0500, Matthew T. Kromer wrote:
The ExtensionClass.h patch has NOT been merged into the branches and trunk yet pending further review.
just my 2 cents, but we have been using this in production for about 1 week or more without any troubles.
Well, if you want to grab what is probably going to turn into Python 2.1.2 from CVS, you can get the release21-maint branch from :pserver:anonymous@cvs.python.sourceforge.net:/cvsroot/python package python.

This already has the important change to ceval.c in it, but I'm not sure that the rest of the changes for the 2.1.2 release have been finalized. I think Anthony Baxter is going to try to put a beta / release candidate of 2.1.2 up real soon now, but I can't speak for him :)

On Wednesday, December 19, 2001, at 10:39 PM, Joseph Wayne Norton wrote:
Matt -
If possible, I would prefer to use a source Python 2.1.2 release with a source zope 2.4.4 ? bugfix release (or create my own from the 2.4 cvs branch) once the fixes are complete. We do not want to put a 2.5 release in production at this time.
Thanks for the update.
regards,
- j
At Wed, 19 Dec 2001 09:25:08 -0500, Matthew T. Kromer wrote:
The ExtensionClass.h patch has NOT been merged into the branches and trunk yet pending further review.
just my 2 cents, but we have been using this in production for about 1 week or more without any troubles.
participants (7)
- Anthony Baxter
- jeremy@zope.com
- Joseph Wayne Norton
- Leonardo Rochael Almeida
- Martijn Faassen
- Matthew T. Kromer
- Stefano Noferi