[Zope] Frequent ZOPE crashes

Tres Seaver tseaver at palladion.com
Mon Nov 30 10:58:16 EST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andreas Krasa wrote:
> Hi Tres,
> 
> thank you very much for your reply!
> 
> On 29.11.09 21:57, Tres Seaver wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>>> ----- Original Message ----- From: "Andreas Krasa"
>>>> <andreas.krasa at wu-wien.ac.at>
>>>>
>>> we're right in the process of tracking down the error outside of ZOPE.
>>>
>>> We have completely installed a new server from scratch with RHEL 5.4 and
>>> have re-installed python 2.4.6 and the latest versions of libxml2 and
>>> libxslt there. We double checked the LD config, and made sure that the
>>> correct shared objects get loaded (via lsof).
>>>
>>> We also reinstalled a few other modules that contain C-code (such as
>>> python-ldap) which we need for being able to do authentication.
>>>
>>> Unfortunately that didn't really help much. We still experience crashes.
>>>
>>> Are there any known issues with Zope 2.11.2, LibXML2 and/or LibXSLT that
>>> could cause these problems?
>>>
>>> The only thing we re-used is the Data.fs, which we have to, because
>>> we're talking about a production system here.
>>>
>>> Also note that we have used exactly the same setup for a long time now,
>>> even on the same hardware, without any of these troubles. The problems
>>> only started when we switched over to a new (and probably more
>>> resource-intensive) layout.
>>>
>>> We're unfortunately still not able to reproduce these crashes.
>> Can you set 'ulimit -c' to get a core file, which might at least help
>> point to the extension which is to blame (although it may just show the
>> "downstream" victim of a heap munge).
>>
>> What versions of libxml2 / libxslt are you using?  How about lxml?
> 
> Yes, we did set the ulimit and were indeed able to produce a coredump 
> for each crash happening (each having something between 300 and 700 MB). 
> We tried to debug using "gdb" but unfortunately the dumps only reveal two 
> cases when the crashes occur:
> 
> 1) During garbage collection where the gc tries to clean up damaged 
> python objects
> 2) During some "ceval" process, also related to accessing damaged python 
> objects
> 
> Unfortunately it doesn't reveal what exactly trashes the objects. To us 
> it seems that this could happen some time earlier before either of the 
> two processes mentioned above tries to access the objects and crashes ZOPE.
> 
> For now, we don't see a reproducible pattern; it seems that some 
> more complex user behavior leads to this. We were able to 
> extract a few URLs out of the coredumps, but directly accessing those 
> does nothing. Likewise, directly requesting the last access logged in 
> Z2.log before the coredump triggers nothing.
> 
> We're running ZOPE-2.11.2 with an eggified version of ZODB3-3.8.4 plus 
> libxml2-2.7.6, libxslt-1.1.26 and lxml-2.2.4 now, the crashes still 
> happen. Previously we've been running with ZOPE-2.11.2, libxml2-2.7.3, 
> libxslt-1.1.24 and lxml-2.1.5. That also crashed ZOPE occasionally.

Does your application ever use the libxml2 / libxslt Python bindings
directly?  If so, I would go over that part of your app with a
microscope:  it is incredibly easy to trigger segfaults from those
bindings.  If not, then I would look for help on the lxml mailing list.
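
As a starting point for that audit, here is a minimal sketch (not
something from your setup) that lists which of the XML binding modules
are actually loaded in the running process.  The helper name
`loaded_xml_bindings` is hypothetical, and the module names it looks for
(`libxml2`, `libxslt`, `lxml`) are the usual top-level names of those
bindings; adjust them if your installation differs.

```python
import sys

# Top-level module names of the XML bindings discussed in this thread.
BINDING_NAMES = ("libxml2", "libxslt", "lxml")


def loaded_xml_bindings(modules=None):
    """Return the sorted names of loaded modules that belong to the bindings.

    `modules` defaults to sys.modules; passing a mapping makes the helper
    easy to try outside the live process.
    """
    if modules is None:
        modules = sys.modules
    return sorted(
        name for name in modules
        if name.split(".")[0] in BINDING_NAMES
    )
```

Run inside the Zope process (for example from a debug prompt), a result
that includes `libxml2` or `libxslt` means something imports the raw
bindings directly rather than going through lxml only.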

> This only happened since we switched to a new layout (probably in 
> combination with a few minor Silva updates).

By "new layout", do you mean a new site theme?  If so, how do lxml /
libxml2 / libxslt interact with your application to generate the theme?
 What is structurally different about the new theme?

> We have been using the same system software (RHEL5), hardware, python 
> version and libxml2/libxslt/lxml versions with our old layout, where 
> everything worked fine for years.
> 
> I would be happy to paste any particular gdb outputs if that is of any 
> help...?

I'm afraid that won't help:  the GC segfaults indicate somebody is
munging the heap way before the segfault is triggered.
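
One debugging aid sometimes used for this kind of delayed crash (a
sketch only; nothing in this thread says it has been tried) is to lower
the first GC threshold so that the cyclic collector runs far more often.
A collection that trips over a damaged object then tends to happen
closer in time to whatever damaged it.  `tighten_gc` is a hypothetical
helper name, and the process will run noticeably slower while this is
in effect.

```python
import gc


def tighten_gc(threshold0=1):
    """Make generation-0 collections run much more frequently.

    Returns the previous thresholds so they can be restored later with
    gc.set_threshold(*old).
    """
    old = gc.get_threshold()
    # Keep the thresholds of the older generations unchanged.
    gc.set_threshold(threshold0, old[1], old[2])
    return old
```

Restoring is a matter of calling `gc.set_threshold(*old)` with the
returned tuple.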



Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAksT65cACgkQ+gerLs4ltQ5swACgsSuScLIAfFtd1d9TMznaQEeu
7JEAoJBetJHX3KOCbinGlyV5F/7DWjqK
=qGv5
-----END PGP SIGNATURE-----


