[Zope] Re: Non-responsive objects reprise

Sun Nov 13 14:18:29 EST 2005

You probably have a network problem: all the Zope logs show that
everything completed normally (your points 1 and 3). Your problem may
be tied to packet size or keepalives. A network trace, for instance
using Ethereal, will probably help you more than anything.
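To check point 3 across a whole trace log rather than by eye, here is a minimal sketch (modern Python 3, run outside Zope). It assumes each trace line reads `<stage> <request id> <timestamp> ...`, as in the excerpt quoted below. B carries the method and VHM path, A the status and response size, and E presumably marks the end of the request. The names `parse_trace`, `TracedRequest` and `public_path` are hypothetical. If a request that hung through Apache still shows an A line with 200 followed by an E line, that is consistent with the problem lying between Zope and Apache rather than inside Zope.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

STAGES = {"B", "I", "A", "E"}  # begin, input, app done, end (as read from the excerpt)


@dataclass
class TracedRequest:
    """One request rebuilt from the trace lines that share its id."""
    request_id: str
    method: Optional[str] = None
    path: Optional[str] = None
    status: Optional[int] = None
    size: Optional[int] = None
    stages: List[str] = field(default_factory=list)

    @property
    def completed(self) -> bool:
        # Treat a request that reached the E line as finished on the Zope side.
        return "E" in self.stages


def parse_trace(lines: Iterable[str]) -> List[TracedRequest]:
    """Group trace-log lines into requests, in the order they began."""
    records: List[TracedRequest] = []
    current: Dict[str, TracedRequest] = {}
    for raw in lines:
        parts = raw.split()
        if len(parts) < 3 or parts[0] not in STAGES:
            continue  # skip blank or unrelated lines
        stage, req_id = parts[0], parts[1]
        if stage == "B":
            # Ids may be reused over time, so every B line opens a new record.
            rec = TracedRequest(req_id)
            if len(parts) >= 5:
                rec.method, rec.path = parts[3], parts[4]
            records.append(rec)
            current[req_id] = rec
        else:
            rec = current.get(req_id)
            if rec is None:
                continue  # e.g. the B line fell before the start of the file
            if stage == "A" and len(parts) >= 5:
                rec.status, rec.size = int(parts[3]), int(parts[4])
        rec.stages.append(stage)
    return records


def public_path(vhm_path: str) -> str:
    """Return the part of a VHM-rewritten path after /VirtualHostRoot."""
    marker = "/VirtualHostRoot"
    idx = vhm_path.find(marker)
    return vhm_path[idx + len(marker):] if idx != -1 else vhm_path
```

For example, iterating over `parse_trace(open("trace.log"))` (the filename is a placeholder) and printing `completed`, `status` and `public_path(path)` for each record lets you match Zope's view of a request against the paths named in Apache's proxy errors.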

Florent

Garth B. wrote:
> Hello everyone, this is from an older thread which I'm resurrecting
> with more information.
> 
> Despite Dieter's helpful pointers, I'm no closer to solving this
> problem, but I do have more information about it in case anyone can
> lend a hand.
> 
> To quickly recap: periodically, when visiting our Zope site, certain
> objects appear not to respond. It is consistently the same objects,
> ranging from a Page Template in one folder to an image somewhere
> else. The site runs on Zope 2.8.1 and Python 2.3.5, sitting behind a
> VHM and Apache 2.0.46 with the usual RewriteRule setup. The problem
> started suddenly several months ago, after the site had been running
> smoothly for many months. This is all on Red Hat Enterprise Linux ES
> release 3 (Taroon Update 2). The server is a dual-processor machine
> with 1GB RAM and 300GB of disk space, hosted by Rackspace. The site
> is relatively large and reasonably active. Its content is largely
> Page Templates, with a few supporting Python scripts and Script
> (Python) objects. There are also a few ZClass-based objects that
> offer no real unique functionality other than providing an interface
> for the admins to create "News" or "Feature" items. The site also
> uses a MySQL database.
> 
> I've noticed the following things about this problem:
> 
> =================
> 1) DeadlockDebugger shows no problems when one of the objects appears
> not to be responding.  Everything appears normal.
> 
> 2) I can ALWAYS successfully get to the non-responsive objects by
> bypassing Apache and directly viewing the Zope server's equivalent
> :8080 address.
> 
> 3) While tailing trace.log when an object is seizing through Apache,
> I can see the request reach Zope and go right back out with no
> problem. I think that's what this excerpt illustrates:
> 
> B -1348776468 2005-11-13T10:46:37 GET /VirtualHostBase/http/www.domain.org:80/portal/html/VirtualHostRoot/resources/contact
> I -1348776468 2005-11-13T10:46:37 0
> A -1348776468 2005-11-13T10:46:38 200 14938
> E -1348776468 2005-11-13T10:46:38
> 
> 4) Turning on debug output for Apache shows the following proxy
> errors when trying to access an offending object. I've searched for
> related information about these errors and found only one hit, on
> the ZODB-DEV list from 2004, with no responses. The errors:
> 
> [Sat Nov 12 00:33:33 2005] [error] [client xx.xx.xx.xx] proxy: error reading status line from remote server localhost
> [Sat Nov 12 00:33:33 2005] [error] [client xx.xx.xx.xx] proxy: Error reading from remote server returned by /contact
> [Sat Nov 12 00:34:02 2005] [error] [client xx.xx.xx.xx] proxy: error reading status line from remote server localhost
> [Sat Nov 12 00:34:02 2005] [error] [client xx.xx.xx.xx] proxy: Error reading from remote server returned by /resources/index_html
> 
> (I removed the client IP.) Keep #2 and #3 in mind when reading these errors.
> 
> 5) In case something in one of the templates was screwing things up,
> I methodically removed portions of a page (or of the template it
> inherits from). When the page suddenly started responding through
> Apache I thought I had hit paydirt, but then I noticed that in one
> instance all I had removed was a block of plain HTML (no TAL/METAL
> statements), which put me back at square one. I think #2 and #3 make
> this point irrelevant anyway, and certain images get hung up, too.
> 
> 6) The server is also running Mailman (using the same Python as
> Zope). It uses a separate virtual host container in Apache to expose
> its administrative interface. One of my co-workers swears that when
> he experiences the seizing, he soon afterwards gets several emails
> from one of the Mailman lists, which is supposed to be a once-a-day,
> broadcast-only list. I think this is more likely a coincidence,
> though, and I haven't seen enough occurrences to rely on this report.
> 
> 7) Restarting Zope *usually* corrects the problem (on Friday,
> restarting it several times didn't help).
> 
> 8) Restarting Apache sometimes corrects the problem without needing to
> restart Zope.
> 
> 9) On one occasion, killing Mailman suddenly made one of the
> offending objects respond for a little while, then stop.
> 
> 10) On rare occasions we have had to physically reboot the server
> (as on Friday).
> 
> 11) After the server was rebooted on Friday, memory usage for Zope
> went from about 3% to 20+% (as reported by 'top') over a period of
> about 12 hours. I don't know whether that indicates a leak or just
> general memory consumption. Restarting Zope appears to return that
> memory to the OS. This level of memory usage is what we normally see
> for this site.
> 
> 12) Upgrading from Zope 2.7.6 to 2.8.1 appeared to help for a little
> while, but the problem either came back or never left.
> 
> 13) I briefly enabled mod_disk_cache in Apache for this site in case
> Zope was getting too stressed out. It appeared to work wonders, but
> some file objects, like PDFs, would periodically be reported as
> corrupted by Acrobat after being downloaded. I assume this was a
> failure to configure mod_disk_cache appropriately, and we've since
> disabled it (at which point Acrobat stopped complaining about
> corrupted PDFs). The seizing problem looked as though it disappeared
> while mod_disk_cache was enabled. Indeed, watching the Apache and
> Zope logs showed requests more often being fulfilled by Apache alone
> than by Zope. Perhaps the proxy problems in #4 indicate a loaded Zope
> that needs caching. We are not running ZEO or anything like that.
> Perhaps we should.
> =================
> 
> Apologies for the long email but I have no idea what's going on... if
> ANYONE has ANY suggestions or ideas on what else I could investigate
> it would be GREATLY appreciated!
> 
> Thank you!
> 
> Garth

-- 
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg at nuxeo.com