New subject: [Zope] System performance threads/proccesses & random crashes (SIGPIPE)

22 Mar 2002

Chris,

Thanks for the great help.

After doing some investigation, I am now pretty sure the behavior mentioned
in the FastCGI docs is what is happening here: a request isn't given time to
finish.  In my case this can be replicated by users clicking VERY fast on
the web pages, sending a request to the server before the first one is
finished. (I have some slow processes, drawing dynamically generated maps).
I can replicate the SIGPIPE almost 100% of the time using this method.

Anyway, I took a look at the code you mentioned, and that will hopefully
help, although I'm venturing into extremely unknown territory for me here!

A couple more things:

This, so far as I can tell, is a bug in the FastCGI implementation (not
handling SIGPIPE as suggested).  Should I report it somewhere?

Also, I was reading the signal handling documentation for Python and came
upon this:

Python installs a small number of signal handlers by default: SIGPIPE is
ignored (so write errors on pipes and sockets can be reported as ordinary
Python exceptions) 

Now I'm confused.  If Python ignores SIGPIPE by default, why is Zope
complaining? This would mean that there is already a SIGPIPE handler
defined somewhere overriding the default, in which case that is probably
where the change should be made, instead of adding yet another handler.

And finally, how do I "ignore" a signal? I guess just writing a "pass" will
work? I'll try it out. I guess that on reception of a signal, only one
handler is called, and only once?
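
A minimal sketch of the two options raised above, written for a current
Python 3 rather than the Python 2.1 used in this thread. It assumes a POSIX
system and must run in the main thread. The names noop_handler, calls and
deliver_sigpipe are made up for illustration. signal.getsignal() shows what
is currently installed, and signal.signal() replaces it, so a process has
only one handler per signal at any time.

```python
import os
import signal

calls = []  # records each time the Python-level handler runs


def noop_handler(signum, frame):
    """A handler whose body is effectively 'pass'; it only records the call."""
    calls.append(signum)


def deliver_sigpipe():
    """Send SIGPIPE to this very process."""
    os.kill(os.getpid(), signal.SIGPIPE)


if __name__ == "__main__":
    # Inspect what is installed right now, before changing anything.
    print("current handler:", signal.getsignal(signal.SIGPIPE))

    # Option 1: a do-nothing Python function. It is called for each delivery.
    signal.signal(signal.SIGPIPE, noop_handler)
    deliver_sigpipe()
    print("no-op handler calls:", len(calls))

    # Option 2: SIG_IGN. The OS discards the signal; no Python code runs.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    deliver_sigpipe()
    print("calls after SIG_IGN:", len(calls))
```

Either option keeps the process alive. SIG_IGN is the simpler of the two
because the signal never reaches Python at all.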

Thanks again and again :)
J.F.

-----Original Message-----
From: Chris McDonough [mailto:chrism@zope.com]
Sent: Friday, March 22, 2002 3:05 PM
To: Doyon, Jean-Francois; zope@zope.org; matt@zope.com
Subject: Re: [Zope] System performance threads/proccesses & random
crashes (SIGPIPE)

You could register a SIGPIPE handler for Zope that just ignores the signal.
See the chrism-logrotate-branch in CVS at
http://cvs.zope.org/?only_with_tag=chrism_logrotate_branch and take a look
at z2.py's "installsighandlers" function... maybe use this branch but add a
SIGPIPE handler to the function that mimics the others except uses the
function SIG_IGN as a callback instead of the current signal handler
function.
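
A rough sketch of the idea, in current Python 3. The contents of z2.py's
"installsighandlers" are not reproduced here. HANDLERS, shutdown_handler and
install_sig_handlers are hypothetical stand-ins, and SIGTERM is used only as
an example of "the others".

```python
import signal


def shutdown_handler(signum, frame):
    """Placeholder for a real handler (hypothetical; not Zope's code)."""
    raise SystemExit(0)


# Hypothetical table of signal -> callback. SIGPIPE gets SIG_IGN as its
# callback instead of a Python function, as suggested above.
HANDLERS = {
    signal.SIGTERM: shutdown_handler,
    signal.SIGPIPE: signal.SIG_IGN,  # ignore broken pipes instead of dying
}


def install_sig_handlers(handlers=HANDLERS):
    """Register each handler and return the handlers that were replaced."""
    return {
        signum: signal.signal(signum, handler)
        for signum, handler in handlers.items()
    }


if __name__ == "__main__":
    install_sig_handlers()
    print(signal.getsignal(signal.SIGPIPE))
```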

----- Original Message -----
From: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>
To: "'Chris McDonough'" <chrism@zope.com>; <zope@zope.org>; <matt@zope.com>
Sent: Friday, March 22, 2002 2:45 PM
Subject: RE: [Zope] System performance threads/proccesses & random crashes
(SIGPIPE)

Hello,

Thanks for the help!

Well, I've determined it most likely isn't PostgreSQL, since I switched the
connections from socket based to TCP based, and the problem still occurs.

So, I turn my attention to FastCGI ...

I just read this on the FastCGI Website:

If an http client aborts a request before it completes, mod_fastcgi does too
- this results in a SIGPIPE to the FastCGI application. At a minimum,
SIGPIPE should be ignored (applications spawned by mod_fastcgi have this
setup automatically). Ideally, it should result in an early abort of the
request handling within your application and a return to the top of the
FastCGI accept() loop.

I guess Zope isn't handling the SIGPIPE the way it is suggested here?
Anyways this seems to be the most likely cause of the problems I'm having.
That AND possibly the problem Matt describes.  Matt, where can I find more
information on this, and possible solutions?
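
To make the quoted FastCGI advice concrete, here is a small sketch in
current Python 3. It is not Zope's FastCGI server. A list of connected
sockets stands in for the accept() loop, and serve and render are
illustrative names. SIGPIPE is ignored (the stated minimum), and a write to
a client that has gone away aborts only that request before the loop moves
on.

```python
import signal
import socket


def serve(connections, render):
    """Handle each connection in turn; return how many requests were aborted."""
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)  # the stated minimum
    aborted = 0
    for conn in connections:  # stands in for the accept() loop
        try:
            conn.sendall(render())
        except (BrokenPipeError, ConnectionResetError):
            aborted += 1  # client went away: drop this request only
        finally:
            conn.close()
    return aborted


if __name__ == "__main__":
    good_server, good_client = socket.socketpair()
    gone_server, gone_client = socket.socketpair()
    gone_client.close()  # simulate a user clicking away mid-request
    print(serve([gone_server, good_server], lambda: b"<html>map</html>"))
    print(good_client.recv(1024))
```

This sketch only notices the abort at write time. Stopping a slow request
earlier, while it is still rendering, would need extra checks that are not
shown here.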

For now, I'm guessing switching to using TCP instead of sockets for FastCGI
connections might help solve the problem? I am getting *A LOT* of these
errors, every 5 to 10 minutes!!! And it *IS* traffic related ... when the
business day dies down, the errors stop occurring (the normal usage pattern
at this time would support the theory that the errors are directly
related to the amount of usage).

I'm also thinking of playing with the -restart-delay option of the
FastCgiServer directive ...

Help!!!

Thank you,
J.F.

-----Original Message-----
From: Chris McDonough [mailto:chrism@zope.com]
Sent: Thursday, March 21, 2002 11:08 AM
To: Doyon, Jean-Francois; zope@zope.org
Subject: Re: [Zope] System performance threads/proccesses & random
crashes (SIGPIPE)

SIGPIPE is raised by the OS when a UNIX pipe is broken in the application.
UNIX takes this exception seriously which is why it sends the signal to the
process telling it "you've got a broken pipe".
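
The mechanism can be reproduced in a few lines of current Python 3 on a
POSIX system. The function name write_to_broken_pipe is illustrative.
SIGPIPE is ignored here so that the failed write shows up as an exception
with errno EPIPE. With the OS default disposition, the same write would
terminate the process instead.

```python
import errno
import os
import signal


def write_to_broken_pipe():
    """Write to a pipe whose read end is closed; return the errno raised."""
    # Ignore SIGPIPE so the failed write surfaces as an exception
    # instead of terminating the process.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # nobody is left to read: the pipe is now "broken"
    try:
        os.write(write_fd, b"response body")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_to_broken_pipe() == errno.EPIPE)
```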

As you say it started happening when you began using the database adapter,
it may be that some piece of the database adapter opens a pipe that is later
broken (for whatever reason, that's the $10,000 question ;-), causing the OS
to send Zope a SIGPIPE.

It may be possible to install a signal handler for SIGPIPE to get rid of the
problem, but I'm not exactly sure what it should/would do during this
failure state, and it would be more useful to try to pin down the pipe that
is getting broken by making the problem replicable.

The ZODB pool_size parameter is controlled via the pool_size argument to
ZODB.DB.DB's constructor.  It signifies how many database connections it's
willing to place in the pool.  When Zope starts up, each Zope thread needs
to use its own database connection.  So you should likely never have a
smaller pool_size than number of threads (the -t parameter to z2.py).
Adjusting these values up and down may improve performance, but there have
to this day not been any empirical studies as to how performance is impacted
when you do. It's probably something you need to try out in a load testing
environment.  If you find something interesting, let us know! ;-)
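
The rule of thumb above (pool_size no smaller than the number of threads)
can be written as a small sanity check. This is plain Python and does not
call ZODB. check_pool_size is a hypothetical helper, and the numbers in the
example are arbitrary.

```python
def check_pool_size(threads, pool_size):
    """Return a warning string if the pool is smaller than the thread count."""
    if pool_size < threads:
        return (
            "pool_size=%d is smaller than threads=%d; not every thread "
            "can hold its own database connection" % (pool_size, threads)
        )
    return None  # the pool is at least as large as the thread count


if __name__ == "__main__":
    print(check_pool_size(threads=8, pool_size=4))
    print(check_pool_size(threads=4, pool_size=8))
```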

----- Original Message -----
From: "Doyon, Jean-Francois" <Jean-Francois.Doyon@CCRS.NRCan.gc.ca>
To: <zope@zope.org>
Sent: Thursday, March 21, 2002 9:57 AM
Subject: [Zope] System performance threads/proccesses & random crashes
(SIGPIPE)

Hello,

I'm running into random crashes of my zope processes, but I'm not finding
any reference anywhere in the mailing list archives or on the site about
this specific one:

I'm getting:

2002-03-21T14:48:52 ERROR(200) zdaemon zdaemon: Thu Mar 21 09:48:52 2002:
Aiieee! 20070 exited with error code: 13

Every now and then, for no apparent reason.  Signal 13 is SIGPIPE ...
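
As a sketch of how a process ends up "exiting" with 13, the following
current Python 3 code starts a child that resets SIGPIPE to the OS default
and sends the signal to itself. It assumes a POSIX system and that the 13
in the log above is the terminating signal number. CHILD, run_child and
describe are illustrative names. The subprocess module reports death by
signal as a negative return code.

```python
import signal
import subprocess
import sys

# Child resets SIGPIPE to the OS default (Python normally ignores it),
# then sends the signal to itself, so the OS terminates it.
CHILD = (
    "import os, signal;"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL);"
    "os.kill(os.getpid(), signal.SIGPIPE)"
)


def run_child():
    """Run the child and return its subprocess return code."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


def describe(returncode):
    """Name the signal behind a negative subprocess return code."""
    if returncode < 0:
        return signal.Signals(-returncode).name
    return "exit status %d" % returncode


if __name__ == "__main__":
    print(describe(run_child()))
```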

This is Zope 2.5.0 with CMF 1.2 on a severely upgraded/updated/patched RH6.2
... with a Python 2.1.2 built with defaults. It runs with FastCGI to Apache
1.3.2x ...

Usually I just wait a couple of seconds, hit refresh in my browser and
things come back to normal, but it's still annoying, and doesn't look good
to the public.  Note that when this happens, it usually seems to happen to
ALL processes.  It looks to me like the pipes between the master Zope
process and its children die, and they all have to restart for some
reason. Could this be? And if so, why?

Note that I started noticing this when I for the first time started using
Psycopg to create RDBMS connections to my PostgreSQL ... Could there be a
relation somehow?

On a slightly similar topic, how do I manage performance? I plan on using
Zope for a fairly high demand web site ... I noticed I can control how many
processes/threads start, but then I also read something about the ZODB
pool_size ... What is the relation between the two exactly?

Thank you,

Jean-François Doyon
Internet Service Development and Systems Support
GeoAccess Division
Canadian Center for Remote Sensing
Natural Resources Canada
http://atlas.gc.ca
Phone: (613) 992-4902
Fax: (613) 947-2410

