[Zope] Summary : server-down / debugging
chas
panda@skinnyhippo.com
Tue, 14 Dec 1999 14:29:41 +0800
Exactly a week ago, I had my server go down for 24-36 hrs
as described in a couple of "Urgent - server down" threads.
First and foremost, I would like to say a most sincere
thank you to everyone for helping.
Secondly, my apologies for not posting a follow-up earlier
but I wanted to make sure that things really had been
resolved first; I also wanted to try to work out the cause
of the problem.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---------
Refresh :
---------
1) FreeBSD 3.2, Zope 2.01 PCGI to Apache, MySQL
My Zope server kept "spinning" (consuming 90-100% CPU)
Connecting via the web resulted in the error message :
"Temporarily Unavailable
The resource you requested is temporarily unavailable -
please try again later.
(116) unable to connect, fd=3 "
Stopping and starting Zope allowed the server to run again
for anything between 1 minute and 8 hrs (depending on load),
until the problem reappeared.
2) Decided to reinstall Zope 2.10
Spinning disappeared, now it seemed to hang - 0% load.
Error message displayed to users was (regardless of URL) :
Zope Error
Zope has encountered an error while publishing this resource.
Error Type: NameError
Error Value: name_param
Within the admin interface, the error message was different :
Zope Error
Zope has encountered an error while publishing this resource.
Error Type: NameError
Error Value: type
Unfortunately, I failed to get tracebacks before making changes
that kept the server up.
----------
Solution :
----------
I started to undo the latest changes (thanks for the undo tab !).
First thing I did was remove a property that was on one folder (not
the root folder). Since doing this, the server has not encountered
any subsequent problem.
We're still not sure why this should be so because URLs which did
not require acquisition or any methods/properties from the folder
in question had been giving error messages previously. That's why
I'd never tried this before - it seemed so unnatural and unlikely
to be the cause.
I get the feeling (and I realize that this is as untechnical as
slapping the side of your TV/monitor in order to correct the screen
display), that the Zope object database is sort of in different
states and just needed to be changed. The reason that I say this is
that we witnessed further changes in behaviour caused by nothing
more than making minor, unrelated changes to the database.
For example, 5 days ago, we found that we could not pack the Zope
database (Data.fs), even though the server was apparently up and
running again. The traceback was :
Traceback (innermost last):
File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 214, in
publish_module
File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 179, in
publish
File /var/zope/51go-p2/lib/python/Zope/__init__.py, line 202, in
zpublisher_exception_hook
(Object: Fake)
File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 165, in
publish
File /var/zope/51go-p2/lib/python/ZPublisher/mapply.py, line 160, in
mapply
(Object: manage_pack)
File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 102, in
call_object
(Object: manage_pack)
File /var/zope/51go-p2/lib/python/App/ApplicationManager.py, line 338,
in manage_pack
(Object: ElementWithAttributes)
File /var/zope/51go-p2/lib/python/ZODB/DB.py, line 514, in pack
File /var/zope/51go-p2/lib/python/ZODB/FileStorage.py, line 867, in pack
(Object: /var/zope/51go-p2/var/Data.fs)
IOError: (see above)
Yet yesterday, I tried to pack the database once more and it worked
fine. The only difference between yesterday and 5 days ago were that
we'd added a few more DTML methods/documents to the database.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------
Conclusions :
-------------
As you can see, I don't really have any conclusions - I'm not sure
why we encountered these problems.
The biggest lesson we learned were all of the extra debugging options
that were available (many of which we were totally unaware even existed).
My advice is that, if you are not yet aware of how to use them, test
these out before you really need them in a crisis scenario.
Accumulated from posts by Michel, Brian Lloyd, Jim Fulton :
-----------------------------------------------------------
o providing details on your environment (OS, version, zope
version in use). Have there been any recent changes in
the environment?
o symptom description - does CPU go to 100%? If so, does
there seem to be any pattern to the requests preceding
the hang that you can determine (perhaps by looking at
the server log)? Can you regularly reproduce it using
a contrived test (like using ab or another benchmarking
tool on a particular url or sequence of urls)? Does the
problem happen only when accessing the system via pcgi
or ZServer (or both)?
o run in debug mode (using -D) and send any funny looking
or unusual messages that you see
o provide any unexplained tracebacks that you see from your
site. If you aren't in debug mode, then you'll need to view
the document source to see them,
o Look at the "debug" screen at Control_Panel/manage_debug.
This can be used to spot memory leaks and stuck database
connections. If you think you may see something suspicious,
save the document src and send it to us.
o The Zope event log can be very helpful. See doc/LOGGING.txt,
o If the above doesn't show anything, then a more detailed log may be
created:
Add this stanza to your z2.py start script.
# turn on debug logging
from ZServer import DebugLogger
logfile=os.path.join(INSTANCE_HOME,'var/debug.log')
DebugLogger.log=DebugLogger.DebugLogger(logfile).log
It should be insertted after Zope is imported, e.g. after this line:
exec "import "+MODULE in {}
You'll want to watch the debug log file since it gets large quickly.
You might also want to read the docstrings in ZServer/DebugLogger.py
for more information about the debug log format.
In particular, we want to look for requests that don't complete
or for apparent leaking requests.
Use the debugger (Michel posted a good run-down on this, attached
below for convenience)
o Keep backing up Data.fs regularly so that you can roll back to
a usable version.
o Paid support from DC
o /Control_Panel/manage_profile
Yeah that's another little hidden beauty. Both 'manage_debug' and
'manage_profile' have been hidden from view, I don't know the reason
why. I think it would be a good idea to show these links in the control
panel if ZOPE_DEBUG_MODE is on.
Stephen Bishop also suggested this neat trick :
-----------------------------------------------
>For a test, start up your server in single thread mode. When the problem
>next occurs, the last page accessed should be the cause of your problem.
>If the standard log doesn't do the trick (medusa might log incomming requests
>before they are passed off to Zope handlers), you can probably put your own
>logging information in zhttp_channel.work in ZServer/HTTPServer.py unless
>you work out how to turn on the built in debug logging.
This is how you get into the debugger.
--------------------------------------
Shut down your Zope.
Go to the lib/python directory
Run python:
[michel@korak python]$ python1.5.2
Python 1.5.2 (#1, Jul 5 1999, 14:47:37) [GCC egcs-2.91.66
19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
import Zope and ZPublisher
Py$ import ZPublisher, Zope
Now you can issue requests to Zope just like a web browser:
Py$ ZPublisher.Zope('index_html')
will call and return the 'index_html' object in the root folder.
You can enter the debugger with the d=1 argument:
Py$ ZPublisher.Zope('index_html', d=1)
* Type "s<cr>c<cr>" to jump to beginning of real publishing process.
* Then type c<cr> to jump to the beginning of the URL traversal
algorithm.
* Then type c<cr> to jump to published object call.
> <string>(0)?()
pdb>
This fires you into the python debugger, pdb. First off, hit 's' to step
into the debugger.
For your convinience, two breakpoints are set for you. Hit 'c' once to
continue to the first breakpoint, which is at the beginning of the URL
traversal to find your object. I ususaly skip beyond this as it is
really only useful to look at the request before it traversed and to
debug ZPublisher.
Hit 'c' again to continue past the first breakpoint to the second, which
is rigt before the object you are looking for is called. This is most
useful to debug application specific logic.
__________________________________________________________________
Apologies if I've omitted anything/anyone - again, thanks
to _everybody_ who helped.
chas