Exactly a week ago, I had my server go down for 24-36 hrs as described in a couple of "Urgent - server down" threads. First and foremost, I would like to say a most sincere thank you to everyone for helping. Secondly, my apologies for not posting a follow-up earlier but I wanted to make sure that things really had been resolved first; I also wanted to try to work out the cause of the problem. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ --------- Refresh : --------- 1) FreeBSD 3.2, Zope 2.01 PCGI to Apache, MySQL My Zope server kept "spinning" (consuming 90-100% CPU) Connecting via the web resulted in the error message : "Temporarily Unavailable The resource you requested is temporarily unavailable - please try again later. (116) unable to connect, fd=3 " Stopping and starting Zope allowed the server to run again for anything between 1 minute and 8 hrs (depending on load), until the problem reappeared. 2) Decided to reinstall Zope 2.10 Spinning disappeared, now it seemed to hang - 0% load. Error message displayed to users was (regardless of URL) : Zope Error Zope has encountered an error while publishing this resource. Error Type: NameError Error Value: name_param Within the admin interface, the error message was different : Zope Error Zope has encountered an error while publishing this resource. Error Type: NameError Error Value: type Unfortunately, I failed to get tracebacks before making changes that kept the server up. ---------- Solution : ---------- I started to undo the latest changes (thanks for the undo tab !). First thing I did was remove a property that was on one folder (not the root folder). Since doing this, the server has not encountered any subsequent problem. We're still not sure why this should be so because URLs which did not require acquisition or any methods/properties from the folder in question had been giving error messages previously. That's why I'd never tried this before - it seemed so unnatural and unlikely to be the cause. I get the feeling (and I realize that this is as untechnical as slapping the side of your TV/monitor in order to correct the screen display), that the Zope object database is sort of in different states and just needed to be changed. The reason that I say this is that we witnessed further changes in behaviour caused by nothing more than making minor, unrelated changes to the database. For example, 5 days ago, we found that we could not pack the Zope database (Data.fs), even though the server was apparently up and running again. The traceback was : Traceback (innermost last): File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 214, in publish_module File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 179, in publish File /var/zope/51go-p2/lib/python/Zope/__init__.py, line 202, in zpublisher_exception_hook (Object: Fake) File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 165, in publish File /var/zope/51go-p2/lib/python/ZPublisher/mapply.py, line 160, in mapply (Object: manage_pack) File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 102, in call_object (Object: manage_pack) File /var/zope/51go-p2/lib/python/App/ApplicationManager.py, line 338, in manage_pack (Object: ElementWithAttributes) File /var/zope/51go-p2/lib/python/ZODB/DB.py, line 514, in pack File /var/zope/51go-p2/lib/python/ZODB/FileStorage.py, line 867, in pack (Object: /var/zope/51go-p2/var/Data.fs) IOError: (see above) Yet yesterday, I tried to pack the database once more and it worked fine. The only difference between yesterday and 5 days ago were that we'd added a few more DTML methods/documents to the database. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ------------- Conclusions : ------------- As you can see, I don't really have any conclusions - I'm not sure why we encountered these problems. The biggest lesson we learned were all of the extra debugging options that were available (many of which we were totally unaware even existed). My advice is that, if you are not yet aware of how to use them, test these out before you really need them in a crisis scenario. Accumulated from posts by Michel, Brian Lloyd, Jim Fulton : ----------------------------------------------------------- o providing details on your environment (OS, version, zope version in use). Have there been any recent changes in the environment? o symptom description - does CPU go to 100%? If so, does there seem to be any pattern to the requests preceding the hang that you can determine (perhaps by looking at the server log)? Can you regularly reproduce it using a contrived test (like using ab or another benchmarking tool on a particular url or sequence of urls)? Does the problem happen only when accessing the system via pcgi or ZServer (or both)? o run in debug mode (using -D) and send any funny looking or unusual messages that you see o provide any unexplained tracebacks that you see from your site. If you aren't in debug mode, then you'll need to view the document source to see them, o Look at the "debug" screen at Control_Panel/manage_debug. This can be used to spot memory leaks and stuck database connections. If you think you may see something suspicious, save the document src and send it to us. o The Zope event log can be very helpful. See doc/LOGGING.txt, o If the above doesn't show anything, then a more detailed log may be created: Add this stanza to your z2.py start script. # turn on debug logging from ZServer import DebugLogger logfile=os.path.join(INSTANCE_HOME,'var/debug.log') DebugLogger.log=DebugLogger.DebugLogger(logfile).log It should be insertted after Zope is imported, e.g. after this line: exec "import "+MODULE in {} You'll want to watch the debug log file since it gets large quickly. You might also want to read the docstrings in ZServer/DebugLogger.py for more information about the debug log format. In particular, we want to look for requests that don't complete or for apparent leaking requests. Use the debugger (Michel posted a good run-down on this, attached below for convenience) o Keep backing up Data.fs regularly so that you can roll back to a usable version. o Paid support from DC o /Control_Panel/manage_profile Yeah that's another little hidden beauty. Both 'manage_debug' and 'manage_profile' have been hidden from view, I don't know the reason why. I think it would be a good idea to show these links in the control panel if ZOPE_DEBUG_MODE is on. Stephen Bishop also suggested this neat trick : -----------------------------------------------
For a test, start up your server in single thread mode. When the problem next occurs, the last page accessed should be the cause of your problem. If the standard log doesn't do the trick (medusa might log incomming requests before they are passed off to Zope handlers), you can probably put your own logging information in zhttp_channel.work in ZServer/HTTPServer.py unless you work out how to turn on the built in debug logging.
This is how you get into the debugger. -------------------------------------- Shut down your Zope. Go to the lib/python directory Run python: [michel@korak python]$ python1.5.2 Python 1.5.2 (#1, Jul 5 1999, 14:47:37) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam import Zope and ZPublisher Py$ import ZPublisher, Zope Now you can issue requests to Zope just like a web browser: Py$ ZPublisher.Zope('index_html') will call and return the 'index_html' object in the root folder. You can enter the debugger with the d=1 argument: Py$ ZPublisher.Zope('index_html', d=1) * Type "s<cr>c<cr>" to jump to beginning of real publishing process. * Then type c<cr> to jump to the beginning of the URL traversal algorithm. * Then type c<cr> to jump to published object call.
<string>(0)?() pdb>
This fires you into the python debugger, pdb. First off, hit 's' to step into the debugger. For your convinience, two breakpoints are set for you. Hit 'c' once to continue to the first breakpoint, which is at the beginning of the URL traversal to find your object. I ususaly skip beyond this as it is really only useful to look at the request before it traversed and to debug ZPublisher. Hit 'c' again to continue past the first breakpoint to the second, which is rigt before the object you are looking for is called. This is most useful to debug application specific logic. __________________________________________________________________ Apologies if I've omitted anything/anyone - again, thanks to _everybody_ who helped. chas