Re: [Zope] Zope woes continue - server going down regularly
First and foremost, most sincere thanks to Jim for the detailed advice/help, with the invaluable tips on debugging. I'm no closer to actually finding out what the problem is but at least feel less helpless ... I can at least try to work out what is going wrong. Comments follow.. Jim Fulton wrote :
I installed and set up Zope 2.10 (using PCGI behind Apache) and transferred my site over to it. It *seemed* to be holding up OK. CPU never went up to 90%. But I noticed that several times I'd get the fd=3 error.
Then within 2 hours of use, I get this error when external users try to access the site :
Zope Error Zope has encountered an error while publishing this resource. Error Type: NameError Error Value: name_param
Could you provide a traceback?
Even from the administration interface, I can't access anything. Everything returns :
Zope Error Zope has encountered an error while publishing this resource. Error Type: NameError Error Value: type
Again, a traceback would be helpful.
I'm now in the somewhat ironic, frustrating (and slightly embarrassing) situation of having a well-behaved server. It's doing everything properly without a problem. I'm trying very hard to reproduce the errors - running linkbot through the site to check every URL, running 'ab' to increase load ... but to no avail. Earlier, I simply had to restart the server and let it run for 10 minutes and it was a safe bet that the errors would occur. The server's been running fine for the past 2 hrs now.
To this end, does anybody have any ideas : 1) where to get better debugging information to help identify the source of the problem.
- Obviously something changed. You were running fine for 2 months, then started having problems. You can work forward by analysing changes, or you could work backward by debugging.
- Tracebacks are helpful. If you aren't in debug mode, then you'll need to view the document source to see them,
Will send as soon as I can get the server to play up again.
- The Zope event log can be very helpful. See doc/LOGGING.txt,
- Look at the "debug" screen at Control_Panel/manage_debug. This can be used to spot memory leaks and stuck database connections.
Wow - *that* was a discovery. Why isn't it linked in from the Control_Panel ? I had to type in the URL to get to it. Then again, I didn't really know what I was looking for there.
- If the above doesn't show anything, then a more detailed log may be created:
Add this stanza to your z2.py start script.
# turn on debug logging
from ZServer import DebugLogger
logfile=os.path.join(INSTANCE_HOME,'var/debug.log')
DebugLogger.log=DebugLogger.DebugLogger(logfile).log
It should be inserted after Zope is imported, e.g. after this line:
exec "import "+MODULE in {}
Thank you very much for this detailed description of how to set it up :)
You'll want to watch the debug log file since it gets large quickly. You might also want to read the docstrings in ZServer/DebugLogger.py for more information about the debug log format.
In particular, we want to look for requests that don't complete or for apparent leaking requests.
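A rough sketch of how such a scan of the debug log might look. It assumes each log line has the form "code request-id timestamp data", with code B written when a request begins and code E when its output has been sent; that layout is an assumption to check against the docstrings in ZServer/DebugLogger.py. The function name and the sample lines are made up for illustration, and the script is written for a current Python rather than the one shipped with Zope.

```python
import io


def unfinished_requests(lines):
    """Return {request_id: data} for requests that have a 'B' line but no 'E' line.

    Assumes lines look like: <code> <request id> <time> <data>
    """
    open_requests = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        code, request_id = parts[0], parts[1]
        data = parts[3] if len(parts) > 3 else ""
        if code == "B":
            # request started: remember method and path until we see its end
            open_requests[request_id] = data
        elif code == "E":
            # request finished: forget it
            open_requests.pop(request_id, None)
    return open_requests


# Made-up sample in the assumed format; request 1002 never reaches 'E'.
sample = io.StringIO(
    "B 1001 2000-01-01T10:00:00 GET /index_html\n"
    "I 1001 2000-01-01T10:00:00 0\n"
    "A 1001 2000-01-01T10:00:01 200 5120\n"
    "E 1001 2000-01-01T10:00:01\n"
    "B 1002 2000-01-01T10:00:02 GET /stuck_page\n"
    "I 1002 2000-01-01T10:00:02 0\n"
)
stuck = unfinished_requests(sample)
print(stuck)
```

Against a real log, the same function could be fed an open file object for var/debug.log. Requests still in flight when the log is read will also show up, so only entries that stay open for a long time are interesting.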
OK. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Jim wrote :
It's been running fine for the past 2 months. Then today we've been getting these error messages almost non-stop :
It would be interesting to compare the ZODB from the day (or two) before the trouble started to the current one. Similarly to look at the undo log.
I've checked the undo log. There were only 6 entries for the past 48 hours (it was the weekend). I've backed up Data.fs and tried reversing them. This also has coincided with the uptime of 2 hrs.
Btw, I'm not sure if it's related or not, but I find that I can't pack any of the databases. (Control_Panel -> Database -> Pack) The traceback is :
Traceback (innermost last):
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 214, in publish_module
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 179, in publish
  File /var/zope/51go-p2/lib/python/Zope/__init__.py, line 202, in zpublisher_exception_hook
    (Object: Fake)
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 165, in publish
  File /var/zope/51go-p2/lib/python/ZPublisher/mapply.py, line 160, in mapply
    (Object: manage_pack)
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 102, in call_object
    (Object: manage_pack)
  File /var/zope/51go-p2/lib/python/App/ApplicationManager.py, line 338, in manage_pack
    (Object: ElementWithAttributes)
  File /var/zope/51go-p2/lib/python/ZODB/DB.py, line 514, in pack
  File /var/zope/51go-p2/lib/python/ZODB/FileStorage.py, line 867, in pack
    (Object: /var/zope/51go-p2/var/Data.fs)
IOError: (see above)
Even chmod'ing Data.fs to 666 doesn't help. Unfortunately I can't tell you when this started to occur b/c I've never packed the DB until today (which explains why it's so big !).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ Michel to Thomas regarding Stability : ]
Michel wrote :
Just an aside, we do have excellent paid support. It is not unreasonably expensive and you get to talk directly to us and get a prompt report of our analysis of your problem.
We will indeed contact you directly regarding this, for future contingency. I'm too used to being in 'startup-with-no-budget' mode where I have to do everything myself, so it never occurred to me before.
Well, good backups and all that are an excellent strategy. Further, I almost wonder if you had a backup of the site when it didn't have this problem, if you fired a copy of that up on another port and the problem went away. As it stands now, it could be a bug in Zope, or something changed on your system.
I've been trying just that too - the older database has no errors.
The most important part of debugging is reproduction. Reproducing the problem in a controlled way can be the hardest part but it is the most important.
Yes, as somebody who often gets people coming to him saying "it's broken, I don't know why/how or what I did and no, I can't make it happen again", I'm fully aware of the importance of repeatability. I've been holding out before posting to the list, esp. in reply to Jim's very helpful message, because I really need/want to be able to reproduce the errors. It's precisely this inability to reproduce it that leaves me feeling helpless - a bit like when Windows software crashes for no reason whatsoever.
As a stopgap so that you can get some sleep, you can set up a cron script to reset the server every now and then either nicely (call the server's manage_shutdown) or mean (send it a signal).
This is the method that I will indeed use if this problem persists in the morning. If I can, I'll actually do a test (using ZClient or urllib) to first check if the server is responding correctly before doing a restart. Until then, I have 6 hrs to see if I can reproduce some errors.
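A minimal sketch of the kind of check I have in mind, written for a current Python with urllib.request rather than ZClient. The URL, the helper names and the rule "anything other than a 200 without the words 'Zope Error' counts as broken" are my own assumptions, not anything Zope provides; the restart itself (manage_shutdown or a signal) is deliberately left out.

```python
import urllib.error
import urllib.request


def looks_healthy(status, body):
    """Healthy means HTTP 200 and no Zope error page in the body."""
    return status == 200 and "Zope Error" not in body


def fetch(url, timeout=30):
    """Return (status, body); status is None if the server could not be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("latin-1", "replace")
    except urllib.error.HTTPError as err:
        # server answered, but with an error status (e.g. 500)
        return err.code, err.read().decode("latin-1", "replace")
    except OSError:
        # connection refused, timeout, DNS failure, ...
        return None, ""


def needs_restart(url, fetch_page=fetch):
    """Decide whether the cron job should go ahead and restart the server."""
    status, body = fetch_page(url)
    return not looks_healthy(status, body)


# Demonstration with a stand-in for the network call (hypothetical URL).
fake_error_page = lambda url: (200, "Zope Error Zope has encountered an error")
print(needs_restart("http://localhost:8080/", fetch_page=fake_error_page))
```

The cron job would call needs_restart() with the real site URL and only restart when it returns true, so a healthy server is left alone.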
participants (1): chas