Re: [Zope] Zope woes continue - server going down regularly - Zope

7 Dec 1999

      First and foremost, most sincere thanks to Jim for the detailed
advice/help, with the invaluable tips on debugging. I'm no closer
to actually finding out what the problem is but at least feel
less helpless ... I can at least try to work out what is going
wrong.  

Comments follow..

Jim Fulton wrote :
...
...
I installed and set up Zope 2.10 (using PCGI behind Apache) and
transfered my site over it. It *seemed* to be holding up OK.
CPU never went up to 90%. But I noticed that several times I'd
get the fd=3 error.
Then within 2 hours of use, I get this error when external
users try to access the site :
Zope Error
    Zope has encountered an error while publishing this resource.
    Error Type: NameError
    Error Value: name_param
Could you provide a traceback?
...
Even from the administration interface, I can't access anything.
Everything returns :
Zope Error
    Zope has encountered an error while publishing this resource.
    Error Type: NameError
    Error Value: type
Again, a traceback would be helpful.
I'm now in the somewhat ironic, frustrating (and slightly embarassing)
situation of having a well-behaved server. It's doing everything properly
without a problem. I'm trying very hard to reproduce the errors - running
linkbot through the site to check every URL, running 'ab' to increase
load ... but to no avail. Earlier, I simply had to restart the server 
and let it run for 10 minutes and it was a safe bet that the errors would
occur. The server's been running fine for the past 2 hrs now.
...
...
To this end, does anybody have any ideas :
1) where to get better debugging information to help identify the
   source of the problem.
- Obviously something changed.  You were running fine for 2 months, 
 than started having problems.  You can work forward by analysing
 changes, or you could work backward by debugging.
- Tracebacks are helpful. If you aren't in debug mode, then 
 you'll need to view the document source to see them,
Will send as soon as I can get the server to play up again.
...
- The Zope event log can be very helpful. See doc/LOGGING.txt,
- Look at the "debug" screen at Control_Panel/manage_debug.
 This can be used to spot memory leaks and stuck database connections.
Wow - *that* was a discovery. Why isn't it linked in from the Control_Panel ?
I had to type in the URL to get to it. Then again, I didn't really know what
I was looking for there.
...
- If the above doesn't show anything, then a more detailed log may be
created:
Add this stanza to your z2.py start script.
# turn on debug logging
 from ZServer import DebugLogger
 logfile=os.path.join(INSTANCE_HOME,'var/debug.log')
 DebugLogger.log=DebugLogger.DebugLogger(logfile).log
It should be insertted after Zope is imported, e.g. after this line:
exec "import "+MODULE in {}
Thank you very much for this detailed description of
how to set it up :)
...
You'll want to watch the debug log file since it gets large quickly. 
 You might also want to read the docstrings in ZServer/DebugLogger.py 
 for more information about the debug log format.
In particular, we want to look for requests that don't complete
 or for apparent leaking requests.
OK.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jim wrote :
...
...
It's been running fine for the past 2 months.
Then today we've been getting these error messages
almost non-stop :
It would be interesting to compare the ZODB from the day
(or two) before the trouble started to the current one.
Similarly to look at the undo log.
I've checked the undo log. There were only 6 entries for the past
48 hours (it was the weekend). I've backed up Data.fs and tried
reversing them. This also has coincided with the uptime of 2 hrs.

Btw, I'm not sure if it's related or not, but I find that I can't
pack any of the databases. (Control_Panel -> Database -> Pack)
The traceback is :

Traceback (innermost last):
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 214, in
publish_module
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 179, in
publish
  File /var/zope/51go-p2/lib/python/Zope/__init__.py, line 202, in
zpublisher_exception_hook
    (Object: Fake)
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 165, in
publish
  File /var/zope/51go-p2/lib/python/ZPublisher/mapply.py, line 160, in mapply
    (Object: manage_pack)
  File /var/zope/51go-p2/lib/python/ZPublisher/Publish.py, line 102, in
call_object
    (Object: manage_pack)
  File /var/zope/51go-p2/lib/python/App/ApplicationManager.py, line 338, in
manage_pack
    (Object: ElementWithAttributes)
  File /var/zope/51go-p2/lib/python/ZODB/DB.py, line 514, in pack
  File /var/zope/51go-p2/lib/python/ZODB/FileStorage.py, line 867, in pack
    (Object: /var/zope/51go-p2/var/Data.fs)
IOError: (see above)

Even chmod'ing  Data.fs to 666 doesn't help.  Unfortunately I 
can't tell you when this started to occur b/c I've never packed 
the DB until today (which explains why it's so big !).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[ Michel to Thomas regarding Stability : ] 

Michel wrote :
...
Just an aside, we do have excellent paid support.  It is not
unreasonably expensive and you get to talk directly to us and get a
prompt report of our analysis of your problem.
We will indeed contact you directly regarding this, for future
contingency. I'm too used to being in 'startup-with-no-budget'
mode where I have to do everything myself, so it never occured
to me before.
...
Well, good back ups and all that are an excellent strategy.  Further, I
almost wonder if you had a backup of the site when it didn't have this
problem, if you fired a copy of that up on another port and the problem
went away.  As it stands now, it could be a bug in Zope, or something
changed on your system.
I've been trying just that too - the older database has no errors.
...
The most important part of debugging is reproduction.  Reproducing the
problem in a controled way can be the hardest part but it is the most
important.
Yes, as somebody who often gets people coming to him saying "it's broken, 
I don't know why/how or what I did and no, I can't make it happen again", 
I'm fully aware of the importance of repeatability. I've been holding out 
before posting to the list, esp. in reply to Jim's very helpful message, 
because I really need/want to be able to reproduce the errors.
It's precisely this inability to reproduce it that leaves me feeling 
helpless - a bit  like when Windows software crashes for no reason 
whatsoever.
...
As a stopgap so that you can get some sleep, you can set up a cron
script to reset the server every now and then either nicely (call the
servers manage_shutdown) or mean (send it a signal).
This is the method that I will indeed use if this problem persists 
in the morning. If I can, I'll actually do a test (using ZClient or
urllib) to first check if the server is responding correctly before
doing a restart. Until then, I have 6 hrs to see if I can reproduce
some errors.

Re: [Zope] Zope woes continue - server going down regularly

chas

tags

participants (1)