mystery of the server hang solved?!
For quite a while, I've been having problems with our Zope server just becoming unresponsive. Usually I note that one Python process is monopolizing one of the CPUs. I turned on the detailed logging and Apache's server status, but I still couldn't pin down the problem.

Last week it got better. I didn't have to restart the system nearly as often. This coincided with the conclusion of a proposal submission process that runs as part of a system that I suspected was the culprit. Things got so quiet that I decided to try the upgrade to 2.4.0b3 again. I succeeded enough that I moved it into place over the weekend.

Today I ran into one of our support people who I noticed was frequently logged on when the server went bad. He had even complained that he was having a hard time getting to know Zope with it going down all the time. (Hint!) I mentioned that I had changed some things and it seemed more stable, and invited him to give it another try.

A few minutes later, I was working with someone else when the server became unresponsive. A quick check of Apache's status showed that the support person I'd just been talking with had several processes waiting for a response. Ah ha!

Well... I restarted and went digging. He didn't have much there. There wasn't even a Python Script. But then I noticed a little 'H' in the icon for his standard_html_footer. I had modified the PUT_factory so that when text/html is uploaded, an HTMLDocument is created. Apparently standard_html_footer was created in this way. It was trying to wrap itself! I thought that there were safeguards against such recursion, but they didn't seem to catch this. It'd be nice if HTMLDocument would verify that it's not calling itself.

I'm thrilled to have found the problem. We've only got about 30 authors right now. I suspect it's going to get a lot tougher when we add a few thousand. I need to get ready. (I'm planning to kill requests that take too long to complete.)
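For anyone curious what the "verify that it's not calling itself" check might look like, here is a rough sketch in plain Python. It does not use Zope's API or the real HTMLDocument code; Document, render, and RecursiveRenderError are hypothetical names made up for illustration, and the header/footer wrapping is only my assumption about how such a document gets rendered. The idea is just to remember which documents are currently being rendered and refuse to re-enter one of them.

```python
class RecursiveRenderError(Exception):
    """Raised when a document would end up rendering itself."""


class Document:
    """Hypothetical stand-in for a document wrapped in a header and footer."""

    def __init__(self, name, body, header=None, footer=None):
        self.name = name
        self.body = body
        self.header = header  # name of the document rendered before the body
        self.footer = footer  # name of the document rendered after the body


def render(name, documents, active=None):
    """Render documents[name], refusing to re-enter a document in progress."""
    active = [] if active is None else active
    if name in active:
        # The same document is already on the render stack: a cycle.
        chain = " -> ".join(active + [name])
        raise RecursiveRenderError("render cycle: " + chain)
    active.append(name)
    try:
        doc = documents[name]
        parts = []
        if doc.header:
            parts.append(render(doc.header, documents, active))
        parts.append(doc.body)
        if doc.footer:
            parts.append(render(doc.footer, documents, active))
        return "".join(parts)
    finally:
        # Always unwind, so a failed render does not poison later ones.
        active.pop()
```

With a check like this, a footer that names itself as its own footer fails immediately with an error that shows the chain, rather than tying up a process until someone restarts the server.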
I appreciate all the help I've received here in my attempt to track down this problem. --kyler
Participants (1): Kyler B. Laird