Problems with restarts, memory usage, DB connections, FastCGI ?? Help!
Hello,

I'm having some problems with my main Zope installation that I'm hoping someone could shed some light on. My knowledge of Zope internals is too limited for me to figure it out by myself right now :)

I have Zope 2.5.0 (with various products) running on a custom (but default build) Python 2.1.3 on RedHat 7.3. The machine has 1 GB of RAM. I use Apache+FastCGI+Squid.

The symptoms of the problem are as such: Every once in a while, Zope stops responding for a few minutes ... Sometimes it comes back on its own, sometimes it takes too long and I restart it manually. So far I've been unable to track down WHY.

One thing I finally noticed is that when Zope seems "hung", if I restart Apache, the number of "httpd" processes starts out normal but then keeps growing steadily. Once I restart Zope, things settle down. So it would seem that the Apache<->Zope connection is suffering from "blocking" on some level.

So now I'm wondering why this locking might happen. First, there are these 4 database connections. What happens if these stay "full" for too long? Could THAT be it?

I also have an application inside Zope that has users upload LARGE images (several megs at least, up to 12 or 13 megs) through Zope. If I understand things correctly, the entire image is loaded into memory before being written to disk. Does this activity lock a database? Could heavy usage of this process be growing memory usage beyond anything reasonable?

One thing that also happened just once was that the entire server ended up running out of memory! Last week, I got kernel messages about the processes being out of memory. I couldn't even log in to the console during this time. After a while the kernel was good enough to kill the processes, but I had already hit ctrl-alt-del, and the machine rebooted before I had time to look at anything :(

This site takes millions of hits per week (Squid took 3.75M last week, with a 40% cache hit ratio). Should I look at using more than 4 DB connections?

I might also be open to using an entirely different setup, such as Apache+ZServer with mod_rewrite and mod_proxy. This setup would be somewhat simpler and lighter; would it also be more reliable?

Any insight, opinions, comments, or tips would be greatly appreciated!

Thanks,
Jean-François Doyon
Internet Service Development and Systems Support
GeoAccess Division
Canada Center for Remote Sensing
Natural Resources Canada
http://atlas.gc.ca
Phone: (613) 992-4902 Fax: (613) 947-2410
So far I've been unable to track down WHY.
Uhh....
One thing that also happened just once was that the entire server ended up running out of memory!
Sounds like you tracked it down just fine. You are clearly exhausting your physical memory, the system is swapping like crazy, and then it may eventually slog through the piggie bits to resume normal operation. You need more memory and more threads.

There are real limits as to what any system can reasonably do. It sounds like just bumping up the number of DB connections won't do anything for you. Arguably, the right thing for you to do is _decrease_ the number of available connections until you get more hardware (assuming the DB connections are used exclusively to upload files). Your system clearly can't sustain the load you are asking it to handle. It would be better to deny upload service to some folks than to let the entire system collapse.

Look into buying more memory and going with ZEO to split up the load. Read the man pages for ps (assuming you're on unix) and top so you can track the amount of memory getting sucked down. Consider an emergency stop-gap solution where only one user may upload a file at a time. Increasing the number of threads (but not connections) may help.

Some background here on why this is so painful: Zope starts with N threads (7, I think) for handling requests. A request is handled by a single thread, and that thread is locked up until the request terminates. Python has no facility to preemptively stop a thread, and Zope never times out requests. As a result, it only takes a few long requests to completely lock up a Zope server. If you consider that you could have 4 users uploading large files over slow links, you can see how you are in deep trouble even before considering the memory issue. So start by upping the number of threads (-t 15 maybe?) in your start script.

On top of that, you are uploading large files, which, as you already noted, Zope doesn't handle really well. This topic comes up a lot here. Zope stores objects in memory. These file objects are going to be huge and will rapidly consume all your physical memory. So in addition to starving your Zope of threads, you are now swapping like crazy. If I were you, I'd look into doing the file uploads out-of-band (i.e., around Zope) or buying lots and lots of RAM.
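To make the thread-starvation point concrete, here is a minimal sketch in Python. It is not Zope code: it only models a fixed pool of worker threads serving requests in arrival order, and the thread counts, the 300-second upload time, and the function name simulate_waits are illustrative assumptions, not measurements from this site.

```python
import heapq

def simulate_waits(num_threads, requests):
    """Model a fixed-size worker pool serving requests first-come, first-served.

    requests is a list of (arrival_time, duration) tuples sorted by arrival.
    Returns how long each request waited for a free worker, in seconds.
    """
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in requests:
        earliest_free = heapq.heappop(free_at)
        # The request starts when it has arrived AND a worker is available.
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        # That worker stays busy until the request finishes; nothing preempts it.
        heapq.heappush(free_at, start + duration)
    return waits

# Hypothetical load: four slow uploads (300 s each) arrive at t=0,
# followed by one quick page view (0.1 s) at t=1.
load = [(0, 300)] * 4 + [(1, 0.1)]

small_pool = simulate_waits(4, load)    # every worker is tied up by an upload
large_pool = simulate_waits(15, load)   # spare workers remain for the page view

print("page view wait with 4 threads: ", small_pool[-1], "s")
print("page view wait with 15 threads:", large_pool[-1], "s")
```

In this model the quick request waits behind the uploads when the pool is small and not at all when the pool is larger. Raising the thread count only moves the threshold, though: fifteen simultaneous slow uploads would stall a fifteen-thread pool in the same way, and the model says nothing about the memory each in-flight upload consumes.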
Jean-Francois.Doyon@CCRS.NRCan.gc.ca writes:
... I have a Zope 2.5.0 (With various products) , running on a custom (But default build) Python 2.1.3 on a RedHat 7.3 ... The machine has a 1GB Ram. I use Apache+FastCGI+Squid.
The symptoms of the problem are as such:
Every once in a while, Zope stops responding for a few minutes ... Sometimes it comes back on its own, sometimes it takes too long and I restart it manually.

We have seen similar behaviour (Zope 2.5.1, Solaris 2.5, Apache+FastCGI) as long as we used FastCGI. We switched from FastCGI to "mod_rewrite" with "[P]" rewrite rules and the problem disappeared.

Dieter
The place I used to work saw this all the time using PCGI and mod_pcgi2. It usually happened on hosts with a reasonable amount of traffic. We managed to track down a few instances of the hangs to large file uploads, a few others to just bad programming in our products, and at least one instance to broken threading on FreeBSD 4.3, but that was fixed in later releases. We never completely solved all the hangs, though.

-- 
Jamie Heilman http://audible.transient.net/~jamie/
"...thats the metaphorical equivalent of flopping your wedding tackle into a lion's mouth and flicking his lovespuds with a wet towel, pure insanity..." -Rimmer
participants (4)
- Charlie Reiman
- Dieter Maurer
- Jamie Heilman
- Jean-Francois.Doyon@CCRS.NRCan.gc.ca