Thanks for the ideas on this (individual comments at end). I've localized the problem a bit, but I still don't understand it -- here's hoping this rings a bell with somebody. :-D

I've tracked the problem down to a "manage_upload()" call from a class extending "Image" (i.e. I've inherited "manage_upload" from the OFS.Image.File class in Zope). It happens on the very first call, so this must be a deterministic bug and not a resource-limit or other cumulative problem. (I just tracked this down with breakpoints in my product code).

I've applied the patch to Python 2.1.3 to deal with the stack size problem on FreeBSD. I've eliminated ZEO, so I'm now using regular filestorage on the server. So there are fewer unknowns now.

The call to manage_upload is given a Python file object (I've verified that it has valid values for f.name, f.mode, and f.fileno()). The file in question exists on the filesystem (in a subdirectory under the product directory):

/usr/local/zope/instance/Products/Narya/www/Emoticon
-rw-rw-rw- 1 nobody nobody 496 Apr 7 03:18 em_angry.gif

What's more, I had (somewhat inefficiently) read in this entire file's contents using a regular f.read() method without causing any errors, right before making the call to manage_upload. Just for grins, I added f.seek(0) after that to make sure it was positioned at the beginning of the file. No joy, though.

(I have tried variations on the file permissions and ownership -- this seems like the most "wide-open" case, in an effort to rule-out permission problems).

This works just fine on my development server, but fails on the production server as I described (capsule below).

I'm not really too clear on how manage_upload() works -- what sort of things does it have to be able to do? Am I correct in feeding it the file object? (and if not, why did it work before, I wonder?).

The story so far...

Terry Hancock (I) wrote:
The problem is that, although the product shows up in the product control panel okay, attempting to add the main object from the product into a folder causes the server to restart (some of the lesser objects defined by my product will add without problems).
To make matters worse, it does this without any kind of explanation: no traceback, no log messages [...]
There are, of course, lots of little differences:
* I'm running Debian Linux and they are running FreeBSD (I think), though both are Intel architecture and the packages are installed from source. (I'm not using the Debian Zope package, but one downloaded from www.zope.org).
* I run Zope as a special user, while they have it starting as "root" (which means it should run as "nobody" IIRC).
* There are some products in their Zope install that aren't in mine -- a hot fix, and some other, apparently unrelated things.
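
The call sequence described at the top of this message can be sketched roughly as follows. This is only an illustration under stated assumptions: `StandInImage` and `upload_from_path` are hypothetical names, and the stand-in's `manage_upload` merely reads the file object it is given so that the snippet runs outside Zope. The real OFS.Image.File.manage_upload may do more and may behave differently.

```python
import os
import tempfile


class StandInImage:
    """Hypothetical stand-in for a class extending Zope's Image.

    It imitates only one aspect of manage_upload(): consuming the
    contents of a file object. This is an assumption for illustration,
    not the real Zope implementation.
    """

    def __init__(self):
        self.data = b""
        self.size = 0

    def manage_upload(self, file):
        # Read whatever remains from the current file position.
        self.data = file.read()
        self.size = len(self.data)


def upload_from_path(image, path):
    """Mirror the steps from the post: read, rewind, then upload."""
    f = open(path, "rb")
    try:
        # The attributes the post says were checked on the file object.
        checks = (f.name, f.mode, f.fileno())
        contents = f.read()   # the full read that raised no errors
        f.seek(0)             # rewind so the upload starts at byte 0
        image.manage_upload(f)
    finally:
        f.close()
    return contents, checks


# Demo with a throwaway 496-byte file standing in for em_angry.gif.
fd, demo_path = tempfile.mkstemp(suffix=".gif")
os.write(fd, b"GIF89a" + b"\0" * 490)
os.close(fd)

image = StandInImage()
contents, checks = upload_from_path(image, demo_path)
os.remove(demo_path)
```

With this stand-in, dropping the f.seek(0) would leave the file positioned at its end, so the upload would receive no data. Whether the real method depends on the file position in the same way is not something this sketch can show.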
Thank you very much for the recommendations so far...

Jaroslav Lukesh wrote:
Is your machine health OK? Is data on the disk drive OK? Is your bus system OK (without hazards)? Do you run memtest86 (www.memtest86.com) and cpuburn test from Robert Redelmeier (search www.freshmeat.net) for 24 hours (2x BurnBX 2xBurnMMX 2xBurnP6 - depends on your CPU) without error? Did you compile kernels in 10 parallel tasks continuously for 24h without binary difference?
Ack! I think those would violate my usage agreement! This is someone else's computer, and a production zope server. Anyway, the fact that the error is so deterministic I think rules out hardware issues (which are generally unpredictable).

Charlie Reiman wrote:
With all due respect, these machines don't sound nearly identical at all. Having said that, I can provide a little help.
Yeah, well, we try. I just meant I'm running the same version of Zope and Python, so it ought not to be a version compatibility problem. I actually tried installing FreeBSD on my development server, but I'm so much more familiar with Debian (especially the install), so I stuck with that. I emphasize this, because with my product being brand new, it's obviously my code that's most suspect! ;-D
The mysterious restarting is from the -Z option in the start script. Disable the debugging option (-D) and enable the watchdog (-Z watchdog.pid) on your development server. You will now have a watchdog zope monitoring and restarting the actual working zope (when it dies, of course). Check the source in z2.py for all the startup options.
Thank you! However, the funny thing is, z2.py *isn't* being called with the -Z flag. But I'll ask the folks who set it up about that. ;-D
My suspicion is that you need to look into permissions. Your product might be doing something that it can't do when run as nobody.
This still seems the most suspicious. However, haven't I proven that the program can access the file? After all, the permissions are now "wide open", the file is owned by "nobody" and I'm able to read in the data with a regular f.read() operation, so it *can't* be a permissions problem, can it?

I tried starting up the production server as a regular user, but it wouldn't run -- I didn't try too hard -- I suspect they might've tried to block this sort of thing for security reasons (and I don't really want to run it that way -- I'm just trying to track down the problem).

Jens Vagelpohl wrote:
there is a known bug in python for (at least) FreeBSD that leads to sudden restarts. it has to do with the stack size for threads being too small. see this message::
Chris McDonough wrote:
If you're on BSD, this is likely a thread stack space issue. I can't find detailed instructions on how to make it better, but by default FreeBSD (as well as apparently Mac OS X) has a stack space of 64K, which is too small for many heavily recursive applications. I'd search the maillists for things like "stack size" "stack space", "bsd stack" etc.
"Matthew T. Kromer" wrote:
Apologies for the attachment, but this is a tiny patch you can apply to Python 2.1.3 to double the stack size for threads up to 128K.

Attachment: pthread.patch (text/plain)
Did it. Even though I don't think this was a problem (it certainly did not fix the crashing), it's probably a good precautionary measure anyway. Thanks a lot for the patch and the information about it. I'm pretty much a complete newbie on BSD systems.

Still in the dark ...
Terry

--
------------------------------------------------------
Terry Hancock
hancock@anansispaceworks.com
Anansi Spaceworks
http://www.anansispaceworks.com
P.O. Box 60583
Pasadena, CA 91116-6583
------------------------------------------------------