Hi everyone,

I have a problem, and I am hoping that someone has already solved it, or that I will at least get some input on it. I apologize in advance for the lengthy E-mail, but I wanted to provide a detailed discussion as a starting point.

Zope is designed to have very short-lived transactions. If transactions are long-lived, all sorts of problems arise, most notably:

1. We occupy one thread for a long time.

2. The chance of conflict errors increases.

Problem 1 can be addressed by increasing the number of allowed threads or by simply adding more Zope servers. But this clearly has its limits and is really just a workaround. Another way to solve the problem is to identify long-running operations and call them asynchronously. Many of us have implemented solutions for this, one of which is lovely.remotetask.

Problem 2 can only be addressed by identifying the long-running tasks beforehand and moving them into an async call, again via lovely.remotetask for example.

But what happens if something unexpected occurs and we end up with an unanticipated long-running process? The worst case is something that runs forever. Then, whenever this problem occurs, one thread will be locked forever, and we can have a total system lockdown in no time.

So how can this be solved? Effectively, we cannot do anything from within Zope, because (a) Zope makes no assumption about running in a thread, and (b) the application is stuck and won't have a hook to get unstuck. So we have to solve the problem from outside.

Currently, Zope is commonly run from an application thread. At least both WSGI servers that we commonly use, twisted and zserver, are implemented this way. This means that by some criterion, probably some timeout, the thread should be killed. But hold on! In Python, threads cannot be killed. :-( I have done some research and found issue 221115 [1], which discusses the shortcoming of not being able to kill a thread.
The discussion ended with a feature request in PEP 42 [2], which has not been implemented as far as I can tell.

So I googled some more to find possible implementations. Here are two distinctly different solutions (others I have found are either obviously trivial and will not work, or are derivatives of these two):

1. A Python-only solution using sys.settrace [3]. Besides making everything very slow, the trace function is only invoked while Python code is executing (on call, line and return events). So if a low-level call hangs the process, the trace intercept will never be called.

2. Use an exception to intercept execution on the C level [4]. This looked very promising, until I read the following comment on the page:

     The exception will be raised only when executing python bytecode. If your thread calls a native/built-in blocking function, the exception will be raised only when execution returns to the python code.

So my conclusion is that Python threads cannot be unconditionally killed.

BTW, if a low-level call blocks without releasing the global interpreter lock, then all Python threads are blocked. From the Python `thread` library documentation [5]:

     Not all built-in functions that may block waiting for I/O allow other threads to run. (The most popular ones (time.sleep(), file.read(), select.select()) work as expected.)

In all fairness, though, those are very rare occurrences. Most libraries are non-blocking, and the above solutions would be just fine. But in my case, I really need to find a way to kill a Zope execution environment when a C call hangs.

So what other choices do we have? On Unix-like systems, we can use `os.fork()`. The advantage of this approach is that I can use OS system calls to kill the process. However, ZODB database storages cannot be shared between processes. Nikolay Kim has done some preliminary experiments and found that `db.open()` locks up the system (for both `FileStorage` and `ZeoClientStorage`).
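
To make the forking idea concrete, here is a minimal sketch of what the controlling process could look like. It is only an illustration under several assumptions: `run_with_timeout` is a hypothetical helper name, the code is written for a current Python 3 on a Unix-like system, and it deliberately leaves out the ZODB question entirely (where the connection is opened and what happens to a transaction in flight).

```python
import os
import signal
import time


def run_with_timeout(func, timeout, poll_interval=0.05):
    """Run func() in a forked child and kill the child after `timeout` seconds.

    Returns the child's exit code (0 on success, 1 if func raised),
    or None if the child was killed or died from a signal.
    Hypothetical helper for illustration only; Unix-like systems only.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, then exit immediately.
        # os._exit() skips atexit handlers and other cleanup that
        # belongs to the parent.
        status = 1
        try:
            func()
            status = 0
        finally:
            os._exit(status)

    # Parent process: acts purely as a controller.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done_pid, wait_status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            if os.WIFEXITED(wait_status):
                return os.WEXITSTATUS(wait_status)
            return None  # the child was terminated by a signal
        time.sleep(poll_interval)

    # Timed out. SIGKILL is delivered by the kernel and cannot be caught
    # or ignored, so it does not depend on the child returning to Python
    # bytecode.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)  # reap the child so it does not stay a zombie
    return None
```

A call such as `run_with_timeout(handle_request, timeout=30)` (with `handle_request` standing in for whatever does the actual work) would then return `None` whenever the work had to be aborted. Polling with `os.WNOHANG` is just the simplest way to implement the timeout; a real controller would also need a channel, such as a pipe, to get results back from the child.
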
I have not verified Nikolay's results or tried to figure out why it hangs, although I can see why `FileStorage` would have a problem.

Are there any known side effects if I fork after the connection has been made? Since I am using the original process merely as a controller, I guess I should be fine. Of course, the interesting question is: what happens to the ZODB connection, not to mention the DB itself, if the process is killed in the middle of writing? I guess the safest solution would be to fork within the constraints of the transaction. Any comments will be very much appreciated.

Once we decide on the forking approach, we of course have to solve the issue for Windows too. My googling was not immediately successful, but I think that Windows' native threads will provide us with the necessary API, since such a thread can be terminated at any time.

.. [1]: http://bugs.python.org/issue221115
.. [2]: http://www.python.org/dev/peps/pep-0042/
.. [3]: http://www.velocityreviews.com/forums/t330554-kill-a-thread-in-python.html
.. [4]: http://sebulba.wikispaces.com/recipe+thread2
.. [5]: http://docs.python.org/lib/module-thread.html

Regards,
Stephan

-- 
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training