[Zope-dev] Runaway processes
Stephan Richter
srichter at cosmos.phy.tufts.edu
Wed Dec 5 17:47:53 EST 2007
Hi everyone,
I have a problem, and I am hoping that someone has already solved it or that I
will at least get some input on it. I apologize in advance for the lengthy
e-mail, but I wanted to provide a detailed discussion as a starting point.
Zope is designed to have very short-lived transactions. If transactions are
long-lived, all sorts of problems arise, most notably:
1. We occupy one thread for a long time.
2. The chance of conflict errors increases.
Problem 1 can be addressed by increasing the number of allowed threads or by
simply adding more Zope servers. But this clearly has its limits and is really
just a workaround. Another way to solve the problem is to identify
long-running operations and call them asynchronously. Many of us have
implemented solutions for this, one of which is lovely.remotetask.
Problem 2 can only be addressed by identifying the long-running tasks
beforehand and moving them into an asynchronous call, again via
lovely.remotetask, for example.
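To illustrate the general idea, here is a minimal sketch in modern Python 3 using only the standard library. It is not lovely.remotetask's API; `start_worker` is a hypothetical name. The request thread merely enqueues a job and returns, while a separate worker thread does the long-running work.

```python
import queue
import threading
import traceback

def start_worker():
    """Hypothetical helper: start a background thread that runs (func, args)
    jobs from a queue, and return that queue to the caller."""
    jobs = queue.Queue()

    def loop():
        while True:
            func, args = jobs.get()
            try:
                func(*args)
            except Exception:
                # Keep the worker alive; just report the failure.
                traceback.print_exc()
            finally:
                jobs.task_done()

    threading.Thread(target=loop, daemon=True).start()
    return jobs

# A request handler would only do something like this and return at once:
#     jobs.put((long_running_task, (arg1, arg2)))
```

Note that this only moves the problem: a job that hangs still pins the worker thread forever; it just no longer pins a request thread.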
But what happens if something unexpected occurs and we have an unanticipated
long-running process? In the worst case, something runs forever. Every time
this happens, one thread is locked forever, and we can end up with a total
system lockdown in no time.
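To make the lockdown concrete, here is a small sketch (modern Python 3, standard library only; `demo_pool_exhaustion` is a hypothetical name) that models the server's request threads as a fixed-size `ThreadPoolExecutor`. Once every worker is occupied by a task that never finishes, an ordinary request is never served.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def demo_pool_exhaustion(pool_size, runaway, wait=0.5):
    """Submit `runaway` never-ending tasks to a pool of `pool_size` threads,
    then check whether one ordinary request is still served within `wait` s."""
    hang = threading.Event()            # not set while we measure: tasks "run forever"
    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        for _ in range(runaway):
            pool.submit(hang.wait)      # each runaway task pins one worker thread
        request = pool.submit(lambda: "served")
        try:
            return request.result(timeout=wait) == "served"
        except FutureTimeout:
            return False                # every thread is stuck: lockdown
    finally:
        hang.set()                      # release the workers so the demo can exit
        pool.shutdown(wait=True)
```

With one runaway task in a two-thread pool the request is still served; with two, it is not.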
So how can this be solved? Effectively, from within Zope we cannot do
anything, because (a) Zope makes no assumption about running in a thread, and
(b) the application is stuck and won't have a hook to get unstuck.
So we have to solve the problem from the outside. Currently, Zope is commonly
run in an application thread; at least the two WSGI servers we commonly use,
Twisted and ZServer, are implemented this way. This means that the thread
should be killed once some criterion, probably a timeout, is met.
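Detecting such a thread from the outside is the easy part. A minimal sketch, assuming the server calls two hypothetical hooks, `request_started()` and `request_finished()`, around each request: a watchdog (`RequestWatchdog` is also a hypothetical name) records a start time per thread and reports the threads whose current request has exceeded the timeout. What it cannot do is the actual killing.

```python
import threading
import time

class RequestWatchdog:
    """Hypothetical helper: tracks when each worker thread started its current
    request and reports the ones that exceed a timeout. It only detects
    runaway threads; it has no way to stop them."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._started = {}                  # thread ident -> start time
        self._lock = threading.Lock()

    def request_started(self):
        with self._lock:
            self._started[threading.get_ident()] = time.monotonic()

    def request_finished(self):
        with self._lock:
            self._started.pop(threading.get_ident(), None)

    def overdue(self):
        """Return the idents of threads whose current request exceeded the timeout."""
        now = time.monotonic()
        with self._lock:
            return [ident for ident, start in self._started.items()
                    if now - start > self.timeout]
```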
But hold on! In Python, threads cannot be killed. :-( I have done some research
and found issue 221115 [1], which discusses the shortcoming of not being able
to kill a thread. The discussion ended with a feature request in PEP 42 [2],
which, as far as I can tell, has not been implemented. So I googled some more
to find possible implementations. Here are two distinctly different solutions
(the others I found either are obviously trivial and will not work, or are
derivatives of these two):
1. A Python-only solution using sys.settrace [3].
Besides making everything very slow, this relies on the trace function, which
the interpreter only invokes while it executes Python code (on each new line,
call, return, and so on). So if a low-level call hangs the process, the trace
function will never be called.
2. Use an exception to intercept execution at the C level [4].
This looked very promising until I read the following comment on the page:
The exception will be raised only when executing python bytecode. If your
thread calls a native/built-in blocking function, the exception will be
raised only when execution returns to the python code.
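Recipe 1 roughly boils down to the following sketch (modern Python 3; `KillableThread` is a hypothetical name, and this is a paraphrase of the technique rather than the code from [3]). A per-thread trace function checks a flag on every line event and raises `SystemExit` once a kill has been requested.

```python
import sys
import threading
import time

class KillableThread(threading.Thread):
    """Hypothetical helper: a thread that can be asked to stop via sys.settrace.
    The request only takes effect while the thread executes Python code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._kill_requested = False

    def run(self):
        sys.settrace(self._global_trace)    # trace only this thread
        try:
            super().run()
        except SystemExit:
            pass                            # the exception we injected to stop
        finally:
            sys.settrace(None)

    def _global_trace(self, frame, event, arg):
        # Called on 'call' events; returning a function traces the new frame.
        return self._local_trace if event == "call" else None

    def _local_trace(self, frame, event, arg):
        if self._kill_requested and event == "line":
            raise SystemExit()              # propagates into the traced code
        return self._local_trace

    def kill(self):
        self._kill_requested = True

def demo(run_for=0.1):
    """Start a busy Python loop, request a kill, and report whether it stopped."""
    def spin():
        n = 0
        while True:
            n += 1
    t = KillableThread(target=spin, daemon=True)
    t.start()
    time.sleep(run_for)
    t.kill()
    t.join(timeout=5)
    return not t.is_alive()
```

Because the flag is only checked on line events, a thread stuck inside a single C call never sees it, which is exactly the limitation described for solution 1.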
So my conclusion is that Python threads cannot be killed unconditionally. BTW,
if a low-level call blocks without releasing the global interpreter lock, then
all Python threads are blocked. From the Python `thread` library
documentation [5]:
Not all built-in functions that may block waiting for I/O allow other
threads to run. (The most popular ones (time.sleep(), file.read(),
select.select()) work as expected.)
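Solution 2 uses CPython's `PyThreadState_SetAsyncExc()` through ctypes. A minimal sketch, assuming CPython 3.7 or later (where the thread id parameter is an `unsigned long`); `async_raise`, `ThreadKilled`, `blocking_worker` and `demo` are hypothetical names. The worker first sits in `time.sleep()`, a built-in blocking call, so the injected exception only arrives after that call returns, as the comment quoted in solution 2 warns.

```python
import ctypes
import threading
import time

class ThreadKilled(Exception):
    """Exception injected into the target thread."""

def async_raise(thread_id, exc_type):
    """Ask the interpreter to raise exc_type in the thread with this id."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread_id), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("invalid thread id")
    if res > 1:
        # More than one thread state was affected: revert and complain.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread_id), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def blocking_worker(log, block_for):
    start = time.monotonic()
    try:
        time.sleep(block_for)   # built-in call: the exception cannot land in here
        while True:             # Python bytecode: the exception lands here
            pass
    except ThreadKilled:
        log.append(time.monotonic() - start)

def demo(block_for=0.5, delay=0.1):
    """Inject ThreadKilled after `delay` seconds and return how long the worker
    ran before the exception actually reached it (None if it never did)."""
    log = []
    t = threading.Thread(target=blocking_worker, args=(log, block_for), daemon=True)
    t.start()
    time.sleep(delay)           # let the worker enter time.sleep()
    async_raise(t.ident, ThreadKilled)
    t.join(timeout=5)
    return log[0] if log else None
```

The returned time is at least `block_for`, not `delay`: the exception waited for the blocking call to finish. A call that never returns would never see it.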
In all fairness, though, these are very rare occurrences. Most libraries are
non-blocking, and the above solutions would be just fine.
But in my case, I really need to find a way to kill a Zope execution
environment when a C call hangs. So what other choices do we have?
On Unix-like systems, we can use `os.fork()`. The advantage of this approach
is that I can use system calls to kill the process. However, ZODB storages
cannot be shared between processes. Nikolay Kim has done some preliminary
experiments and found that `db.open()` locks up the system (for both
`FileStorage` and `ZeoClientStorage`). I have not verified these results or
tried to figure out why it hangs, but I can see the problem for
`FileStorage`.
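The control side of the fork approach might look like the following sketch (Unix only, modern Python 3; `run_with_timeout` is a hypothetical name). It deliberately ignores the ZODB question: the child here does not touch any storage, which is exactly the part that still needs an answer.

```python
import os
import signal
import time

def run_with_timeout(func, timeout):
    """Run func() in a forked child and SIGKILL it if it exceeds `timeout`
    seconds. Returns the child's exit code, or None if it had to be killed."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit without running the parent's cleanup.
        code = 0
        try:
            func()
        except BaseException:
            code = 1
        finally:
            os._exit(code)
    # Parent: merely a controller that polls the child until the deadline.
    deadline = time.monotonic() + timeout
    while True:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
        if time.monotonic() >= deadline:
            os.kill(pid, signal.SIGKILL)    # works even if the child hangs in C
            os.waitpid(pid, 0)              # reap the killed child
            return None
        time.sleep(0.01)
```

Unlike the two thread-based recipes, `SIGKILL` does not depend on the child ever returning to Python code.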
Are there any known side effects if I fork after the connection has been
made? Since I am using the original process merely as a controller, I guess I
should be fine. Of course, the interesting question is: what happens to the
ZODB connection, not to mention the DB, if it is in the middle of writing? I
guess the safest solution would be to fork within the boundaries of the
transaction. Any comments will be very much appreciated.
Once we decide on the forking approach, we will, of course, also have to solve
the issue for Windows. My googling was not immediately successful, but I think
that Windows' native threads will provide us with the necessary API, since a
native thread can be terminated at any time.
.. [1]: http://bugs.python.org/issue221115
.. [2]: http://www.python.org/dev/peps/pep-0042/
.. [3]: http://www.velocityreviews.com/forums/t330554-kill-a-thread-in-python.html
.. [4]: http://sebulba.wikispaces.com/recipe+thread2
.. [5]: http://docs.python.org/lib/module-thread.html
Regards,
Stephan
--
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training