Hm, just some quick ideas: thread 3 is busy doing a stat64(), which should usually complete quickly. Thread 4 is doing some kind of string operation (a comparison), which also shouldn't take long unless you have a huge dataset.
Is this backtrace repeatable, i.e. does it hang in the same syscalls? Are you maybe running on NFS or something like that?
hth, peter.
On Tue, 2010-07-20 at 17:00 +0200, Michele Marcionelli wrote:
Ciao Jens
thank you for your answer. Since I'm new to "gdb", I have some questions. I did the following: I created a file called "gdb_batch" with the following content:
info threads
thread 1
bt
thread 2
bt
thread 3
bt
thread 4
bt
thread 5
bt
and ran the "gdb" command like this:
gdb python $(cat Z2.pid) --batch -x gdb_batch > gdb_bt.log 2>&1
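The batch file and command line above could also be generated by a small script. The following is only a sketch in modern Python (not the Python 2.3 that runs this Zope); the function names are made up for illustration. It also assumes the installed gdb accepts the "-ex" option and the "thread apply all bt" command, which would avoid listing each thread by hand.

```python
import subprocess


def batch_commands(thread_count):
    """Return the gdb commands used in the hand-written "gdb_batch" file."""
    lines = ["info threads"]
    for number in range(1, thread_count + 1):
        lines.append("thread %d" % number)  # select the thread
        lines.append("bt")                  # print its backtrace
    return lines


def build_gdb_command(executable, pid, batch_file=None):
    """Build the argv list for a non-interactive gdb attach.

    With batch_file, this mirrors the command from the mail.
    Without it, gdb is asked to dump every thread in one go
    (assumption: this gdb version supports -ex).
    """
    command = ["gdb", executable, str(pid), "--batch"]
    if batch_file is not None:
        command += ["-x", batch_file]
    else:
        command += ["-ex", "info threads", "-ex", "thread apply all bt"]
    return command


def dump_backtraces(executable, pid, log_path):
    """Run gdb and write stdout and stderr to log_path (like "> log 2>&1")."""
    with open(log_path, "w") as log:
        return subprocess.call(
            build_gdb_command(executable, pid),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
```

For example, dump_backtraces("python", 7174, "gdb_bt.log") would attach to process 7174 and leave the output in "gdb_bt.log"; the PID here is just the one from the listing in this mail.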
Can you maybe help me to "decode" the output (see attachment)?
For instance: what does this mean?
(gdb) info threads
  5 Thread 0xb712bb90 (LWP 7532)  0x005fd410 in __kernel_vsyscall ()
  4 Thread 0xb672ab90 (LWP 7533)  lookdict_string (mp=0xb78c4714, key=0xb7c19db8, hash=1293002269) at Objects/dictobject.c:331
  3 Thread 0xb5d29b90 (LWP 7534)  0x005fd410 in __kernel_vsyscall ()
  2 Thread 0xb5328b90 (LWP 7535)  0x005fd410 in __kernel_vsyscall ()
* 1 Thread 0xb7fc16c0 (LWP 7174)  0x005fd410 in __kernel_vsyscall ()
What does the "*" mean? And what is the "difference" between threads 1, 2, 3 and 5 (with the __kernel_vsyscall) and thread 4?
What should I look for?
Thank you, Michele
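On the "info threads" listing quoted above: in gdb, the "*" marks the currently selected thread, and a top frame of __kernel_vsyscall means the thread was stopped while entering or waiting inside a system call, whereas thread 4 was stopped in user-space code (CPython's dictionary lookup). A rough sketch of how such a listing could be classified automatically follows; the regular expression and field names are assumptions made for this example and only cover the line format shown in this mail.

```python
import re

# Matches lines such as:
#   * 1 Thread 0xb7fc16c0 (LWP 7174) 0x005fd410 in __kernel_vsyscall ()
THREAD_LINE = re.compile(
    r"^(?P<current>\*)?\s*(?P<num>\d+)\s+Thread\s+(?P<tid>0x[0-9a-f]+)\s+"
    r"\(LWP\s+(?P<lwp>\d+)\)\s+(?P<frame>.*)$"
)

SAMPLE = """\
  5 Thread 0xb712bb90 (LWP 7532) 0x005fd410 in __kernel_vsyscall ()
  4 Thread 0xb672ab90 (LWP 7533) lookdict_string (mp=0xb78c4714, key=0xb7c19db8, hash=1293002269) at Objects/dictobject.c:331
  3 Thread 0xb5d29b90 (LWP 7534) 0x005fd410 in __kernel_vsyscall ()
  2 Thread 0xb5328b90 (LWP 7535) 0x005fd410 in __kernel_vsyscall ()
* 1 Thread 0xb7fc16c0 (LWP 7174) 0x005fd410 in __kernel_vsyscall ()
"""


def parse_info_threads(text):
    """Turn "info threads" output into a list of dicts, one per thread."""
    threads = []
    for line in text.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match is None:
            continue  # skip the "(gdb) info threads" prompt line and noise
        frame = match.group("frame")
        threads.append({
            "num": int(match.group("num")),
            "lwp": int(match.group("lwp")),
            # "*" marks the thread gdb currently has selected
            "current": match.group("current") is not None,
            # stopped at the kernel entry stub, i.e. in a system call
            "in_syscall": "__kernel_vsyscall" in frame,
            "frame": frame,
        })
    return threads


if __name__ == "__main__":
    for thread in parse_info_threads(SAMPLE):
        state = "in syscall" if thread["in_syscall"] else "running user code"
        marker = "*" if thread["current"] else " "
        print("%s thread %d (LWP %d): %s" % (marker, thread["num"], thread["lwp"], state))
```

The threads sitting in a system call are the ones whose full "bt" output is worth reading, since the frames above __kernel_vsyscall show which call they are waiting in.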
On Jul 20, 2010, at 14:54 , Jens Vagelpohl wrote:
Hi Michele,
I would consider the coincidence with strace "reviving" the process a red herring. When strace runs it does not influence the process you are watching with it, except for telling you what OS-level calls are made.
The most common reason for such timeouts is software that tries to use other network resources during a running request, talking to services like RDBMS systems, LDAP, etc. If these background connections hang for some reason, you'll end up with a hanging Zope thread. Enough of them and Zope itself appears to hang.
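One way to keep a stuck background connection from blocking a thread forever is to give sockets a timeout. The snippet below is a minimal sketch, not something taken from this Zope setup: the helper name is made up, and it only affects sockets created through Python's own socket module. Client libraries that open their connections in C code (LDAP or MySQL drivers may well do so) would most likely need their own timeout settings instead.

```python
import socket


def apply_default_socket_timeout(seconds):
    """Set a process-wide default timeout for newly created Python sockets.

    Sockets created after this call raise socket.timeout instead of
    blocking indefinitely when the peer stops responding.
    Returns the timeout that is now in effect.
    """
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()


if __name__ == "__main__":
    # 10 seconds is an arbitrary example value, not a recommendation.
    print(apply_default_socket_timeout(10.0))
```

With a timeout in place, a hanging service shows up as an exception in the log rather than as a silently blocked thread.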
When your process hangs you should be able to attach to it using "gdb" and print a stack trace for each thread. That may give you a big clue where the problem is.
jens
On 7/20/10 13:53 , Michele Marcionelli wrote:
Hello
I'm running an old version of Zope, which is required by our corporate design, together with an older version of the CMS Silva:
Linux Distribution: Red Hat Enterprise Linux Server release 5.5
Linux Kernel: 2.6.18-194.8.1.el5PAE
Zope: 2.8.9.1
Python: 2.3.6 (with several modules, like: ldap, pil, mysql, pyxml, ...)
Silva: 1.5.13
My problem is that my Zope server has timeouts when displaying pages after it has been running for a while (currently this "while" is approx. 15 minutes). The only thing that helps is restarting Zope.
By accident I discovered about two months ago that running "strace -f" on the Zope process "reactivated" it... so I could live with this hack until yesterday.
Now I'm running "strace" permanently (with all the disadvantages) and zope works "fine" for about 1 to 1.5 hours.
Here is a short recurring output of "strace":
[pid 13408] <... futex resumed> ) = 0
[pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13405] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 13408] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 13407] <... futex resumed> ) = 0
[pid 13405] <... futex resumed> ) = 1
[pid 13408] <... futex resumed> ) = 1
[pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13408] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13406] <... futex resumed> ) = 0
[pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 13405] <... futex resumed> ) = 0
[pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13408] <... futex resumed> ) = 0
[pid 13407] <... futex resumed> ) = 1
[pid 13407] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13405] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
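To get an overview of a longer strace log like the excerpt above, the futex calls can be counted per thread and per operation. This is just a sketch: the regular expression is an assumption that covers only the line shapes shown here, and the sample embeds a few lines from this excerpt. Note that every call uses the same address (0xbcc12c0), so the threads are passing one shared lock back and forth, which by itself may be normal behaviour and not the cause of the hang.

```python
import re
from collections import Counter

# Matches the start of a futex call, e.g.
#   [pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1) = 1
# but not the "<... futex resumed>" continuation lines.
FUTEX_CALL = re.compile(r"\[pid\s+(\d+)\]\s+futex\((0x[0-9a-f]+),\s*(FUTEX_\w+)")

SAMPLE = """\
[pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 13405] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 13408] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 13407] <... futex resumed> ) = 0
[pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1) = 1
"""


def tally_futex_ops(strace_text):
    """Count futex calls per (pid, operation) and collect the addresses used."""
    counts = Counter()
    addresses = set()
    for pid, address, operation in FUTEX_CALL.findall(strace_text):
        counts[(int(pid), operation)] += 1
        addresses.add(address)
    return counts, addresses


if __name__ == "__main__":
    counts, addresses = tally_futex_ops(SAMPLE)
    print("futex addresses:", sorted(addresses))
    for (pid, operation), number in sorted(counts.items()):
        print(pid, operation, number)
```

If one thread stops appearing in such a tally shortly before the timeouts begin, that thread would be a good candidate for a gdb backtrace.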
I definitely need help... Any ideas?
Thx, Michele
_______________________________________________
Zope maillist - Zope@zope.org
https://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope-dev )