[Zope] zope timeouts
Michele Marcionelli
michele.marcionelli at math.ethz.ch
Wed Jul 28 06:20:02 EDT 2010
Hello Peter
- no the backtrace aren't repeatable and
- no I saw the stat64 operation in 38 out of 767 tests and
- no I'm working locally (not on some network partition).
Can you maybe tell what is the meaning of this output that I get just after I start gdb?
The output looks like this
warning: .dynamic section for "/lib/lib***.so.*" is not at the expected address
warning: difference appears to be caused by prelink, adjusting expectations
and the most recurrent "warnings" (in the 767 tests) are:
warning: .dynamic section for "/usr/lib/liblber-2.3.so.0" is not at the expected address
=> tot 541 times
warning: .dynamic section for "/usr/lib/libsasl2.so.2" is not at the expected address
=> tot 483 times
warning: .dynamic section for "/usr/lib/libjpeg.so.62" is not at the expected address
=> tot 445 times
warning: .dynamic section for "/lib/libcrypt.so.1" is not at the expected address
=> tot 385 times
warning: .dynamic section for "/lib/libnsl.so.1" is not at the expected address
=> tot 358 times
warning: .dynamic section for "/lib/libcrypto.so.6" is not at the expected address
=> tot 330 times
warning: .dynamic section for "/lib/libsepol.so.1" is not at the expected address
=> tot 273 times
warning: .dynamic section for "/lib/libselinux.so.1" is not at the expected address
=> tot 239 times
warning: .dynamic section for "/usr/lib/libz.so.1" is not at the expected address
=> tot 225 times
warning: .dynamic section for "/usr/lib/libkrb5.so.3" is not at the expected address
=> tot 100 times
Thank you anyway,
Michele
On Jul 20, 2010, at 19:59 , Peter Sabaini wrote:
> Hm, just some quick ideas -- thread 3 is busy doing a stat64() , should
> usually done quickly. And thread 4 is doing some kind of string
> operation (comparison), which also shouldn't take long except if you
> have a huge dataset. Is this backtrace repeatable, ie. does it hang in
> the same syscalls?
>
> Are you maybe running on NFS or something like that?
>
> hth,
> peter.
>
>
> On Tue, 2010-07-20 at 17:00 +0200, Michele Marcionelli wrote:
>> Ciao Jens
>>
>> thank you for your answer. Now since I'm new with "gdb" I have some questions.
>> I did following. I created a file called "gdb_batch" with following content:
>>
>> info threads
>> thread 1
>> bt
>> thread 2
>> bt
>> thread 3
>> bt
>> thread 4
>> bt
>> thread 5
>> bt
>>
>> and run the "gdb" command like this
>>
>> gdb python $(cat Z2.pid) --batch -x gdb_batch > gdb_bt.log 2>&1
>>
>> Can you maybe help me to "decode" the output (see attachment)?
>>
>> For instance: what does this mean?
>>
>> (gdb) info threads
>> 5 Thread 0xb712bb90 (LWP 7532) 0x005fd410 in __kernel_vsyscall ()
>> 4 Thread 0xb672ab90 (LWP 7533) lookdict_string (mp=0xb78c4714, key=0xb7c19db8, hash=1293002269) at Objects/dictobject.c:331
>> 3 Thread 0xb5d29b90 (LWP 7534) 0x005fd410 in __kernel_vsyscall ()
>> 2 Thread 0xb5328b90 (LWP 7535) 0x005fd410 in __kernel_vsyscall ()
>> * 1 Thread 0xb7fc16c0 (LWP 7174) 0x005fd410 in __kernel_vsyscall ()
>>
>> What does mean the "*"? And what is the "difference" between the threads 1,2,3,5 (with the __kernel_vsyscall) and the 4?
>>
>> What should I look for?
>>
>> Thank you,
>> Michele
>>
>>
>> On Jul 20, 2010, at 14:54 , Jens Vagelpohl wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi Michele,
>>>
>>> I would consider the coincience with strace "reviving" the process a red
>>> herring. When strace runs it does not influence the process you are
>>> watching with it, except for telling you what OS-level calls are made.
>>>
>>> The most common reason for such timeouts are problems with software that
>>> tries to use other network resources during the running request to talk
>>> to services like RDBMS systems, LDAP, etc. If these background
>>> connections hang for some reason you'll end up with a hanging Zope
>>> thread. Enough of them and Zope itself appears to hang.
>>>
>>> When your process hangs you should be able to attach to it using "gdb"
>>> and printing a stack trace for each thread. That may give you a big clue
>>> where the problem is.
>>>
>>> jens
>>>
>>>
>>> On 7/20/10 13:53 , Michele Marcionelli wrote:
>>>> Hello
>>>>
>>>> I'm running an old version of Zope which is required by our corporate design using also an older version of the CMS Silva:
>>>>
>>>> Linux Distribution: Red Hat Enterprise Linux Server release 5.5
>>>> Linux Kernel: 2.6.18-194.8.1.el5PAE
>>>> Zope: 2.8.9.1
>>>> Python: 2.3.6 (with several modules, like: ldap, pil, mysql, pyxml, ...)
>>>> Silva: 1.5.13
>>>>
>>>> My problem ist that my zope server hat timeouts on displaying pages after a while that it has been started (currently this "while" is approx. 15 minutes). The only thing that helps is restarting zope.
>>>>
>>>> By accident I discovered about two month ago, that running an "strace -f" on the zope process, "reactivated" it... so I could live with this hack until yesterday.
>>>>
>>>> Now I'm running "strace" permanently (with all the disadvantages) and zope works "fine" for about 1 to 1.5 hours.
>>>>
>>>> Here is a short recurring output of "strace":
>>>>
>>>>> [pid 13408] <... futex resumed> ) = 0
>>>>> [pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13408] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13407] <... futex resumed> ) = 0
>>>>> [pid 13405] <... futex resumed> ) = 1
>>>>> [pid 13408] <... futex resumed> ) = 1
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13408] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13406] <... futex resumed> ) = 0
>>>>> [pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> [pid 13405] <... futex resumed> ) = 0
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13408] <... futex resumed> ) = 0
>>>>> [pid 13407] <... futex resumed> ) = 1
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13405] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
>>>>
>>>>
>>>> I need definitively help... Any idea?
>>>>
>>>> Thx,
>>>> Michele
--
ETH Zürich
Michele Marcionelli
IT Support Gruppe D-MATH
HG G 32.1
Rämistrasse 101
CH–8092 Zürich
michele.marcionelli at math.ethz.ch
http://www.math.ethz.ch/~michele
+41 44 632 6193 Telefon
+41 44 632 1354 Fax
More information about the Zope
mailing list