[Zope] zope timeouts

Michele Marcionelli michele.marcionelli at math.ethz.ch
Wed Jul 28 06:20:02 EDT 2010


Hello Peter

- no the backtrace aren't repeatable and
- no I saw the stat64 operation in 38 out of 767 tests and
- no I'm working locally (not on some network partition).

Can you maybe tell what is the meaning of this output that I get just after I start gdb?

The output looks like this

warning: .dynamic section for "/lib/lib***.so.*" is not at the expected address
warning: difference appears to be caused by prelink, adjusting expectations

and the most recurrent "warnings" (in the 767 tests) are:

warning: .dynamic section for "/usr/lib/liblber-2.3.so.0" is not at the expected address
=> tot 541 times
warning: .dynamic section for "/usr/lib/libsasl2.so.2" is not at the expected address
=> tot 483 times
warning: .dynamic section for "/usr/lib/libjpeg.so.62" is not at the expected address
=> tot 445 times
warning: .dynamic section for "/lib/libcrypt.so.1" is not at the expected address
=> tot 385 times
warning: .dynamic section for "/lib/libnsl.so.1" is not at the expected address
=> tot 358 times
warning: .dynamic section for "/lib/libcrypto.so.6" is not at the expected address
=> tot 330 times
warning: .dynamic section for "/lib/libsepol.so.1" is not at the expected address
=> tot 273 times
warning: .dynamic section for "/lib/libselinux.so.1" is not at the expected address
=> tot 239 times
warning: .dynamic section for "/usr/lib/libz.so.1" is not at the expected address
=> tot 225 times
warning: .dynamic section for "/usr/lib/libkrb5.so.3" is not at the expected address
=> tot 100 times

Thank you anyway,
Michele


On Jul 20, 2010, at 19:59 , Peter Sabaini wrote:

> Hm, just some quick ideas -- thread 3 is busy doing a stat64() , should
> usually done quickly. And thread 4 is doing some kind of string
> operation (comparison), which also shouldn't take long except if you
> have a huge dataset. Is this backtrace repeatable, ie. does it hang in
> the same syscalls?
> 
> Are you maybe running on NFS or something like that? 
> 
> hth,
> peter.
> 
> 
> On Tue, 2010-07-20 at 17:00 +0200, Michele Marcionelli wrote:
>> Ciao Jens
>> 
>> thank you for your answer. Now since I'm new with "gdb" I have some questions.
>> I did following. I created a file called "gdb_batch" with following content:
>> 
>>   info threads
>>   thread 1
>>   bt
>>   thread 2
>>   bt
>>   thread 3
>>   bt
>>   thread 4
>>   bt
>>   thread 5
>>   bt
>> 
>> and run the "gdb" command like this
>> 
>>   gdb python $(cat Z2.pid) --batch -x gdb_batch  > gdb_bt.log 2>&1
>> 
>> Can you maybe help me to "decode" the output (see attachment)?
>> 
>> For instance: what does this mean?
>> 
>> (gdb) info threads
>> 5 Thread 0xb712bb90 (LWP 7532)  0x005fd410 in __kernel_vsyscall ()
>> 4 Thread 0xb672ab90 (LWP 7533)  lookdict_string (mp=0xb78c4714, key=0xb7c19db8, hash=1293002269) at Objects/dictobject.c:331
>> 3 Thread 0xb5d29b90 (LWP 7534)  0x005fd410 in __kernel_vsyscall ()
>> 2 Thread 0xb5328b90 (LWP 7535)  0x005fd410 in __kernel_vsyscall ()
>> * 1 Thread 0xb7fc16c0 (LWP 7174)  0x005fd410 in __kernel_vsyscall ()
>> 
>> What does mean the "*"? And what is the "difference" between the threads 1,2,3,5 (with the __kernel_vsyscall) and the 4?
>> 
>> What should I look for?
>> 
>> Thank you,
>> Michele
>> 
>> 
>> On Jul 20, 2010, at 14:54 , Jens Vagelpohl wrote:
>> 
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>> 
>>> Hi Michele,
>>> 
>>> I would consider the coincience with strace "reviving" the process a red
>>> herring. When strace runs it does not influence the process you are
>>> watching with it, except for telling you what OS-level calls are made.
>>> 
>>> The most common reason for such timeouts are problems with software that
>>> tries to use other network resources during the running request to talk
>>> to services like RDBMS systems, LDAP, etc. If these background
>>> connections hang for some reason you'll end up with a hanging Zope
>>> thread. Enough of them and Zope itself appears to hang.
>>> 
>>> When your process hangs you should be able to attach to it using "gdb"
>>> and printing a stack trace for each thread. That may give you a big clue
>>> where the problem is.
>>> 
>>> jens
>>> 
>>> 
>>> On 7/20/10 13:53 , Michele Marcionelli wrote:
>>>> Hello
>>>> 
>>>> I'm running an old version of Zope which is required by our corporate design using also an older version of the CMS Silva:
>>>> 
>>>>  Linux Distribution: Red Hat Enterprise Linux Server release 5.5
>>>>  Linux Kernel: 2.6.18-194.8.1.el5PAE
>>>>  Zope: 2.8.9.1
>>>>  Python: 2.3.6 (with several modules, like: ldap, pil, mysql, pyxml, ...)
>>>>  Silva: 1.5.13
>>>> 
>>>> My problem ist that my zope server hat timeouts on displaying pages after a while that it has been started (currently this "while" is approx. 15 minutes). The only thing that helps is restarting zope.
>>>> 
>>>> By accident I discovered about two month ago, that running an "strace -f" on the zope process, "reactivated" it... so I could live with this hack until yesterday.
>>>> 
>>>> Now I'm running "strace" permanently (with all the disadvantages) and zope works "fine" for about 1 to 1.5 hours.
>>>> 
>>>> Here is a short recurring output of "strace":
>>>> 
>>>>> [pid 13408] <... futex resumed> )       = 0
>>>>> [pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13408] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13407] <... futex resumed> )       = 0
>>>>> [pid 13405] <... futex resumed> )       = 1
>>>>> [pid 13408] <... futex resumed> )       = 1
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13408] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13406] <... futex resumed> )       = 0
>>>>> [pid 13406] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>> [pid 13405] <... futex resumed> )       = 0
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>>>> [pid 13405] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13408] <... futex resumed> )       = 0
>>>>> [pid 13407] <... futex resumed> )       = 1
>>>>> [pid 13407] futex(0xbcc12c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
>>>>> [pid 13405] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>>>> 
>>>> 
>>>> I need definitively help... Any idea?
>>>> 
>>>> Thx,
>>>> Michele


-- 
ETH Zürich
Michele Marcionelli
IT Support Gruppe D-MATH
HG G 32.1
Rämistrasse 101
CH–8092 Zürich

michele.marcionelli at math.ethz.ch
http://www.math.ethz.ch/~michele

+41 44 632 6193 Telefon
+41 44 632 1354 Fax



More information about the Zope mailing list