Refresh causes spinning Zope (bug with my Product?)
I am runnning Zope-2.7.0/Python-2.3.3/ZEO on Linux and have been having problems with refreshing a Product that I wrote in-house. Upon refreshing, Zope stops responding, and Apache/FastCGI eventually times out waiting for a response. This was also a problem with Zope-2.6.4/ZEO but not Zope-2.4.3/ZEO; but the machines and custom-product code also changed between 2.4.3 and 2.6.4. Following the instructions at: http://www.zope.org/Members/4am/debugspinningzope I was able to learn that the Zope threads look like: (gdb) info threads 6 Thread 40965 (LWP 29363) 0x080d1e77 in visit_decref (op=0x503c1b54, data=0x0) at Modules/gcmodule.c:225 5 Thread 32772 (LWP 29362) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 4 Thread 24579 (LWP 29361) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 3 Thread 16386 (LWP 29360) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 2 Thread 16385 (LWP 29358) 0x420d224b in poll () from /lib/i686/libc.so.6 1 Thread 8192 (LWP 29312) 0x420d3b2e in select () from /lib/i686/libc.so.6 The traceback is at the bottom of this email. Note the mention of the SNPSet Product. This is the product that we wrote and that was refreshed. I was suspicious of the mention of setstate, so I removed the Product's __setstate__ method, but this did not help. Any advice appreciated. Traceback follows: File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__ response=response) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZPublisher/Publish.py", line 372, in publish_module environ, debug, request, response) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZPublisher/Publish.py", line 163, in publish_module_standard response = publish(request, module_name, after_list, debug=debug) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZPublisher/Publish.py", line 100, in publish request, bind=1) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZPublisher/mapply.py", line 88, in mapply if debug is not None: return debug(object,args,context) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZPublisher/Publish.py", line 40, in call_object result=apply(object,args) # Type s<cr> to step into published object. File "/global/nfs-zope/data/Zope-2.7.0/lib/python/OFS/DTMLMethod.py", line 130, in __call__ r=apply(HTML.__call__, (self, client, REQUEST), kw) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/DocumentTemplate/DT_String.py", line 474, in __call__ try: result = render_blocks(self._v_blocks, md) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/OFS/DTMLMethod.py", line 123, in __call__ r=apply(HTML.__call__, (self, client, REQUEST), kw) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/DocumentTemplate/DT_String.py", line 474, in __call__ try: result = render_blocks(self._v_blocks, md) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/DocumentTemplate/DT_In.py", line 626, in renderwob sequence=expr(md) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/DocumentTemplate/DT_Util.py", line 201, in eval return eval(code, d) File "<string>", line 1, in <expression> File "/usr/local/zopeinstancehome/Products/SNPSet/SNPSet.py", line 260, in global_gene_list if gene.prettybase_available(): File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZODB/Connection.py", line 564, in setstate self._set_ghost_state(obj, p) File "/global/nfs-zope/data/Zope-2.7.0/lib/python/ZODB/Connection.py", line 603, in _set_ghost_state state = unpickler.load() File "<string>", line 1, in ?
John Ziniti wrote at 2004-3-31 11:12 -0500:
I am runnning Zope-2.7.0/Python-2.3.3/ZEO on Linux and have been having problems with refreshing a Product that I wrote in-house. Upon refreshing, Zope stops responding, and Apache/FastCGI eventually times out waiting for a response. This was also a problem with Zope-2.6.4/ZEO but not Zope-2.4.3/ZEO; but the machines and custom-product code also changed between 2.4.3 and 2.6.4.
Following the instructions at:
http://www.zope.org/Members/4am/debugspinningzope
I was able to learn that the Zope threads look like:
(gdb) info threads 6 Thread 40965 (LWP 29363) 0x080d1e77 in visit_decref (op=0x503c1b54, data=0x0) at Modules/gcmodule.c:225 5 Thread 32772 (LWP 29362) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 4 Thread 24579 (LWP 29361) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 3 Thread 16386 (LWP 29360) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 2 Thread 16385 (LWP 29358) 0x420d224b in poll () from /lib/i686/libc.so.6 1 Thread 8192 (LWP 29312) 0x420d3b2e in select () from /lib/i686/libc.so.6
We do not see enough to determine the cause of the spinning. When the situation above remains (i.e. Thread 6 is running while all others are sleeping) when you let your program run (and later stop) again, then this would indicate an infinite loop in the cyclic garbage collector. We would then need a closer look why this happens. -- Dieter
Dieter Maurer wrote:
(gdb) info threads 6 Thread 40965 (LWP 29363) 0x080d1e77 in visit_decref (op=0x503c1b54, data=0x0) at Modules/gcmodule.c:225 5 Thread 32772 (LWP 29362) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 4 Thread 24579 (LWP 29361) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 3 Thread 16386 (LWP 29360) 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6 2 Thread 16385 (LWP 29358) 0x420d224b in poll () from /lib/i686/libc.so.6 1 Thread 8192 (LWP 29312) 0x420d3b2e in select () from /lib/i686/libc.so.6
We do not see enough to determine the cause of the spinning.
When the situation above remains (i.e. Thread 6 is running while all others are sleeping) when you let your program run (and later stop) again, then this would indicate an infinite loop in the cyclic garbage collector. We would then need a closer look why this happens.
I see. Thanks, Dieter. As I've had more time to look into this problem, it is apparent that Zope is *not* spining forever. It is just taking several minutes to recover from the first access to instances of the refreshed product. Repeating the "info threads" gives the same results for the several minutes that Zope is unresponsive. So I feel like I am doing something in my Product that is causing the garbage collector a whole lot of trouble when the Product is refreshed. Are there any guidelines for "stupid things you should never do in a Product"? TIA, JZ
John Ziniti wrote at 2004-4-1 15:22 -0500:
... As I've had more time to look into this problem, it is apparent that Zope is *not* spining forever. It is just taking several minutes to recover from the first access to instances of the refreshed product.
A refresh causes the ZODB cache to be abandoned. ZODB's standard code just abandons the cache (without any cleanup). This results in a huge cyclic data structure the cyclic garbage collector needs to recycle. I have a patched "ZODB.Connection.Connection._resetCache" which explicitely calls "orig_cache.clear()". This breaks the cycles and allows the RC based (and not the cyclic) garbage collector to clean up. I am not sure whether this speeds things up. It definitely moves work from the cyclic GC to the RC based GC which might by more efficient. Maybe, you try it and report back. I have been unable to convince Jeremy that the "clear" is useful. Maybe, your data points can make this issue clearer.
... So I feel like I am doing something in my Product that is causing the garbage collector a whole lot of trouble when the Product is refreshed.
Are there any guidelines for "stupid things you should never do in a Product"?
Do you observe excessive memory consumption? This could indicate that the objects created by your product are too big (and therefore take excessive time to be recycled). Do you have an especially large ZODB cache? Recycling a huge ZODB cache is more expensive than recycling a small one. -- Dieter
participants (2)
-
Dieter Maurer -
John Ziniti