Zope Head (2.8) breaks "refresh"
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh. I see varying behaviour refreshing "CMFCore": * Refresh is not effective (objects continue the have the old behavior) * After a while, "commit" reports an exception "ValueError: Cache values may only be in one cache." http://localhost:10080/Portal/portal_types/Document/manage_editProperties Traceback (innermost last): Module ZPublisher.Publish, line 104, in publish Module Zope.App.startup, line 223, in commit Module ZODB.Transaction, line 254, in commit Module ZODB.Transaction, line 233, in commit Module ZODB.Transaction, line 352, in _commit_objects Module ZODB.Connection, line 286, in commit ValueError: Cache values may only be in one cache. The following transscript shows strange reproducible behaviour which is almost surely related to the above: linux: bin/zopectl debug Starting debugger (the name "app" is bound to the top-level Zope object)
pt=getattr(app,'standard_template.pt') pr=app.unrestrictedTraverse('Control_Panel/Products/PageTemplates/manage_performRefresh') pt.getId() 'standard_template.pt' pr() ## refresh pt.getId() ## apparently not deactivated 'standard_template.pt' pt._p_deactivate() pt.getId() ## obviously a wrong result 'Zope' pt.aq_base.getId() ## may be explainable as "pt" is not refetched Traceback (most recent call last): File "<stdin>", line 1, in ? File "/home/dieter/ZH/ZopeHead/lib/python/ZODB/Connection.py", line 424, in setstate self._reader.setGhostState(obj, p) File "/home/dieter/ZH/ZopeHead/lib/python/ZODB/serialize.py", line 203, in setGhostState obj.__setstate__(state) File "/home/dieter/ZH/ZopeHead/lib/python/Products/PageTemplates/ZopePageTemplate.py", line 301, in __setstate__ ZopePageTemplate.inheritedAttribute('__setstate__')(self, state) AttributeError: 'NoneType' object has no attribute 'inheritedAttribute' pt=getattr(app,'standard_template.pt') ## refetch pt.getId() ## still wrong 'Zope'
Looks as if "refresh" did not flush the ZODB cache, which still delivers objects referencing the old classes. I now know why the above code does not work as I expected: The new refresh no longer uses "minimize" to flush the ZODB caches but updates the "code_timestamp". This causes the cache to be replaced by a new one when the connection is opened for the next time. ATT: replacing the cache without clearing it can lead to huge memory leaks (everything in the old cache is leaked!). I can not yet explain the original observations... -- Dieter
On Sun, 2003-12-07 at 08:15, Dieter Maurer wrote:
ATT: replacing the cache without clearing it can lead to huge memory leaks (everything in the old cache is leaked!).
Without commenting on the rest of the bug report, I should mention that caches and persistent objects all participate in cyclic GC. They will not leak the way pre-2.8 caches did. Jeremy
Jeremy Hylton wrote at 2003-12-7 23:01 -0500:
On Sun, 2003-12-07 at 08:15, Dieter Maurer wrote:
ATT: replacing the cache without clearing it can lead to huge memory leaks (everything in the old cache is leaked!).
Without commenting on the rest of the bug report, I should mention that caches and persistent objects all participate in cyclic GC. They will not leak the way pre-2.8 caches did.
Has the "do not free when a cycle contains an object with destructor" restriction been removed from the cyclic GC? It is not unlikely that the cache references an object with a "__del__". This may keep the cache and all its content. -- Dieter
On Mon, 2003-12-08 at 13:17, Dieter Maurer wrote:
Jeremy Hylton wrote at 2003-12-7 23:01 -0500:
On Sun, 2003-12-07 at 08:15, Dieter Maurer wrote:
ATT: replacing the cache without clearing it can lead to huge memory leaks (everything in the old cache is leaked!).
Without commenting on the rest of the bug report, I should mention that caches and persistent objects all participate in cyclic GC. They will not leak the way pre-2.8 caches did.
Has the "do not free when a cycle contains an object with destructor" restriction been removed from the cyclic GC?
It is not unlikely that the cache references an object with a "__del__". This may keep the cache and all its content.
I don't think it makes a lot of sense to put an __del__ method on a Persistent object. When would you expect it to be called? Every time the object is turned into a ghost? When the object is removed by pack? I can imagine some people would try to migrate non-persistent code to ZODB and have to deal with __del__ then. If I were doing that, I'd make sure to check gc.garbage when I tested and fix any code that caused leaks. Jeremy
Jeremy Hylton wrote at 2003-12-8 15:28 -0500:
On Mon, 2003-12-08 at 13:17, Dieter Maurer wrote:
Jeremy Hylton wrote at 2003-12-7 23:01 -0500:
On Sun, 2003-12-07 at 08:15, Dieter Maurer wrote:
ATT: replacing the cache without clearing it can lead to huge memory leaks (everything in the old cache is leaked!).
Without commenting on the rest of the bug report, I should mention that caches and persistent objects all participate in cyclic GC. They will not leak the way pre-2.8 caches did.
Has the "do not free when a cycle contains an object with destructor" restriction been removed from the cyclic GC?
It is not unlikely that the cache references an object with a "__del__". This may keep the cache and all its content.
I don't think it makes a lot of sense to put an __del__ method on a Persistent object.
I just read in "cPersistence.c:ghostify": /* We remove the reference to the just ghosted object that the ring * holds. Note that the dictionary of oids->objects has an uncounted * reference, so if the ring's reference was the only one, this frees * the ghost object. Note further that the object's dealloc knows to * inform the dictionary that it is going away. */ This means: *all* persistent objects have a special "dealloc" function. Hope, this "dealloc" function plays well with the cyclic garbage collector. -- Dieter
On Mon, 2003-12-15 at 07:34, Dieter Maurer wrote:
I don't think it makes a lot of sense to put an __del__ method on a Persistent object.
I just read in "cPersistence.c:ghostify":
/* We remove the reference to the just ghosted object that the ring * holds. Note that the dictionary of oids->objects has an uncounted * reference, so if the ring's reference was the only one, this frees * the ghost object. Note further that the object's dealloc knows to * inform the dictionary that it is going away. */
This means: *all* persistent objects have a special "dealloc" function. Hope, this "dealloc" function plays well with the cyclic garbage collector.
All Python objects have a "dealloc" function -- whatever is in the tp_dealloc slot. The tp_dealloc function at the C level is completely different from an __del__ method at the Python level. Jeremy
[Jeremy Hylton]
I don't think it makes a lot of sense to put an __del__ method on a Persistent object.
[Dieter Maurer]
I just read in "cPersistence.c:ghostify":
/* We remove the reference to the just ghosted object that the ring * holds. Note that the dictionary of oids->objects has an uncounted * reference, so if the ring's reference was the only one, this frees * the ghost object. Note further that the object's dealloc knows to * inform the dictionary that it is going away. */
This means: *all* persistent objects have a special "dealloc" function.
Most Python objects do (persistent or not), pointed to by the type's tp_dealloc slot, or inherited from a base type's tp_dealloc. Having a tp_dealloc function isn't the same as having a __del__ method, though: tp_dealloc is part of the system implementation, while __del__ is arbitrary user-defined Python code. It's required that the presence or absence of tp_dealloc not make any difference to cyclic gc, and gc doesn't even check for their existence. Since tp_dealloc implementations are "part of the system", they can (and do) impose restrictions on their authors that aren't imposed on authors of Python __del__ methods.
Hope, this "dealloc" function plays well with the cyclic garbage collector.
If a case is found where it doesn't, it's a bug in the persistent tp_dealloc function. It was hard to get cyclic gc to work with the cache, in large part because of the "dictionary of oids->objects [with] uncounted reference[s]" (Python's cyclic gc is based on having truthful refcounts), and we did hit plenty of bugs in the process. We don't know of any that remain. Doesn't mean there aren't any remaining, but does mean a lot of brain cells already got burned trying to ensure that the persistent dealloc functions play nice with cyclic gc.
Dieter Maurer wrote at 2003-12-7 14:15 +0100:
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh.
Turns out to be a bug in "ZODB.Connection.Connection._setDB": "ConnectionObjectReader" used the old cache while "refresh" cause a new cache to be installed later in "_resetCache". Patch attached. -- Dieter
On Mon, 2003-12-08 at 15:36, Dieter Maurer wrote:
Dieter Maurer wrote at 2003-12-7 14:15 +0100:
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh.
Turns out to be a bug in "ZODB.Connection.Connection._setDB":
"ConnectionObjectReader" used the old cache while "refresh" cause a new cache to be installed later in "_resetCache".
Patch attached.
The patch doesn't make sense: --- ZODB/Connection.py~ 2003-11-28 17:44:49.000000000 +0100 +++ ZODB/Connection.py 2003-12-08 21:26:49.000000000 +0100 @@ -158,8 +158,6 @@ - self._reader = ConnectionObjectReader(self, self._cache, - self._db._classFactory) @@ -168,6 +166,8 @@ + self._reader = ConnectionObjectReader(self, self._cache, + self._db._classFactory) Is there any difference between these two execept for whitespace? Jeremy
Jeremy Hylton wrote:
@@ -158,8 +158,6 @@ - self._reader = ConnectionObjectReader(self, self._cache, - self._db._classFactory) @@ -168,6 +166,8 @@ + self._reader = ConnectionObjectReader(self, self._cache, + self._db._classFactory)
Is there any difference between these two execept for whitespace?
Line numbers? HTH, Yuppie
On Mon, 2003-12-08 at 15:56, Yuppie wrote:
Jeremy Hylton wrote:
@@ -158,8 +158,6 @@ - self._reader = ConnectionObjectReader(self, self._cache, - self._db._classFactory) @@ -168,6 +166,8 @@ + self._reader = ConnectionObjectReader(self, self._cache, + self._db._classFactory)
Is there any difference between these two execept for whitespace?
Line numbers?
HTH, Yuppie
Yes. That helps :-). Jeremy
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/12/2003, at 12:15 AM, Dieter Maurer wrote:
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh.
This is not just a 2.8 issue - the behavior is in the 2.7 betas (at least up to beta 2 - havn't tested autorefresh with beta 3 yet) as well: http://collector.zope.org/Zope/1010 Manually refreshing works fine, but automatic refreshes cause the ValueErrors. - -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/1mfNAfqZj7rGN0oRAj9PAJ4+EUiHT9uI48ne5fRbO7+Tfa2fSQCdETcl ZF0IU3quFDv6QGlnzRdxSz4= =HIIJ -----END PGP SIGNATURE-----
Stuart Bishop wrote at 2003-12-10 11:24 +1100:
On 08/12/2003, at 12:15 AM, Dieter Maurer wrote:
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh.
This is not just a 2.8 issue - the behavior is in the 2.7 betas (at least up to beta 2 - havn't tested autorefresh with beta 3 yet) as well:
http://collector.zope.org/Zope/1010
Manually refreshing works fine, but automatic refreshes cause the ValueErrors.
Then, try my patch... -- Dieter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/12/2003, at 5:50 AM, Dieter Maurer wrote:
Stuart Bishop wrote at 2003-12-10 11:24 +1100:
On 08/12/2003, at 12:15 AM, Dieter Maurer wrote:
Playing with Zope Head (as of 2003-12-04) revealed problems with refresh.
This is not just a 2.8 issue - the behavior is in the 2.7 betas (at least up to beta 2 - havn't tested autorefresh with beta 3 yet) as well:
http://collector.zope.org/Zope/1010
Manually refreshing works fine, but automatic refreshes cause the ValueErrors.
Then, try my patch...
The patch is incompatible with the 2.7 branch - self._reader and ConnectionObjectReader don't seem to exist in 2.7 at all, so there is nothing to shuffle around. - -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/2OoZAfqZj7rGN0oRAnWhAJoDezgxOKwswrhZEvAWeYrXyLNZ2ACcCmqG TVUNPb7Ts0pndeNvEWgtHP0= =1/Bz -----END PGP SIGNATURE-----
Stuart Bishop wrote at 2003-12-12 09:05 +1100:
This is not just a 2.8 issue - the behavior is in the 2.7 betas (at least up to beta 2 - havn't tested autorefresh with beta 3 yet) as well:
http://collector.zope.org/Zope/1010
Manually refreshing works fine, but automatic refreshes cause the ValueErrors.
Then, try my patch...
The patch is incompatible with the 2.7 branch - self._reader and ConnectionObjectReader don't seem to exist in 2.7 at all, so there is nothing to shuffle around.
Then, it is highly likely that the problems in Zope 2.7 has a different cause. I probably will start working with Zope 2.7 in 2 weeks. And as refresh is an essential functionality for me, I will make it working (should it not work). -- Dieter
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 13/12/2003, at 5:05 AM, Dieter Maurer wrote:
Then, it is highly likely that the problems in Zope 2.7 has a different cause.
I probably will start working with Zope 2.7 in 2 weeks. And as refresh is an essential functionality for me, I will make it working (should it not work).
I've just clarified the issue in the collector. We have since found that manual product refresh is working, and it is just the automatic product refresh that gives this error. - -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/2pj2AfqZj7rGN0oRAqltAJwM0oaQN2zV+mnWuRNU2BpxJ7824ACdGRXq PN6CjW0fqAjAiwHQpOj4uKY= =aG8y -----END PGP SIGNATURE-----
On Fri, 2003-12-12 at 23:43, Stuart Bishop wrote:
I've just clarified the issue in the collector. We have since found that manual product refresh is working, and it is just the automatic product refresh that gives this error.
I don't mean to sound glib (really), but refresh has always caused more problems for me than it has solved, so when I see bug reports for refresh I'm always reminded of "Doctor, doctor, it hurts when I do this..." With ZEO, Zope typically restarts in under a second. I just do that and don't bother with refresh. Is there any commanding reason to use refresh other than startup/teardown time? - C
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 14/12/2003, at 11:20 AM, Chris McDonough wrote:
On Fri, 2003-12-12 at 23:43, Stuart Bishop wrote:
I've just clarified the issue in the collector. We have since found that manual product refresh is working, and it is just the automatic product refresh that gives this error.
I don't mean to sound glib (really), but refresh has always caused more problems for me than it has solved, so when I see bug reports for refresh I'm always reminded of "Doctor, doctor, it hurts when I do this..." With ZEO, Zope typically restarts in under a second. I just do that and don't bother with refresh. Is there any commanding reason to use refresh other than startup/teardown time?
Firstly, I've *never* had problems with refresh (as a user of) until Zope 2.7 (I only use it on my own products, so I must naturally code in a refresh friendly manner). I'm used to developing with autorefresh switched on for my products and it just *works*. If I make a change and don't see the expected behavior, I switch to the product refresh page to see where the syntax error is that caused the refresh to fail. I've occasionally also used manual refresh to reload products on production servers without nuking sessions, but apart from that it is startup/teardown time. This is a problem though because doc/INSTALL.txt tells us to: ./configure make instance Until following the steps in doc/INSTALL.txt fires up a single ZEO server bound to the loopback address and a single ZEO client (with authentication on), the vast majority of developer and production installations will be ZEO-less. This also makes ZEO-less installations the best tested and most stable environment, or at least people will assume so. - -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (Darwin) iD8DBQE/29SpAfqZj7rGN0oRAjAtAJ9GZh1iPTlUS3pYh4EDfEvrz8ysDQCfY8V5 O2w5LfPybUiEelaTvQep6Qw= =NXRf -----END PGP SIGNATURE-----
On Sat, 2003-12-13 at 22:10, Stuart Bishop wrote:
Firstly, I've *never* had problems with refresh (as a user of) until Zope 2.7 (I only use it on my own products, so I must naturally code in a refresh friendly manner). I'm used to developing with autorefresh switched on for my products and it just *works*. If I make a change and don't see the expected behavior, I switch to the product refresh page to see where the syntax error is that caused the refresh to fail.
I must code in a particularly refresh-unfriendly manner because often the globals I use in my code turn into Nones after a refresh, and it leaves me in a place where I need to try to figure out whether it's my problem or refresh's problem, and the first step in doing that is restarting. So I just cut out the middleman, because it's not much of a bother. I do agree that having autorefresh would be nice, though. Debugging it is just not problem I'm keen on tackling. It's an amazing piece of work by Shane, but it seems to have some limits that I don't quite understand. Either that or it works just fine but I just haven't used it in a while.
I've occasionally also used manual refresh to reload products on production servers without nuking sessions, but apart from that it is startup/teardown time. This is a problem though because doc/INSTALL.txt tells us to: ./configure make instance
Until following the steps in doc/INSTALL.txt fires up a single ZEO server bound to the loopback address and a single ZEO client (with authentication on), the vast majority of developer and production installations will be ZEO-less. This also makes ZEO-less installations the best tested and most stable environment, or at least people will assume so.
Zope still starts in under a second with a very small index and in under 10 seconds with a large one (at least on my main development machines), so even this isn't so terrible. With the zopectl shell, it's even just a matter of arrowing up in a terminal window and choosing "restart". But autorefresh would indeed kick the snot out of this any day if I felt like I could rely on it unconditionally or if the conditions under which it caused me problems were easily predictable. - C
On Sat, 2003-12-13 at 22:10, Stuart Bishop wrote:
Until following the steps in doc/INSTALL.txt fires up a single ZEO server bound to the loopback address and a single ZEO client (with authentication on), the vast majority of developer and production installations will be ZEO-less. This also makes ZEO-less installations the best tested and most stable environment, or at least people will assume so.
Can't argue about the assumptions, nor about how people are likely to start off using Zope; nevertheless, *every* ZC developer routinely uses ZEO, even for sandbox development. None of us would even contemplate *not* using ZEO in production: its "testedness" is pretty well established. Here are a couple of compelling reasons to run with ZEO all the time: - You can't scale Zope across hardware without ZEO, which means you need to assume that you will (or at least may) need it in production; hence, you might as well develop and test your app / site with ZEO. - ZEO lets you get a debug session open on your database without stopping the live site (this can be a lifesaver, particularly in development and site configuration; ever lock yourself out of the site by messing up the configuration of your root user folder?) - ZEO speeds restarts, often dramatically (verifying persistent caches used to be problematic, but ZODB3 3.2 has a fix for the common case). - ZEO makes ZODB-dependent unit tests run faster (another facet of the restart problem). This has been particulary true for testing products installed in INSTANCE_HOME, because the machinery for stitching together the __path__ of Products was tightly coupled to appserver startup. We might be able to fix that in future Zopes. Tres. -- =============================================================== Tres Seaver tseaver@zope.com Zope Corporation "Zope Dealers" http://www.zope.com
Tres Seaver wrote at 2003-12-14 09:35 -0500:
... - ZEO makes ZODB-dependent unit tests run faster (another facet of the restart problem). This has been particulary true for testing products installed in INSTANCE_HOME, because the machinery for stitching together the __path__ of Products was tightly coupled to appserver startup. We might be able to fix that in future Zopes.
We use "import App.FindHomes" to activate the "INSTANCE_HOME" magic. We nevertheless use ZEO but I can not confirm that startup is faster -- even without the cache verification time. It looks like reading Python's module files were dominating the startup process (take 6s on my computer). -- Dieter
Chris McDonough wrote at 2003-12-13 19:20 -0500:
With ZEO, Zope typically restarts in under a second.
Things are much better with you than with us. Here, it takes at least 6 seconds to reload the hundreds of Python modules that make up Zope and it takes minutes to validate the ZEO client cache. We still use ZODB 3.1 and at least there, the cache verification protocol seems quite stupid. We will soon switch to ZODB 3.2 and when cache validation still needs minutes, I will need to look into this... -- Dieter
On Sun, 2003-12-14 at 15:03, Dieter Maurer wrote:
Chris McDonough wrote at 2003-12-13 19:20 -0500:
With ZEO, Zope typically restarts in under a second.
Things are much better with you than with us.
Here, it takes at least 6 seconds to reload the hundreds of Python modules that make up Zope and it takes minutes to validate the ZEO client cache.
OK, I suppose I was exaggerating (but only by about four seconds ;-). For me, with no ZEO, running Zope and Plone HEADs on an Athlon 2100+ under Linux: real 0m5.329s user 0m3.310s sys 0m0.290s With ZEO with no persistent client cache: real 0m5.652s user 0m3.590s sys 0m0.330s With ZEO with a persistent client cache: real 0m4.991s user 0m3.610s sys 0m0.310s The database is very small, so it's hard for me to tell how much time (if any) ZEO is saving me on restarts.
We still use ZODB 3.1 and at least there, the cache verification protocol seems quite stupid. We will soon switch to ZODB 3.2 and when cache validation still needs minutes, I will need to look into this...
Apparently, the ZEO in 2.7 and the HEAD is better about doing as little work as possible for cache verification at startup than older versions were. - C
On Sun, 2003-12-14 at 16:53, Chris McDonough wrote:
We still use ZODB 3.1 and at least there, the cache verification protocol seems quite stupid. We will soon switch to ZODB 3.2 and when cache validation still needs minutes, I will need to look into this...
Apparently, the ZEO in 2.7 and the HEAD is better about doing as little work as possible for cache verification at startup than older versions were.
Specifically, the server saves the list of invalidated objects for the last N transactions and the client keeps track of the last transaction id it received an invalidation for. One restart, if the last txn the client saw was within N of the current transaction, the server just sends the list of invalidated objects. For a large cache, this is much cheaper than having the client send the oid and serialno of every object in the cache to the server. N is configurable. I think the default value is 100. Jeremy
participants (8)
-
Chris McDonough -
Dieter Maurer -
Jeremy Hylton -
Stuart Bishop -
Stuart Bishop -
Tim Peters -
Tres Seaver -
Yuppie