John Barratt wrote:
If you can't use catalog metadata as Seb suggests (e.g. you are actually accessing many attributes, large values, etc.) and if indeed memory is the problem (which seems likely), then you can ghostify the objects that were ghosts to begin with, and it will save memory (unless all those objects are already in cache).
The problem with this strategy, though, is that the doc.getObject() method used in your code activates the object, and hence you won't know whether it was a ghost already or not. To get around this you can shortcut this method and do something like:

    docs = container.portal_catalog(meta_type='Document', ...)
    for doc in docs:
        obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
        was_ghost = obj._p_changed is None
        value = obj.attr
        if was_ghost:
            obj._p_deactivate()
Just my 2 cents observation...

I ran this code and monitored the page "Cache extreme detail" in ZMI > ControlPanel > DebugInfo. With this method, the object was not loaded. However, the intermediate objects that unrestrictedTraverse() passed through were loaded into memory. E.g. if doc.getPath() is '/x/y/z/myobject', myobject was not loaded but x, y, and z were loaded into memory.

I also tried the method suggested by Seb. This loaded neither myobject nor x, y, and z into memory: http://mail.zope.org/pipermail/zope-dev/2003-September/020450.html

The information on deactivating objects into ghost state is very helpful. Thanks!

cheers, zhi min
zhimin@iss.nus.edu.sg wrote:
Just my 2 cents observation... I ran this code and monitored the page "Cache extreme detail" in ZMI > ControlPanel > DebugInfo. With this method, the object was not loaded. However the intermediate objects that the unrestrictedTraverse() passed by were loaded into memory. e.g. If doc.getPath() is '/x/y/z/myobject', myobject was not loaded but x, y, and z were loaded into memory.

This is a good point, and for catalog-based retrievals of objects it may be difficult (but not impossible) to avoid excess objects remaining in the cache. If, however, you are doing a walk of part of your ZODB tree (unlike 'randomly' accessing objects from the ZODB like this), you could ensure that you do a depth-first traversal; then, as you come back up the tree, traversed nodes that weren't active before the walk would be deactivated.
E.g. something like this external method:

    def traverseTree(self):
        ''' Traverse the tree and do something. '''
        was_ghost = self._p_changed is None
        for ob in self.objectValues():
            traverseTree(ob)
        # XXX Do something with self here:
        self.doSomething()
        if was_ghost:
            self._p_deactivate()

This should ensure that any 'traversed over' nodes that were previously not active are deactivated. JB.
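Editor's sketch: the depth-first idea above can be tried out without a ZODB at hand. The FakePersistent class below is a hypothetical stand-in, not the real persistent base class. It only mimics the convention used in the thread that _p_changed is None for a ghost, and that _p_deactivate() only ghosts an object that is up to date.

```python
class FakePersistent:
    """Hypothetical stand-in for a persistent object (not the ZODB class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self._p_changed = None  # None means "ghost": state not loaded

    def _activate(self):
        # Stand-in for the state load that attribute access would trigger.
        if self._p_changed is None:
            self._p_changed = False

    def objectValues(self):
        self._activate()
        return self.children

    def _p_deactivate(self):
        # Only an up-to-date (unchanged) object is turned back into a ghost.
        if self._p_changed is False:
            self._p_changed = None


def traverse_tree(node, visit):
    """Depth-first walk that re-ghosts nodes which were ghosts beforehand."""
    was_ghost = node._p_changed is None
    for child in node.objectValues():
        traverse_tree(child, visit)
    visit(node)
    if was_ghost:
        node._p_deactivate()


leaf_a = FakePersistent("a")
leaf_b = FakePersistent("b")
root = FakePersistent("root", [leaf_a, leaf_b])
leaf_b._activate()  # pretend "b" was already in use before the walk

visited = []
traverse_tree(root, lambda node: visited.append(node.name))
```

In this toy model, root and "a" end up as ghosts again after the walk, while "b", which was active before the walk, is left alone.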
Hmmm, does _p_deactivate() clear the contents of the object's _v_ variables? cheers, Chris
Toby Dickenson wrote:
On Thursday 25 September 2003 11:51, Chris Withers wrote:
Hmmm, does _p_deactivate() clear the contents of the object's _v_ variables?
Yes
Then given your earlier comment that _v_ variables are supposed to last at least the length of the request, John's idea of using _p_deactivate() to reduce memory usage for large ZCatalog result sets could be seen as playing with fire, right? Chris
On Friday 26 September 2003 09:32, Chris Withers wrote:
Then given your earlier comment that _v_ variables are supposed to last at least the length of the request, John's idea of using _p_deactivate() to reduce memory usage for large ZCatalog result sets could be seen as playing with fire, right?
He had code that would only deactivate objects at the end of a loop if they were not active at the start of the loop. It's not entirely safe, but I think it mitigates the most serious risks here. ZCatalog (and others) have used this same technique for a while. -- Toby Dickenson
On Friday 26 September 2003 04:37 am, Toby Dickenson wrote:
He had code that would only deactivate objects at the end of a loop if they were not active at the start of the loop. It's not entirely safe, but I think it mitigates the most serious risks here.
ZCatalog (and others) have used this same technique for a while.
It should be entirely safe so long as the application does not store non-redundant data in _v_ variables. If the application does do this, then it needs to register the object as changed in the transaction and override __getstate__ so that the values of the _v_ variables are stored.

I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.

_p_deactivate is only effective if the object is in the "up-to-date" state. So if by some chance the object was changed after being loaded from the catalog result set, _p_deactivate() would do no harm. -Casey
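Editor's sketch of the "completely disposable" rule: the value kept in a _v_ attribute is always rebuilt from ordinary state when it is missing, so losing it costs time but never correctness. The Report class, its compute_count counter, and simulate_deactivate() are hypothetical names for illustration. simulate_deactivate() only imitates the effect described in the thread (volatile attributes vanish) and is not a ZODB call.

```python
class Report:
    """Hypothetical object; only non-_v_ attributes count as real state."""

    def __init__(self, rows):
        self.rows = rows
        self.compute_count = 0  # instrumentation for this example only

    def total(self):
        # Treat the volatile attribute purely as a cache: rebuild on a miss.
        cached = getattr(self, "_v_total", None)
        if cached is None:
            self.compute_count += 1
            cached = sum(self.rows)
            self._v_total = cached
        return cached

    def simulate_deactivate(self):
        # Stand-in for ghosting: every volatile attribute disappears.
        for name in [n for n in vars(self) if n.startswith("_v_")]:
            delattr(self, name)


report = Report([1, 2, 3])
first = report.total()        # computed
second = report.total()       # served from the _v_ cache
report.simulate_deactivate()  # the cache vanishes "at a random time"
third = report.total()        # recomputed, same answer
```

The caller never needs to know whether the cached value survived, which is the property Casey argues every use of _v_ attributes should have.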
Casey Duncan wrote:
I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.
I agree with this. How do we go about finding code that uses the assumption that _v_ stuff won't change unless it's at a transaction boundary? Chris
Note that we had a problem related to this with a client recently: In CMF, skin data is stored in portal._v_skindata, and is actually needed for the whole request, but in some ZEO setting this got cleared by a get_transaction().commit(1), which was unexpected and led to breakage because in that batch treatment we used some skin methods too... We ended up calling portal.setupCurrentSkin() after the commit() to reset the skins. FYI. Florent -- Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
On Thursday 09 October 2003 14:01, Florent Guillaume wrote:
Note that we had a problem related to this with a client recently: In CMF, skin data is stored in portal._v_skindata, and is actually needed for the whole request, but in some ZEO setting this got cleared by a get_transaction().commit(1), which was unexpected and led to breakage because in that batch treatment we used some skin methods too...
Something after the subtransaction commit must be tickling the cache garbage collector. That's generally what subtransactions are used for.

A while ago there was a discussion on zodb-dev about _v_-like attributes that would be automatically cleared at the end of a transaction. Do we need something similar that guarantees it will _not_ be cleared until the end of the transaction? -- Toby Dickenson
Toby Dickenson wrote:
Something after the subtransaction commit must be tickling the cache garbage collector. That's generally what subtransactions are used for.
A while ago there was a discussion on zodb-dev about _v_-like attributes that would be automatically cleared at the end of a transaction. Do we need something similar that guarantees it will _not_ be cleared until the end of the transaction?
IMO YAGNI. I think the application should tolerate the disappearance of _v_ vars. I would consider the problem with CMF skins a bug that needs to be fixed. AFAIK there is nothing stored in _v_skindata that cannot be reconstructed from data in the skins tool. Has an issue been filed in the CMF collector regarding this? -Casey
Casey Duncan wrote at 2003-10-10 09:26 -0400:
... IMO YAGNI. I think the application should tolerate the disappearance of _v_ vars. I would consider the problem with CMF skins a bug that needs to be fixed. AFAIK there is nothing stored in _v_skindata that cannot be reconstructed from data in the skins tool.
Please think about database connections, which currently are maintained in "_v_" attributes. Reopening a connection in mid-request can be disastrous, as what should be one transaction becomes two separate transactions (although they are both committed or both aborted). Dieter
IMO YAGNI. I think the application should tolerate the disappearance of _v_ vars. I would consider the problem with CMF skins a bug that needs to be fixed. AFAIK there is nothing stored in _v_skindata that cannot be reconstructed from data in the skins tool.
Has an issue been filed in the CMF collector regarding this?
Done, http://collector.zope.org/CMF/198 Florent -- Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
Toby Dickenson wrote at 2003-10-10 07:54 +0100:
... A while ago there was a discussion on zodb-dev about _v_-like attributes that would be automatically cleared at the end of a transaction. Do we need something similar that guarantees it will _not_ be cleared until the end of the transaction?
This definitely is necessary for the "_v_" attributes that hold connections to a relational database. If such a "_v_" attribute is flushed, the next access to the DA (in the same request) reopens the database. As this is a new connection, it does not see the changes made by the previous connection (in the same request). This can lead to very nasty, non-deterministic and almost incomprehensible errors. Dieter
On Fri, 2003-10-10 at 14:34, Dieter Maurer wrote:
If such a "_v_" attribute is flushed, the next access to the DA (in the same request) reopens the database. As this is a new connection, it does not see the changes made by the previous connection (in the same request).
This can lead to very nasty, non-deterministic and almost incomprehensible errors.
Such as prematurely triggered integrity constraints that would be satisfied by another operation before the end of the transaction. Cheers, Leo -- Ideas don't stay in some minds very long because they don't like solitary confinement.
On Friday 10 October 2003 18:34, Dieter Maurer wrote:
Toby Dickenson wrote at 2003-10-10 07:54 +0100:
... A while ago there was a discussion on zodb-dev about _v_-like attributes that would be automatically cleared at the end of a transaction. Do we need something similar that guarantees it will _not_ be cleared until the end of the transaction?
This definitely is necessary for the "_v_" attributes that hold connections to a relational database.
If such a "_v_" attribute is flushed, the next access to the DA (in the same request) reopens the database. As this is a new connection, it does not see the changes made by the previous connection (in the same request).
That's how a lot of code works today, and how it had to be done in the past. Today we have the 'transaction participant' interface. That would be a better place to hold these things, allowing the DA object itself to be deactivated if necessary. -- Toby Dickenson
Toby Dickenson wrote:
Today we have the 'transaction participant' interface. That would be a better place to hold these things, allowing the DA object itself to be deactivated if necessary.
What's the 'transaction participant interface' and where can I find out more about it? Chris
Chris Withers wrote at 2003-10-8 21:22 +0100:
Casey Duncan wrote:
I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.
I agree with this. How do we go about finding code that uses the assumption that _v_ stuff won't change unless it's at a transaction boundary?
This will invalidate many current uses:
* use for database connections
* use for skin data
* ...
Dieter
Dieter Maurer wrote:
This will invalidate many current uses:
* use for database connections
Not really, I would expect a DA to just re-connect if it got garbage collected...
* use for skin data
This seems to be considered a bug...
* ...
How do we go about finding these? ;-) Chris
Chris Withers wrote at 2003-10-15 12:49 +0100:
Dieter Maurer wrote:
This will invalidate many current uses:
* use for database connections
Not really, I would expect a DA to just re-connect if it got garbage collected...
Did you think about it? It means that what should be one transaction becomes two. If it were a single transaction, the second part would be able to see the effects of the first part. This is not the case with two distinct transactions. Analysing such behaviour is a nightmare... Dieter
Casey Duncan wrote:
I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case? Chris
On Wednesday 15 October 2003 12:47, Chris Withers wrote:
Casey Duncan wrote:
I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case?
Every database adapter? (I guess, but haven't checked) -- Toby Dickenson
Chris Withers wrote:
Casey Duncan wrote:
I would argue that a better plan would be to use _v_ vars only for completely disposable data. The application should expect that these values will be gone at any random time, not just at transaction boundaries.
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case?
I'm a bit puzzled - of what use is a variable which may disappear "at any random time"? seb
Seb Bacon wrote:
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case?
I'm a bit puzzled - of what use is a variable which may disappear "at any random time"?
For caching things... Chris
ChrisW wrote:
Seb Bacon wrote:
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case?
I'm a bit puzzled - of what use is a variable which may disappear "at any random time"?
For caching things...
Note that caching things in _v_ attributes can be a complex business. In many cases, there is some situation where the cache has to be invalidated, be it at REQUEST boundary or at a different time. And this can be difficult to do correctly, see the hoops I had to go through in CMFCore.MemberDataTool regarding _v_temps and the need for it to be cleaned at the end of the request (using REQUEST._hold). I'm sure this is needed for lots of cases and isn't actually implemented. I know at one point LDAPUserFolder had such caching of its entries, I haven't looked at it in a while. Florent -- Florent Guillaume, Nuxeo (Paris, France) +33 1 40 33 79 87 http://nuxeo.com mailto:fg@nuxeo.com
Seb Bacon wrote:
I'm a bit puzzled - of what use is a variable which may disappear "at any random time"?
It's not exactly random. It would happen when the object was deactivated (removed from cache). If the object is marked as changed (a la _p_changed=1), then its __getstate__ will be called before it is deactivated. If it hasn't changed, though, then it doesn't really get a chance to do anything about it.

Deactivation only happens, AFAIK, at transaction or subtransaction boundaries. This gives at least some predictability, since subtransactions are rarely used. Perhaps this is why database adapters have been historically incompatible with subtransactions?

It seems to me that DAs are a bit broken with regard to where they store their database connection objects. They should register an object with the transaction that holds the connection so that it can be properly committed or aborted regardless of what happens with _v_ variables in the interim. -Casey
On Wednesday 15 October 2003 14:53, Casey Duncan wrote:
Agreed. Are there any situations, apart from the already discussed CMF skindata, where this currently isn't the case?
I'm a bit puzzled - of what use is a variable which may disappear "at any random time"?
It's not exactly random. It would happen when the object was deactivated (removed from cache).
The proposal earlier in the thread was aiming towards allowing objects to get deactivated at any time if the cache was overfull, not just at transaction boundaries. This is desirable from a cache management point of view. Apart from the most trivial cases, it would allow _v_ attributes to disappear at random. It's a similar problem to the one that makes it hard to write an optimiser for Python code, and I am unconvinced that this is sane. -- Toby Dickenson
Toby Dickenson wrote:
The proposal earlier in the thread was aiming towards allowing objects to get deactivated at any time if the cache was overfull, not just at transaction boundaries. This is desirable from a cache management point of view.
Apart from the most trivial cases, it would allow _v_ attributes to disappear at random. It's a similar problem to the one that makes it hard to write an optimiser for Python code, and I am unconvinced that this is sane.
I agree. If objects disappeared from cache randomly, I think the system as a whole would not be stable or predictable. I also think it would tend to make a loaded server even more loaded by thrashing the cache unnecessarily.

As it is, the hard cache implementation, although beneficial from a memory management perspective, causes loaded servers to do a lot more work because they are constantly pruning the cache and then reloading objects again immediately thereafter. It might be worth considering a more gradual cache management policy which has a target size, a maximum size and a prune rate. Currently, we have only a maximum size.

Then again, since Python never really returns memory to the OS, I'm not sure it matters much in the end. -Casey
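Editor's sketch of what a policy with a target size, a maximum size and a prune rate could look like. This is not how the ZODB cache works. GradualCache and all of its method names are hypothetical, and least-recently-used eviction is an assumption made only to keep the example short.

```python
from collections import OrderedDict


class GradualCache:
    """Hypothetical LRU cache with a soft target and a hard maximum."""

    def __init__(self, target_size, max_size, prune_rate):
        if not 0 < target_size <= max_size:
            raise ValueError("need 0 < target_size <= max_size")
        if prune_rate < 1:
            raise ValueError("prune_rate must be at least 1")
        self.target_size = target_size
        self.max_size = max_size
        self.prune_rate = prune_rate
        self._items = OrderedDict()  # oldest entry first

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Hard ceiling: never hold more than max_size entries.
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)

    def get(self, key, default=None):
        if key in self._items:
            self._items.move_to_end(key)  # mark as recently used
            return self._items[key]
        return default

    def prune_step(self):
        """Evict at most prune_rate entries, and only while above target."""
        evicted = 0
        while len(self._items) > self.target_size and evicted < self.prune_rate:
            self._items.popitem(last=False)
            evicted += 1
        return evicted

    def __len__(self):
        return len(self._items)


cache = GradualCache(target_size=2, max_size=5, prune_rate=1)
for number in range(7):
    cache.put(number, str(number))
size_after_fill = len(cache)
evictions = [cache.prune_step() for _ in range(4)]
```

The point of the extra knobs is that prune_step() can be called at convenient moments and shrinks the cache a little at a time, instead of dropping everything above the limit in one go and then reloading it.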
On Wednesday 15 October 2003 15:51, Casey Duncan wrote:
As it is, the hard cache implementation, although beneficial from a memory management perspective, causes loaded servers to do a lot more work because they are constantly pruning the cache and then reloading objects again immediately thereafter.
If you are seeing that then I think you need a bigger cache. And possibly fewer publisher threads. -- Toby Dickenson
Toby Dickenson wrote:
Apart from the most trivial cases, it would allow _v_ attributes to disappear at random. It's a similar problem to the one that makes it hard to write an optimiser for Python code, and I am unconvinced that this is sane.
Which, unfortunately, then leaves us with the problem of how to stop Zope using up an undeterminable amount of memory... Chris
On Thursday 23 October 2003 08:07, Chris Withers wrote:
Toby Dickenson wrote:
Apart from the most trivial cases, it would allow _v_ attributes to disappear at random. It's a similar problem to the one that makes it hard to write an optimiser for Python code, and I am unconvinced that this is sane.
Which, unfortunately, then leaves us with the problem of how to stop Zope using up an undeterminable amount of memory...
No, we just exclude objects with _v_ attributes from mid-transaction deactivation. There aren't many objects in that category, but they do need protection.

But, your proposal means we would improve the situation for transactions that read from an undeterminable number of persistent objects. It does not help for transactions that touch an undeterminable number of non-persistent objects, or transactions that change an undeterminable number of persistent objects. Is the gain big enough to justify the effort? -- Toby Dickenson
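Editor's sketch of the exclusion rule suggested here, assuming objects keep their attributes in an ordinary instance dictionary. The function names and the Item class are hypothetical, and nothing below reflects how the real cache selects objects. Note that in this sketch a _v_ attribute that has been set to None still counts as volatile state.

```python
class Item:
    """Hypothetical cached object with a plain instance __dict__."""

    def __init__(self, name):
        self.name = name


def has_volatile_state(obj):
    # True if the instance currently carries any _v_ attribute.
    return any(attr.startswith("_v_") for attr in vars(obj))


def deactivation_candidates(objects):
    """Objects that could be ghosted mid-transaction under this rule."""
    return [obj for obj in objects if not has_volatile_state(obj)]


plain = Item("plain")
adapter = Item("adapter")
adapter._v_connection = object()  # stand-in for a live database connection

candidates = deactivation_candidates([plain, adapter])
candidate_names = [obj.name for obj in candidates]
```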
Toby Dickenson wrote:
No, we just exclude objects with _v_ attributes from mid-transaction deactivation. There aren't many objects in that category, but they do need protection.
This is slightly OT but reminded me of something important I need to ask. ZOracleDA stores its database connections in a _v_ variable on the DA object. It tries to delete this by setting the _v_ variable to None. However, a number of people have noticed that the Oracle connections aren't going away from the Oracle server's point of view. In most cases, however, they do go away when the ZODB cache is cleared. What in the ZODB cache or other ZODB code could be causing _v_ variables to stick around after they've been set to None in their containing objects? Chris
On Thursday 23 October 2003 18:52, Chris Withers wrote:
What in the ZODB cache or other ZODB code could be causing _v_ variables to stick around after they've been set to None in their containing objects?
reference cycles -- Toby Dickenson
Chris Withers wrote at 2003-10-23 18:52 +0100:
... This is slightly OT but reminded me of something important I need to ask.
ZOracleDA stores its database connections in a _v_ variable on the DA object. It tries to delete this by setting the _v_ variable to None.
However, a number of people have noticed that the Oracle connections aren't going away from the Oracle server's point of view.
In most cases, however, they do go away when the ZODB cache is cleared.
This suggests that something holds references to the connection objects. I have no idea what this could be.

Some years ago, when I was forced to work with Oracle (it was a bad time), I had to fix a circular reference (in "DCOracle1" at that time) in stored procedure handling. However, these cyclic structures would not have disappeared by flushing the ZODB cache. This means you must be seeing something else. I never worked with "DCOracle2".
What in the ZODB cache or other ZODB code could be causing _v_ variables to stick around after they've been set to None in their containing objects?
Nothing. When you assign "None" to the "_v_variable", then this reference will go away. There may be others, which you did not set to "None"... -- Dieter
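Editor's sketch of the general point that assigning None to a _v_ attribute removes only that one reference. It makes no claim about what actually happens inside ZOracleDA or DCOracle2. Connection, Cursor and Adapter are hypothetical stand-ins, and the back-reference from connection to cursor is an invented example of a cycle.

```python
import gc
import weakref


class Connection:
    """Hypothetical stand-in for a database connection."""


class Cursor:
    def __init__(self, connection):
        self.connection = connection  # cursor -> connection


class Adapter:
    def __init__(self):
        self._v_connection = None


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running on its own here
    try:
        adapter = Adapter()
        conn = Connection()
        cursor = Cursor(conn)
        conn.last_cursor = cursor  # connection -> cursor completes a cycle
        adapter._v_connection = conn
        probe = weakref.ref(conn)  # lets us observe when conn is freed

        del conn, cursor
        adapter._v_connection = None  # the "cleanup" the adapter performs
        alive_after_clear = probe() is not None

        gc.collect()  # only now is the cycle found and released
        alive_after_collect = probe() is not None
        return alive_after_clear, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


result = demo()
```

Here the connection outlives the assignment of None because the cursor still points at it, and it is released only once the cycle itself is collected.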
Dieter Maurer wrote:
What in the ZODB cache or other ZODB code could be causing _v_ variables to stick around after they've been set to None in their containing objects?
Nothing.
When you assign "None" to the "_v_variable", then this reference will go away. There may be others, which you did not set to "None"...
Well in that case, why would flushing the ZODB cache cause the errant connections to be closed? Chris
Chris Withers wrote at 2003-10-29 22:03 +0000:
Well in that case, why would flushing the ZODB cache cause the errant connections to be closed?
I guess because there were other references not set to "None"... -- Dieter
Toby Dickenson wrote:
Which, unfortunately, then leaves us with the problem of how to stop Zope using up an undeterminable amount of memory...
No, we just exclude objects with _v_ attributes from mid-transaction deactivation. There aren't many objects in that category, but they do need protection.
Indeed, I guess they're unlikely to be the ones that cause Zope's memory usage to balloon...
But, your proposal means we would improve the situation for transactions that read from an undeterminable number of persistent objects.
Yep.
It does not help for transactions that touch an undeterminable number of non-persistent objects,
Under what circumstances is this likely to happen?
or transactions that change an undeterminable number of persistent objects. Is the gain big enough to justify the effort?
Well, hmmm, that's tricky. I guess that's the point where the fact that Zope so neatly hides the fact that it's interacting with a concurrent transactional database becomes a PITA. I've written code in the past that just does a get_transaction().commit() half way through a request. I don't remember any problems, but how hot exactly is the fire I'm playing with? cheers, Chris
Casey Duncan wrote at 2003-10-15 09:53 -0400:
... It seems to me that DAs are a bit broken with regard to where they store their database connection objects. They should register an object with the transaction that holds the connection so that it can be properly committed or aborted regardless of what happens with _v_ variables in the interim.
That is not the problem. The problem is to *find* the connection object previously opened in the same request when you access it a second time. It *must* be ensured that this is the *same* connection object as otherwise a single Zope transaction affects two different connections for the same database (which translates into two different transactions). Dieter
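Editor's sketch of one way the two concerns in this thread could meet: the connection is held by the transaction rather than in a _v_ attribute, and it is looked up by a stable adapter id, so a second access in the same transaction finds the same connection even if the adapter object was ghosted and reloaded in between. FakeTransaction, FakeConnection, connection_for() and the "db://example" string are all hypothetical. This is not the ZODB transaction API.

```python
class FakeConnection:
    """Hypothetical stand-in for a relational database connection."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.state = "open"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


class FakeTransaction:
    """Hypothetical transaction that owns the connections opened within it."""

    def __init__(self):
        self._connections = {}

    def connection_for(self, adapter_id, factory):
        # Same adapter id within the same transaction: same connection.
        conn = self._connections.get(adapter_id)
        if conn is None:
            conn = factory()
            self._connections[adapter_id] = conn
        return conn

    def commit(self):
        for conn in self._connections.values():
            conn.commit()
        self._connections.clear()

    def abort(self):
        for conn in self._connections.values():
            conn.rollback()
        self._connections.clear()


class Adapter:
    """Hypothetical DA that keeps no connection of its own."""

    def __init__(self, adapter_id, dsn):
        self.adapter_id = adapter_id
        self.dsn = dsn

    def connection(self, txn):
        return txn.connection_for(
            self.adapter_id, lambda: FakeConnection(self.dsn)
        )


txn = FakeTransaction()
first = Adapter("orders", "db://example").connection(txn)
# A fresh Adapter instance models the DA being ghosted and reloaded mid-request.
second = Adapter("orders", "db://example").connection(txn)
txn.commit()
third = Adapter("orders", "db://example").connection(FakeTransaction())
```

Because the lookup key is the adapter id and not the adapter instance, losing the adapter's in-memory state does not split one Zope transaction across two database connections in this model.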
participants (10)
- Casey Duncan
- Chris Withers
- Chris Withers
- Dieter Maurer
- Florent Guillaume
- John Barratt
- Leonardo Rochael Almeida
- Seb Bacon
- Toby Dickenson
- zhimin@iss.nus.edu.sg