ZODB: interface for walking object graph?
I'd like a way to find all instances of a given class in a ZODB. (In ODBMS jargon, this feature seems to be called "class extents".) The purpose is to handle changes to a class that require rearranging; you can do it lazily in a __setstate__ method, but after you've had 10 changes, that __setstate__ will be really complicated. And you can never simplify it, because you never know if there's a version 1 object hiding somewhere that just hasn't been accessed yet. Instead, I'd like to do it eagerly; write a script that loops over all the instances of class C, make a change to all of them, and then commit.

I only see one way of doing this, by walking over the whole object graph starting from the ZODB's root objects. Storage objects do this in their pack() method to determine reachability. Problem: you'd have to copy the walking logic out of the pack() method into your own method.

Proposal: expose some interface allowing clients of a Connection to run a function over every single object in a ZODB. It'd be like os.path.walk(), except over objects, not directories and files. This means unpickling *every* *single* object in the ZODB (and sending them all over the wire if you're using ZEO). Inefficient, but there seems no other alternative, and you wouldn't do this very often.

Jim, what's your reaction?

-- A.M. Kuchling http://starship.python.net/crew/amk/
Somehow the people who do as they please seem to get along just about as well as those who are always trying to please others. -- Bob Edwards
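To make the proposed os.path.walk()-style interface concrete, here is a minimal in-memory sketch. `walk_objects`, `find_instances` and `_children` are hypothetical names, not part of ZODB, and references are discovered by looking inside dicts, sequences and instance `__dict__`s. A real implementation would instead follow persistent references (OIDs) through the storage, the way pack() does for reachability.

```python
from collections import deque


def _children(obj):
    """Return the objects directly referenced by obj (hypothetical rules)."""
    if isinstance(obj, dict):
        return list(obj.values())
    if isinstance(obj, (list, tuple, set, frozenset)):
        return list(obj)
    if hasattr(obj, "__dict__"):
        return list(vars(obj).values())
    return []


def walk_objects(root, visit):
    """Call visit(obj) exactly once for every object reachable from root.

    Breadth-first, with a 'seen' set keyed on id() so that shared objects
    and reference cycles are visited only once.
    """
    seen = {id(root)}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        visit(obj)
        for child in _children(obj):
            if id(child) not in seen:
                seen.add(id(child))
                queue.append(child)


def find_instances(root, cls):
    """Collect every reachable instance of cls, i.e. its class extent."""
    found = []

    def collect(obj):
        if isinstance(obj, cls):
            found.append(obj)

    walk_objects(root, collect)
    return found


# Usage: a tiny graph in which one C instance is shared by two parents.
class C:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


shared = C("shared")
root = {"a": C("a", [shared]), "b": [C("b", [shared])], "n": 3}
print(sorted(o.name for o in find_instances(root, C)))  # ['a', 'b', 'shared']
```

Keying the visited set on id() is safe here only because every reachable object stays alive for the whole walk; with a real ZODB, the OID would be the natural key.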
On Thu, 23 Mar 2000, Andrew M. Kuchling wrote:
I'd like a way to find all instances of a given class in a ZODB. (In
You probably already know this, but if your classes are CatalogAware, and you index on meta_type, then you can use the Catalog and its getPath method to walk the list of class instances. True, this isn't a general solution, but it will work in a lot of cases. --RDM
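The catalog approach works because the class extent is maintained eagerly, as objects are added, instead of being discovered by a walk. The idea can be sketched without Zope. `ExtentIndex` and its methods are hypothetical stand-ins for a Catalog indexed on meta_type, not Zope's API. The same caveat RDM gives applies: objects that were never indexed are invisible to it.

```python
from collections import defaultdict


class ExtentIndex:
    """Hypothetical index mapping a meta_type string to object paths."""

    def __init__(self):
        self._paths = defaultdict(set)

    def add(self, path, obj):
        # Record the object under its meta_type when it is created/stored.
        self._paths[obj.meta_type].add(path)

    def remove(self, path, obj):
        # Forget the object when it is deleted.
        self._paths[obj.meta_type].discard(path)

    def paths_for(self, meta_type):
        # .get() does not create an entry for unknown meta_types.
        return sorted(self._paths.get(meta_type, ()))


class Invoice:
    meta_type = "Invoice"


idx = ExtentIndex()
idx.add("/billing/inv1", Invoice())
idx.add("/billing/inv2", Invoice())
print(idx.paths_for("Invoice"))  # ['/billing/inv1', '/billing/inv2']
```

Each returned path would then be resolved back to its object before migrating it.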
On Thu, 23 Mar 2000, Andrew M. Kuchling wrote:
I only see one way of doing this, by walking over the whole object graph starting from the ZODB's root objects. Storage objects do this in their pack() method to determine reachability. Problem: you'd have to copy the walking logic out of the pack() method into your own method.
Hmmm... wouldn't that imply that you would need to change the stored instances in place? Let's suppose there is such a function. What should happen when you do find such an 'unreachable' object? Modify it and append it to the end of Data.fs? As part of which transaction? (I suppose you can have one transaction that modifies all the objects of the given type.) What happens if that unreachable instance has a newer version? Then if you modify both versions as part of different transactions (or even the same one), you still lose the Undo facility. Generally, I can't see an easy way of changing single instances that belong to a multi-instance transaction while keeping that transaction set as a group. It seems to me you would have the same problem even if you modify just the reachable objects. If I am right, then you will lose the Undo facility anyway, so you might as well pack to 0 and modify all the reachable ones in one transaction.
Pavlos
Pavlos Christoforou writes:
On Thu, 23 Mar 2000, Andrew M. Kuchling wrote:
instances in place? Let's suppose there is such a function. What should happen when you do find such an 'unreachable' object? Modify it
You can't find unreachable objects doing a reachability scan starting from the root, so this doesn't matter. Transactions are a tricky problem; if you're trying to modify every instance of C, but someone else is always modifying a C instance, you'll probably never be able to commit. You might have to modify instances in smaller batches and commit after each batch, then. -- A.M. Kuchling http://starship.python.net/crew/amk/ "Aww, c'mon! Where's your sense of fun?" "I'm the standard model, Zachary. 'Fun' was optional." -- Zot and Peabody, in ZOT! #1
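The batching idea can be sketched in a storage-agnostic way. `migrate_in_batches` is a hypothetical helper, and the commit step is passed in as a callable; with a modern ZODB it could be `transaction.commit`, but that choice is an assumption rather than something from this thread. Retrying a batch after a conflict error is left out to keep the sketch short.

```python
from itertools import islice


def migrate_in_batches(objects, migrate, commit, batch_size=100):
    """Apply migrate(obj) to every object, calling commit() after each batch.

    Smaller batches mean shorter transactions, so a concurrent writer is
    less likely to make the whole migration fail to commit.
    Returns the number of batches committed.
    """
    it = iter(objects)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return batches
        for obj in batch:
            migrate(obj)
        commit()
        batches += 1


# Usage: 250 plain dicts stand in for instances; commit() just logs.
log = []
items = [{"v": 1} for _ in range(250)]
n = migrate_in_batches(items, lambda d: d.update(v=2),
                       lambda: log.append("commit"), batch_size=100)
print(n, len(log))  # 3 3
```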
On Thu, 23 Mar 2000, Andrew M. Kuchling wrote:
You can't find unreachable objects doing a reachability scan starting from the root, so this doesn't matter. Transactions are a tricky
I was under the impression you were interested in modifying all versions of instances, even old ones. Even those can be found during a pack operation; at least, the pack operation in dbmStorage does find them. If only the 'current' versions are to be modified, then things get a lot easier (a Python script will do, I suppose), unless someone is continuously updating the relevant instances, as you've mentioned.
Pavlos
On Thu, 23 Mar 2000 16:07:17 -0500 (EST), "Andrew M. Kuchling" <akuchlin@mems-exchange.org> wrote:
I'd like a way to find all instances of a given class in a ZODB. (In ODBMS jargon, this feature seems to be called "class extents".) The purpose is to handle changes to a class that require rearranging; you can do it lazily in a __setstate__ method, but after you've had 10 changes, that __setstate__ will be really complicated.
This is an effect that concerned me when I first looked at ZODB, but I was pleasantly surprised at the lack of complicated __setstate__ methods in the Zope source. So far my products are free of this effect too. Andrew, are your experiences different?
Toby Dickenson tdickenson@geminidataloggers.com
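One way to keep a lazy __setstate__ from growing into a tangle of special cases is to store a schema version on each instance and chain one small upgrade function per version. This is a common pattern, not something proposed in this thread. `Part`, `_schema`, `_version` and the upgrade functions are hypothetical, loosely modelled on the .cost example later in the thread; the 'currency' field and its default are invented. With ZODB's Persistent base class, you would probably hand the upgraded dict to the base class's __setstate__ rather than updating __dict__ directly. Check the persistent package's documentation before relying on that.

```python
def _v1_to_v2(state):
    # v1 stored .cost as a number or None; v2 wraps it in a dict.
    state = dict(state)
    state["cost"] = {"amount": state.get("cost")}
    return state


def _v2_to_v3(state):
    # v3 adds a currency field (hypothetical default).
    state = dict(state)
    state["cost"] = dict(state["cost"], currency="USD")
    return state


# One entry per version step: _UPGRADES[n] turns version n into n + 1.
_UPGRADES = {1: _v1_to_v2, 2: _v2_to_v3}


class Part:
    _schema = 3  # current version; bump it with every rearrangement

    def __init__(self, cost):
        self._version = Part._schema
        self.cost = cost

    def __setstate__(self, state):
        # Objects written before versioning was introduced count as v1.
        version = state.get("_version", 1)
        while version < Part._schema:
            state = _UPGRADES[version](state)
            version += 1
        state["_version"] = version
        self.__dict__.update(state)


# Usage: simulate loading a v1 object (unpickling skips __init__).
p = Part.__new__(Part)
p.__setstate__({"cost": 12.5})
print(p.cost, p._version)  # {'amount': 12.5, 'currency': 'USD'} 3
```

__setstate__ itself stays a short loop; each rearrangement adds one function and one table entry. It still cannot drop old steps, which is Andrew's point.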
"Andrew M. Kuchling" wrote:
I'd like a way to find all instances of a given class in a ZODB. (In ODBMS jargon, this feature seems to be called "class extents".) The purpose is to handle changes to a class that require rearranging; you can do it lazily in a __setstate__ method, but after you've had 10 changes, that __setstate__ will be really complicated.
It's not too bad, although you are correct that they always get more complex and never less. We have generally been able to avoid incredibly complex setstates because we try not to evolve an object too drastically, as opposed to engineering a new kind of object that abstracts out or delegates or whatever. Also, if you just want to add an attribute, you can add a shared instance attribute to the class definition. We do this a lot to avoid changing setstate just to add an attribute. This leaves much of our setstating to doing some kind of transformation.
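The shared-attribute trick relies on ordinary Python attribute lookup. If an instance's stored state lacks an attribute, lookup falls through to the class. The sketch below simulates loading an old instance with plain Python rather than ZODB; `Customer` and `notes` are hypothetical.

```python
class Customer:
    # Class-level default added in a later release, instead of a
    # __setstate__ change; old instances have no 'notes' in their state.
    notes = ""

    def __init__(self, name, notes=""):
        self.name = name
        self.notes = notes


# Simulate loading an instance stored before 'notes' existed: unpickling
# creates the object without calling __init__ and installs the saved state.
old = Customer.__new__(Customer)
old.__dict__.update({"name": "Acme"})
print(repr(old.notes))       # '' (falls back to the class attribute)

old.notes = "prefers email"  # assignment creates an instance attribute
print("notes" in vars(old))  # True
```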
And you can never simplify it, because you never know if there's a version 1 object hiding somewhere that just hasn't been accessed yet. Instead, I'd like to do it eagerly; write a script that loops over all the instances of class C, make a change to all of them, and then commit.
I think this could cause collisions and possibly aborts if your objects are being changed actively.
I only see one way of doing this, by walking over the whole object graph starting from the ZODB's root objects. Storage objects do this in their pack() method to determine reachability.
But they don't actually change anything, I don't think, so there are no worries about collisions and no transaction is committed.
Problem: you'd have to copy the walking logic out of the pack() method into your own method.
Proposal: expose some interface allowing clients of a Connection to run a function over every single object in a ZODB. It'd be like os.path.walk(), except over objects, not directories and files.
This means unpickling *every* *single* object in the ZODB (and sending them all over the wire if you're using ZEO).
(unless they're cached, but if you change them it gets worse, invalidation messages galore)
Inefficient, but there seems no other alternative, and you wouldn't do this very often.
Which makes me think you would want to do something like this analogously to UNIX 'single user mode', i.e., shut down your site and run a single thread to execute this method of yours (imagine, for example, what would happen if two threads decided to run this method of yours at the same time). It's pretty drastic; in 9 out of 10 cases you could probably just get away with setstate.
-Michel
Michel Pelletier writes:
It's not too bad, although you are correct that they always get more complex and never less. We have generally been able to avoid incredibly complex setstates because we try not to evolve an object too drastically, as opposed to engineering a new kind of object that abstracts out or delegates or whatever.
I don't think we can guarantee that on our project. Right now we're looking at various object databases, most notably POET, Versant, and ZODB, to serve as the primary data store for our project. We're storing business objects, and therefore we can't predict how the objects will need to evolve in future. For example, we started out some months ago saying that .cost is just a numeric value or None. Now we've done some more requirements based on experience gained since then, and our cost requirements are now laid out in a 10-page document. Requiring that new classes be created to handle such changes means we'll end up with a *lot* of baggage to drag around,
Also, if you just want to add an attribute, you can add a shared instance attribute to the class definition. We do this a lot to avoid changing
Jim mentioned that in conversation; it doesn't work if the instance attribute is something mutable like a list that gets appended to, so it's not a fully general solution.
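Andrew's caveat is easy to demonstrate. A mutable class-level default is one object shared by every instance, so appending to it changes it for all of them. In ZODB the change would also land on the class rather than in any stored object. A common workaround is an immutable sentinel default plus an accessor that creates a per-instance list on first use. `Order`, `FixedOrder` and `add_item` are hypothetical, and the workaround only helps if all code goes through the accessor. In ZODB, later in-place appends to a plain list would also need to be flagged, for example by setting `_p_changed`, or the list replaced with a PersistentList.

```python
class Order:
    items = []  # shared mutable default: the pitfall described above


a, b = Order(), Order()
a.items.append("widget")  # mutates the single list stored on the class...
print(b.items)            # ['widget']  ...so every instance sees it


class FixedOrder:
    items = None  # an immutable sentinel is safe to share

    def add_item(self, item):
        if self.items is None:
            self.items = []  # first use creates an instance attribute
        self.items.append(item)


c, d = FixedOrder(), FixedOrder()
c.add_item("widget")
print(c.items, d.items)   # ['widget'] None
```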
Which makes me think you would want to do something like this analogously to UNIX 'single user mode', i.e., shut down your site and run a single thread to execute this method of yours (imagine, for example, what
Very probably this will be necessary. But my point is that there's no way to reuse the traversal logic that lives in pack(), so you have to re-implement it yourself. -- A.M. Kuchling http://starship.python.net/crew/amk/ K is for KENGHIS KHAN. *He* was a very *nice* person. History has no record of him. There is a moral in that, somewhere. -- Harlan Ellison, "From A to Z in the Chocolate Alphabet"
Participants (5): Andrew M. Kuchling, Michel Pelletier, Pavlos Christoforou, R. David Murray, Toby Dickenson