Changing and migrating persistence structure
Hi,

[I posted this to zodb-dev, but it seems that list isn't working at the moment(?), so I thought I'd try here too.]

I have a package (plone.registry) that currently has a persistent structure like this:

    Registry(Persistent)
     |
     +--> Records(Persistent)
           |
           +--> BTree of Record(Persistent)
                 |
                 +--> PersistentField(Persistent)

That is, a Registry is a persistent object containing a persistent Records object that in turn contains a BTree of persistent Record objects that contain a persistent PersistentField and a primitive value.

This is quite inefficient, though, because it results in a lot of object loads. On any given request, some of our projects load a dozen or more values from the registry. Each is just a simple primitive, but we need to load the full shebang to make it work.

Now, I'd like to move to this structure:

    Registry(Persistent)
     |
     +--> Records
           |
           +--> BTree of Field
           |
           +--> BTree of values

Here, there's only one Persistent object, plus the two BTrees: one holding all the fields and one holding all the values. Records no longer needs to be persistent (its attributes become part of the parent Registry's _p_jar). Fields no longer need to be persistent either, since they are in effect immutable objects. Values are primitives anyway.

I've done this (in a branch) and it works for new sites. However, I'm having a nightmare trying to migrate old sites. As soon as I access anything that uses the registry, I get ZODB errors, because the persistent structure is now different. In particular, it's trying to read a value into e.g. a Records object that used to derive from Persistent, but now no longer does.

What is the best way to manage this type of migration?

In terms of API compatibility, I'd really like to keep plone.registry.Record as the name and module path of the record, since it is used by the API. The difference is that before, it was persisted and returned by an API on the Registry. Now, it's constructed as needed on the fly from the internal data structure.
The same applies to the various field types that derive from PersistentField, which are currently Persistent but won't be. There's code and documentation out there that uses these. I'm less worried about the Records object, which was always an implementation detail, and the BTree-of-records, which will never have been accessed directly.

Cheers,
Martin
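For readers who haven't used plone.registry, here is a minimal sketch of the proposed layout. It assumes the `persistent` and `BTrees` packages and falls back to plain stand-ins when they aren't installed. Only `Registry` derives from `Persistent`; `Records` is an ordinary object whose state is pickled along with the registry, and it keeps one BTree of fields and one of values. The attribute names `_fields` and `_values` are illustrative assumptions, not plone.registry's actual internals.

```python
try:
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
except ImportError:
    # Stand-ins so the sketch runs without ZODB installed.
    class Persistent(object):
        pass

    OOBTree = dict


class Record(object):
    """Non-persistent view, built on demand from the two BTrees."""

    def __init__(self, field, value=None):
        self.field = field
        self.value = value


class Records(object):
    """Dict-like access to Record objects. Not Persistent, so its state
    is stored inside the parent Registry's pickle."""

    def __init__(self):
        self._fields = OOBTree()  # {dotted name -> field}
        self._values = OOBTree()  # {dotted name -> primitive value}

    def __getitem__(self, name):
        # Construct a Record on the fly; KeyError for unknown names.
        return Record(self._fields[name], self._values.get(name))

    def __setitem__(self, name, record):
        self._fields[name] = record.field
        self._values[name] = record.value

    def __contains__(self, name):
        return name in self._fields


class Registry(Persistent):

    def __init__(self):
        self.records = Records()

    def __getitem__(self, name):
        # Read path: one lookup in the values BTree; no Record or field
        # object is loaded.
        return self.records._values[name]
```

With this shape, `registry['foo.bar']` touches only the registry and the values BTree, while `registry.records['foo.bar']` builds a fresh `Record` each time, so mutating that returned `Record` does not write anything back.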
On Thu, Aug 5, 2010 at 2:36 AM, Martin Aspeli <optilude+lists@gmail.com> wrote: ...
I have a package (plone.registry) that currently has a persistent structure like this:
    Registry(Persistent)
     |
     +--> Records(Persistent)
           |
           +--> BTree of Record(Persistent)
                 |
                 +--> PersistentField(Persistent)
That is, a Registry is a persistent object containing a persistent Records object that in turn contains a BTree of persistent Record
Since BTrees are mappings, I assume that you mean the values are records and that the keys are something boring like strings or integers. I like to use mathematical notation when talking about BTrees and sets, as in:

    Registry BTree {? -> Record}
objects that contain a persistent PersistentField and a primitive value.
This is quite inefficient, though, because it results in a lot of object loads. On any given request, some of our projects load a dozen or more values from the registry. Each is just a simple primitive, but we need to load the full shebang to make it work.
Not sure what you mean by "full shebang".
Now, I'd like to move to this structure:
    Registry(Persistent)
     |
     +--> Records
           |
           +--> BTree of Field
           |
           +--> BTree of values
I'm foggy on what "field" and "value" are here or what your queries are doing. Maybe this is just a distraction.
Here, there's only one Persistent object, plus the two BTrees: one holding all the fields and one holding all the values. Records no longer needs to be persistent (its attributes become part of the parent Registry's _p_jar).
I wonder what role "Records" plays independent of the "Registry". I also wonder why it matters whether it is persistent or not.
Fields no longer need to be persistent either, since they are in effect immutable objects. Values are primitives anyway.
I've done this (in a branch) and it works for new sites. However, I'm having a nightmare trying to migrate old sites. As soon as I access anything that uses the registry, I get ZODB errors, because the persistent structure is now different. In particular, it's trying to read a value into e.g. a Records object that used to derive from Persistent, but now no longer does.
What savings do you get by making Records non-persistent?
What is the best way to manage this type of migration?
Today, it probably makes the most sense to make new classes for the non-persistent objects. You'll then need to write a script to rebuild the data structures.

Jim

--
Jim Fulton
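A rough sketch of that approach, under stated assumptions: the old persistent classes stay importable so existing pickles still load, the new non-persistent class gets a new name, and a one-off function copies the data across. `FlatRecords`, `migrate_records`, and the old `data` attribute are hypothetical, and converting the PersistentField objects themselves is left out. Against a real database you would run it on each registry and then call `transaction.commit()`.

```python
try:
    from persistent import Persistent
    from BTrees.OOBTree import OOBTree
except ImportError:
    # Stand-ins so the sketch runs without ZODB installed.
    class Persistent(object):
        pass

    OOBTree = dict


# Old classes: kept unchanged so pickles in existing databases still load.
class Records(Persistent):
    def __init__(self):
        self.data = OOBTree()  # {name -> Record} (hypothetical attribute)


class Record(Persistent):
    def __init__(self, field, value=None):
        self.field = field
        self.value = value


# New non-persistent class, under a new (hypothetical) name.
class FlatRecords(object):
    def __init__(self):
        self._fields = OOBTree()
        self._values = OOBTree()


def migrate_records(registry):
    """Rebuild registry.records in the new layout.

    Returns True if anything was migrated; safe to run twice.
    """
    old = registry.records
    if isinstance(old, FlatRecords):
        return False  # already migrated
    new = FlatRecords()
    for name, record in old.data.items():
        new._fields[name] = record.field
        new._values[name] = record.value
    # Assigning an attribute on a Persistent registry marks it as changed.
    registry.records = new
    return True
```

Keeping the check for `FlatRecords` at the top makes the step idempotent, which helps when an upgrade step is re-run on a partially migrated site.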
Hi Jim, On 08/08/2010, Jim Fulton <jim@zope.com> wrote:
On Thu, Aug 5, 2010 at 2:36 AM, Martin Aspeli <optilude+lists@gmail.com> wrote: ...
I have a package (plone.registry) that currently has a persistent structure like this:
    Registry(Persistent)
     |
     +--> Records(Persistent)
           |
           +--> BTree of Record(Persistent)
                 |
                 +--> PersistentField(Persistent)
That is, a Registry is a persistent object containing a persistent Records object that in turn contains a BTree of persistent Record
Since BTrees are mappings, I assume that you mean the values are records and that the keys are something boring like strings or integers.
Yes. The keys are strings.
I like to use mathematical notation when talking about BTrees and sets, as in:
Registry BTree {? -> Record}
objects that contain a persistent PersistentField and a primitive value.
This is quite inefficient, though, because it results in a lot of object loads. On any given request, some of our projects load a dozen or more values from the registry. Each is just a simple primitive, but we need to load the full shebang to make it work.
Not sure what you mean by "full shebang".
The Registry, Records object, the relevant Record in the relevant BTree, and possibly the PersistentField object. In the "new" branch it just looks up the value in the relevant BTree.
Now, I'd like to move to this structure:
    Registry(Persistent)
     |
     +--> Records
           |
           +--> BTree of Field
           |
           +--> BTree of values
I'm foggy on what "field" and "value" are here or what your queries are doing. Maybe this is just a distraction.
Somewhat, unless you've worked with plone.registry. The point is to allow the "get a value" API to just look at self.values[key], which is a fast lookup and doesn't load anything except the relevant BTree bucket + the registry itself.
Here, there's only one Persistent object, plus the two BTrees: one holding all the fields and one holding all the values. Records no longer needs to be persistent (its attributes become part of the parent Registry's _p_jar).
I wonder what role "Records" plays independent of the "Registry".
None, really. The main reason to have it is to be able to have an API like registry.records with dict-like notation (there's also __getitem__ on the registry, which returns the value of a given key, not the Record). I made ``records`` an attribute of type Records, and Records derives from Persistent. I wish I hadn't, since it can just live in its parent's _p_jar.
I also wonder why it matters whether it is persistent or not.
It's better if it isn't (one fewer object to load/fill up the cache), though the real culprits are the many Record objects each being persistent and loaded separately. On a given request, we can end up loading a dozen or more values from the registry, which means a dozen or more objects in the cache and associated overhead.
Fields no longer need to be persistent either, since they are in effect immutable objects. Values are primitives anyway.
I've done this (in a branch) and it works for new sites. However, I'm having a nightmare trying to migrate old sites. As soon as I access anything that uses the registry, I get ZODB errors, because the persistent structure is now different. In particular, it's trying to read a value into e.g. a Records object that used to derive from Persistent, but now no longer does.
What savings do you get by making Records non-persistent?
One fewer persistent object. I think the real saving is in making the Record object non-persistent, especially since the "read" use case can just read from the ``values`` BTree with the structure above.
What is the best way to manage this type of migration?
Today, it probably makes the most sense to make new classes for the non-persistent objects. You'll then need to write a script to rebuild the data structures.
Okay. So there's no way to get at the data if I take Persistent out of the base classes for Records / Record. Martin
On Sun, Aug 8, 2010 at 2:21 PM, Martin Aspeli <optilude+lists@gmail.com> wrote:
On 08/08/2010, Jim Fulton <jim@zope.com> wrote:
On Thu, Aug 5, 2010 at 2:36 AM, Martin Aspeli <optilude+lists@gmail.com> wrote:
What is the best way to manage this type of migration?
Today, it probably makes the most sense to make new classes for the non-persistent objects. You'll then need to write a script to rebuild the data structures.
Okay. So there's no way to get at the data if I take Persistent out of the base classes for Records / Record.
There should be some way of doing this with custom __getstate__ and __setstate__ methods. It's just tricky to get right and a bit fragile. It's much easier to write the migration code if both the old and new class are separate and functioning at the same time. Hanno
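A minimal sketch of the `__setstate__` idea, shown on a plain (non-ZODB) class: the new `Records` recognises the old state layout and converts it when its state is restored. Unpickling does roughly `cls.__new__(cls)` followed by `obj.__setstate__(state)`, which the usage imitates directly. The old attribute name `data` and the `OldRecord` stand-in are assumptions. As Hanno says, this is fragile, and it does not by itself cover a class losing its Persistent base: ZODB stores persistent sub-objects as separate records, and loading them expects a persistent class.

```python
class OldRecord(object):
    """Stand-in for an old-style Record found in existing data."""

    def __init__(self, field, value=None):
        self.field = field
        self.value = value


class Records(object):

    def __init__(self):
        self._fields = {}
        self._values = {}

    def __setstate__(self, state):
        state = dict(state)  # don't mutate the caller's dict
        if 'data' in state:
            # Old layout: {'data': {name -> record}}; split it in two.
            old = state.pop('data')
            state['_fields'] = {n: r.field for n, r in old.items()}
            state['_values'] = {n: r.value for n, r in old.items()}
        self.__dict__.update(state)


# Imitate unpickling an instance saved with the old layout.
obj = Records.__new__(Records)
obj.__setstate__({'data': {'foo.bar': OldRecord('f', u'v')}})
```

New-layout state passes through unchanged, so the same class can read both old and freshly written data during a transition period.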
On 8 August 2010 20:29, Hanno Schlichting <hanno@hannosch.eu> wrote:
There should be some way of doing this with custom __getstate__ and __setstate__ methods.
It's just tricky to get right and a bit fragile. It's much easier to write the migration code if both the old and new class are separate and functioning at the same time.
The main problem is that the advertised API says you should do:

    from plone.registry import Record
    from plone.registry import field

    registry['foo.bar'] = Record(field.TextLine(), u"my value")

Here, field.TextLine derives from PersistentField, which derives from Persistent, and Record derives from Persistent also. If I wanted to get rid of the Persistent base, I'd have to make a new "tree" of field types (the standard zope.schema ones still need some subclassing), and a new Record class with a less obvious name.

Martin
On Mon, Aug 09, 2010 at 09:03:18AM +0800, Martin Aspeli wrote:
On 8 August 2010 20:29, Hanno Schlichting <hanno@hannosch.eu> wrote:
There should be some way of doing this with custom __getstate__ and __setstate__ methods.
It's just tricky to get right and a bit fragile. It's much easier to write the migration code if both the old and new class are separate and functioning at the same time.
The main problem is that the advertised API says you should do:
    from plone.registry import Record
    from plone.registry import field

    registry['foo.bar'] = Record(field.TextLine(), u"my value")
Here, field.TextLine derives from PersistentField which derives from Persistent, and Record derives from Persistent also.
If I wanted to get rid of the Persistent base, I'd have to make a new "tree" of field types (the standard zope.schema ones still need some subclassing), and a new Record class with a less obvious name.
You could create the new non-persistent classes with less obvious names in registry.__setitem__ if you get old-style Persistent values passed in. Marius Gedminas -- http://pov.lt/ -- Zope 3/BlueBream consulting and development
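A minimal sketch of that suggestion, using plain stand-in classes: the public `Record` keeps its name and is still what callers pass in, while `Registry.__setitem__` unpacks it into an internal, non-persistent representation. `FieldSpec`, `_to_field_spec`, and the `TextLine` stand-in are hypothetical; copying a field's public attributes is only one possible conversion, and accepting a plain value for an existing key is an assumption modelled on the `registry['key']` read API discussed above. None of this is plone.registry's actual code.

```python
from collections import namedtuple

# Hypothetical immutable, non-persistent description of a field.
FieldSpec = namedtuple('FieldSpec', ['kind', 'attrs'])


class Record(object):
    """Public API class (Persistent in the current package)."""

    def __init__(self, field, value=None):
        self.field = field
        self.value = value


def _to_field_spec(field):
    """Copy a field's public attributes into a FieldSpec."""
    if isinstance(field, FieldSpec):
        return field
    attrs = sorted(
        ((k, v) for k, v in vars(field).items() if not k.startswith('_')),
        key=lambda kv: kv[0],
    )
    return FieldSpec(type(field).__name__, tuple(attrs))


class Registry(object):
    """Would derive from Persistent in real code."""

    def __init__(self):
        self._fields = {}
        self._values = {}

    def __getitem__(self, name):
        return self._values[name]

    def __setitem__(self, name, value):
        if isinstance(value, Record):
            # Old-style API object: store its parts, not the object itself.
            self._fields[name] = _to_field_spec(value.field)
            self._values[name] = value.value
        elif name in self._fields:
            self._values[name] = value
        else:
            raise KeyError(name)


class TextLine(object):
    """Stand-in for a field class such as field.TextLine."""

    def __init__(self, title=u''):
        self.title = title
```

Because the `Record` passed in is never stored, it never gets an oid, so the advertised `Record(field.TextLine(), u"my value")` call keeps working while the database only ever sees the flat layout.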
participants (4)
- Hanno Schlichting
- Jim Fulton
- Marius Gedminas
- Martin Aspeli