[ZODB-Dev] Support for graceful ZODB Class renaming
Jim Fulton
jim@zope.com
Thu, 16 Jan 2003 15:14:25 -0500
Problem
A long-standing problem in ZODB is that renaming/moving classes
or modules is painful, because module and class names are scattered
throughout databases.
For example, consider a class named C, stored in module x.y.z.
The database records for persistent instances of the class contain
a pickle of a tuple containing the module and class names. Similarly,
pickles of containing objects have pickles of the same tuples.
If the instances of the class are non-persistent, then the database
contains "global" pickles for the classes wherever there is an instance
pickle.
If we wish to either rename the class or the module, or move the
class to a different module, then we have a problem, because we
have lots of pickles with the old name that will be unloadable if
the old names become invalid.
Define the "dotted name" of a class to be some combination of the
module and class name. We have a problem when the dotted name of a
class changes. (This problem extends to other global objects, but
classes provide the most common and compelling source of the problem.)
Because ZODB 4 is still in an early stage of development, this seems
like an opportune time to consider solutions to this problem.
Possible solutions
1. The classic solution to this problem was to create aliases for the
old names.
For example, suppose we renamed x.y to x.q. We'd also modify x.q's
__init__.py to create an alias in sys.modules::
    sys.modules['x.y'] = sys.modules['x.q']
We'd create a similar alias in x.q.z::
    sys.modules['x.y.z'] = sys.modules['x.q.z']
This is a bit of a bother.
This could be cleaned up a bit if there was an alias table that
one could create (probably with an include mechanism) to collect these
operations together.
A bother with this approach is that the aliases need to be maintained
as long as the old pickles exist in the database, which could be
indefinitely.
A real problem with this approach is that we could end up
unpickling objects with the wrong class if the old names get
reused by new classes. For example, suppose that, after renaming
x.y, we create a new x.y with a z containing a C. This new C
class would be instantiated for pickles that should really get
the x.q.z.C class. This requires enough bad luck, however, that
we haven't been bitten by it yet AFAIK.
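The alias table idea from option 1 can be sketched roughly as follows. This is only an illustration, not existing ZODB code: MODULE_ALIASES, install_module_aliases and _stand_in are made-up names, and the x, x.y and x.q modules are empty stand-ins created on the fly so that the example runs without a real package on disk.

```python
import pickle
import sys
import types

# Hypothetical alias table: old dotted module name -> new dotted module name.
MODULE_ALIASES = {
    "x.y": "x.q",
    "x.y.z": "x.q.z",
}


def install_module_aliases(aliases):
    """Make each old module name resolve to the module that replaced it."""
    for old_name, new_name in aliases.items():
        sys.modules[old_name] = sys.modules[new_name]


def _stand_in(name):
    """Register an empty module object under `name` (demo scaffolding only)."""
    module = sys.modules[name] = types.ModuleType(name)
    return module


# Before the rename: C lives in x.y.z, and an instance gets pickled.
for name in ("x", "x.y", "x.y.z"):
    _stand_in(name)
C = type("C", (), {"__module__": "x.y.z"})
sys.modules["x.y.z"].C = C
old_pickle = pickle.dumps(C())

# The rename: x.y becomes x.q, and the old names disappear.
del sys.modules["x.y"], sys.modules["x.y.z"]
for name in ("x.q", "x.q.z"):
    _stand_in(name)
C.__module__ = "x.q.z"
sys.modules["x.q.z"].C = C

# Without aliases, the old pickle cannot be loaded.
try:
    pickle.loads(old_pickle)
    loaded_without_aliases = True
except ImportError:
    loaded_without_aliases = False

# With the aliases installed, the old names resolve to the new modules.
install_module_aliases(MODULE_ALIASES)
restored = pickle.loads(old_pickle)
```

Collecting the aliases in one table keeps them out of the individual __init__.py files, but it does nothing about the name-reuse problem described above.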
2. Another approach would be to write a data conversion utility for the
database. This would require a conversion file much like the alias file
described above.
You might have to shut down the database while you do the
conversion, resulting in down time. However, if you combined the
aliasing approach with conversion, you could avoid the down time.
Suppose, for example, that you had an alias table mapping old to
new dotted object names. We can use the database without
modifications if we provide a "global" loader that uses this alias
file (or if we have a utility that manipulates sys.modules on
start up).
We can write a utility for file storage, similar to a
pack, that makes a live copy of the storage file containing
converted records, and that switches to the new file when the
conversion is complete. For many other storages, we could perform
the fix ups in-place, which is even more attractive.
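As a rough illustration of option 2, here is a sketch of a "global" loader driven by an alias table, plus a per-record conversion step built on top of it. It works on plain pickles only and ignores everything specific to real ZODB records (persistent references, storage file layout). GLOBAL_ALIASES, AliasingUnpickler, convert_record and _stand_in are hypothetical names, and the x.* modules are empty stand-ins.

```python
import io
import pickle
import sys
import types

# Hypothetical alias table: old (module, name) -> new (module, name).
GLOBAL_ALIASES = {("x.y.z", "C"): ("x.q.z", "C")}


class AliasingUnpickler(pickle.Unpickler):
    """Unpickler whose global lookup consults the alias table first."""

    def find_class(self, module, name):
        module, name = GLOBAL_ALIASES.get((module, name), (module, name))
        return super().find_class(module, name)


def convert_record(data):
    """Load a record through the alias table and re-pickle it under current names."""
    obj = AliasingUnpickler(io.BytesIO(data)).load()
    return pickle.dumps(obj)


def _stand_in(name):
    """Register an empty module object under `name` (demo scaffolding only)."""
    module = sys.modules[name] = types.ModuleType(name)
    return module


# Before the rename: pickle an instance of C while it still lives in x.y.z.
for name in ("x", "x.y", "x.y.z"):
    _stand_in(name)
C = type("C", (), {"__module__": "x.y.z"})
sys.modules["x.y.z"].C = C
old_record = pickle.dumps(C())

# After the rename: only x.q.z exists, and C reports its new module.
del sys.modules["x.y"], sys.modules["x.y.z"]
for name in ("x.q", "x.q.z"):
    _stand_in(name)
C.__module__ = "x.q.z"
sys.modules["x.q.z"].C = C

new_record = convert_record(old_record)
restored = pickle.loads(new_record)  # the plain loader now suffices
```

The same loader serves both purposes: the running database can read old records through it, and the conversion utility uses it to rewrite them, after which the alias entry can be dropped.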
3. A more sophisticated approach is to build a table, stored in the
database, providing a two-way mapping between a unique id and a
class's module and name. The ids could be assigned automatically.
When pickling a class, we'd pickle the id, rather than the module
and class name. When unpickling a class, we'd look up the module
and class name in the table.
As with option 2, an explicit operation is needed to change dotted
class names. As with option 2, aliases could be used to minimize
down time. Unlike option 2, the update operation could be really
fast, because we only need to update a single table.
A secondary benefit of this approach is that pickle sizes can be
reduced substantially, because class ids, rather than dotted names,
are stored.
A downside of this approach is that mishaps in managing the id
table would be quite serious. For example, if a database record
containing the class ids is lost due to database corruption, large
portions of the database would become unusable. There are
various ways that this risk could be mitigated. For example, we
could keep assigned ids in a redundant file, possibly using a
simple log file.
Another disadvantage of this approach is that the ZODB software,
including storage implementations, has to be more sophisticated
to deal with the id to global mapping.
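To make the id table of option 3 concrete, here is a minimal in-memory sketch of the two-way mapping. ClassIdTable and its method names are invented for illustration. A real table would be stored in the database and hooked into pickling, which this sketch leaves out.

```python
class ClassIdTable:
    """Two-way mapping between small integer ids and (module, name) pairs."""

    def __init__(self):
        self._name_by_id = {}
        self._id_by_name = {}

    def id_for(self, module, name):
        """Return the id for a dotted name, assigning a new one if needed."""
        key = (module, name)
        if key not in self._id_by_name:
            new_id = len(self._name_by_id) + 1
            self._id_by_name[key] = new_id
            self._name_by_id[new_id] = key
        return self._id_by_name[key]

    def name_for(self, class_id):
        """Return the (module, name) pair currently registered for an id."""
        return self._name_by_id[class_id]

    def rename(self, old, new):
        """Point an existing id at a new (module, name) pair."""
        class_id = self._id_by_name.pop(old)
        self._id_by_name[new] = class_id
        self._name_by_id[class_id] = new


table = ClassIdTable()
class_id = table.id_for("x.y.z", "C")  # what a record would store

# Renaming touches only the table; stored records keep the same id.
table.rename(("x.y.z", "C"), ("x.q.z", "C"))
current_name = table.name_for(class_id)
```

This is why the update operation can be fast: the rename is one table change, however many records refer to the class.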
4. A variation on approach 3 is to have class authors explicitly
assign globally unique IDs (GUIDs) to classes. These GUIDs would
be used rather than automatically assigned ids. This is a fairly
significant burden to place on class developers. GUIDs also
require more space than locally assigned ids.
An advantage of GUIDs is that GUIDs can be recovered from class
source files, so that there is a built-in redundancy in the
management of ids.
It's possible that GUIDs could be an optional feature of approach
3.
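One way to picture option 4, purely as a sketch: the class author attaches a GUID to the class, and the GUID table can be rebuilt from the classes themselves if it is ever lost. The class_guid decorator, the __class_guid__ attribute and rebuild_guid_table are hypothetical names, and the GUID value is made up.

```python
def class_guid(guid):
    """Class decorator (hypothetical) that records an author-assigned GUID."""
    def decorate(cls):
        cls.__class_guid__ = guid
        return cls
    return decorate


def rebuild_guid_table(classes):
    """Recover the GUID -> (module, name) table from the classes themselves."""
    table = {}
    for cls in classes:
        guid = cls.__dict__.get("__class_guid__")  # ignore inherited GUIDs
        if guid is None:
            continue
        if guid in table:
            raise ValueError(f"duplicate GUID {guid!r}")
        table[guid] = (cls.__module__, cls.__qualname__)
    return table


@class_guid("6f1c2a3e-5b7d-4c8e-9a0b-1d2e3f4a5b6c")  # made-up GUID
class C:
    pass


class D(C):  # no GUID of its own, so it is skipped
    pass


table = rebuild_guid_table([C, D])
```

Because the GUID travels with the source, moving or renaming C changes only the value side of the rebuilt table.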
I'm inclined to go with option 2 because:
- Overall, it is simpler, although the conversion aspect is more
complicated.
- It has no risk of lost id information.
Thoughts?
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org