Fred Wilson Horch wrote:
I hadn't thought of the issues you raise. Thanks for mentioning them.
These are issues that may very well affect everyone and I'm happy to share my thoughts.
I guess I would suggest that the serialized form of a Zope instance would by default be a single XML file, but that arbitrary sections of that XML file could be custom dumped to separate serialized files with similar names. That way authors would have a pretty easy job of overriding sections of the dump process to spit out one or more simple files that have little parsing overhead.
Sounds reasonable.
2) A lesser problem arises when trying to edit the serialized "files". Because objects combine methods and state, how you modify an object can be guided, if not controlled. When we have serialized the objects in a Zope system to files, we have exported only the state of the objects in the ZODB. We then have to live with the ability to foul up invariants that span many objects by changing some data in the serialized format. A good example would be ZCatalogs. [...]
Yup... it's probably easiest to make ZCatalogs a black box.
A black box doesn't solve this problem, only the first one. Imagine that I move a serialized version of a Zope object that is indexed by an instance of ZCatalog (or by many, for that matter). When I move it, the ZCatalogs must be notified to handle the change, but that can happen only at import time, because ZCatalogs are serialized as binary for lots of good reasons.
I see the problem. I think the example you give can be handled adequately at import time.
But I can see other examples where allowing edits to the serialized representation could create problems that would be impossible to resolve at import.
So it seems like we might want to make some things read only. That is, when you serialize the objects in the Zope ODB to a filesystem, some of those serialized files are read-only "black boxes". A comment in those files could let a developer know that to change the information in that file she needs to do an import, or edit the ODB directly.
I'm not sure that in the most general case this would solve the problem either. :-( How do we know when the value (or rather the change in value) of a property for some Zope object should trigger some method? It depends not only on the object itself, but possibly on many other objects. This is the general problem of separating an object's state from its methods. It is also equivalent to RDBMS triggers and referential integrity.
A pretty good example of this would be a Zope Product that provided Lamps and Switches. Several lamp instances could be tied to a single switch instance. When the switch is on, the lamps need to be on as well. If I dump this to CVS then I can change the lamp and switch data separately. Should all the property values for a lamp be read-only? Even the description property?
I understand that the kinds of objects you are building this for don't have many of these problems, and that a very useful system could be built given the 80/20 rule. I'm bringing this up to make sure we know what the other 20 means.
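To make the Lamps and Switches case concrete, here is a minimal sketch in plain Python. The class names, `dump`, `load`, and `invariant_holds` are all hypothetical and are not part of any Zope Product or of the ZODB API; they only illustrate how a state-only export lets an edit bypass the method that keeps the objects consistent.

```python
class Lamp:
    """Hypothetical lamp: pure state, no knowledge of its switch."""
    def __init__(self, description=""):
        self.description = description
        self.on = False


class Switch:
    """Hypothetical switch: the method set_on() maintains the invariant."""
    def __init__(self, lamps):
        self.lamps = list(lamps)
        self.on = False

    def set_on(self, value):
        self.on = value
        for lamp in self.lamps:
            lamp.on = value


def dump(lamp):
    """Export state only; the methods that guard it are left behind."""
    return dict(vars(lamp))


def load(lamp, state):
    """Naive import: copy state back without calling any method."""
    vars(lamp).update(state)


def invariant_holds(switch):
    """True when every lamp agrees with its switch."""
    return all(lamp.on == switch.on for lamp in switch.lamps)


lamp = Lamp("desk lamp")
switch = Switch([lamp])
switch.set_on(True)

state = dump(lamp)
state["description"] = "reading lamp"  # harmless edit to the dumped state
load(lamp, state)
harmless_ok = invariant_holds(switch)

state["on"] = False                    # edit that bypasses Switch.set_on()
load(lamp, state)
harmful_ok = invariant_holds(switch)
```

In this sketch the description edit leaves the invariant intact while the `on` edit breaks it, yet nothing in the dumped state says which property is which. That is the part an import step, or a read-only policy, would have to know from somewhere else.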
When I import the object from the serialized format, all I can know is that something changed. Without expensive processing (XML diffing is hard in the general case, though we might be able to limit the structures to a manageable scope) we can't know that the "foo" ZCatalog should be updated instead of the "bar" ZCatalog.
Seems like we will need to consider the import code very carefully.
I don't know enough about how ZCatalog works to discuss the options intelligently. But in other indexing systems I have worked with, there have been solutions for reindexing when making updates to the corpus.
As I understand it, the issue with ZCatalog is a good example because of the separation of concerns. There is a Catalog with indexes that contain Brains for getting to the actual objects, a Controller that calls reindex/unindex, and the objects themselves, which don't know they are cataloged. When I'm editing the property "x" of some object "Y" it can be very hard to know that it is indexed in some Catalog. Because it is hard to know, I might have a difficult time deciding what should be read-only, or realizing when importing "Y" that I need to call update on some other controller object to ensure that the indexes get updated.
a) XML is structured enough that it can reliably hold the data from the ZODB. The current XML dump is not useful for this - it would need to create individual files and folders to represent containment.
This is pretty easy right now. Ten lines of recursive code can walk the whole tree if necessary and export only leaf objects.
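A recursive walk of that kind could look roughly like the sketch below. It uses a hypothetical stand-in `Node` class rather than real Zope objects, and `export_tree` is an illustrative name, not an existing Zope function; a real version would walk the actual object tree and write each leaf's XML export instead of a `body` string.

```python
import os


class Node:
    """Hypothetical stand-in for a Zope object (not the Zope API).

    children is None for a leaf object and a list for a container.
    """
    def __init__(self, id, children=None, body=""):
        self.id = id
        self.children = children
        self.body = body


def export_tree(node, directory):
    """Write containers as directories and leaf objects as .xml files.

    Returns the list of file paths written for leaf objects.
    """
    if node.children is not None:
        subdir = os.path.join(directory, node.id)
        os.makedirs(subdir, exist_ok=True)
        written = []
        for child in node.children:
            written.extend(export_tree(child, subdir))
        return written
    path = os.path.join(directory, node.id + ".xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(node.body)
    return [path]
```

Mapping containment onto directories is what lets CVS track each leaf object as its own file.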
Great. Maybe I am closer than I realize to the CVS management solution. I need to look more closely at ZCVSmixin to see what it does. But for our immediate need (which is to allow a distributed team of developers to share code and track changes via a central CVS repository), maybe it makes the most sense just to segment the existing XML export into directories and files and enhance the existing import to allow overwriting objects.
b) A hybrid XML and custom dump solution. An Image, for example, could dump out as a binary image file with metadata in a similarly named XML file.
Yes, each object should make its own policy regarding its body. Its metadata format should be standardized, however.
I like this idea.
After I have the XML export/import working in a way that fits better with CVS (even if the serialized representation is essentially a black box), then I can tackle how each object represents its body in a "morally plain text" serialized format.
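As a rough illustration of the hybrid idea (an object-specific body file plus standardized metadata in a similarly named XML file), here is a small sketch. The function name `dump_hybrid`, the `object` and `property` element names, and the `body` attribute are all assumptions made up for this example; no metadata format has been agreed on in this thread.

```python
import os
import xml.etree.ElementTree as ET


def dump_hybrid(directory, name, body, extension, metadata):
    """Write <name>.<extension> (raw body) and <name>.xml (metadata).

    body is bytes; metadata is a dict of property names to values.
    Returns (body_path, meta_path).
    """
    body_path = os.path.join(directory, f"{name}.{extension}")
    meta_path = os.path.join(directory, f"{name}.xml")

    # The body is written untouched, so an image stays a usable image file.
    with open(body_path, "wb") as f:
        f.write(body)

    # The metadata goes into a small XML file that points at the body.
    root = ET.Element("object", {"id": name, "body": os.path.basename(body_path)})
    for key, value in sorted(metadata.items()):
        prop = ET.SubElement(root, "property", {"name": key})
        prop.text = str(value)
    ET.ElementTree(root).write(meta_path, encoding="utf-8", xml_declaration=True)
    return body_path, meta_path
```

Calling `dump_hybrid(".", "icon", png_bytes, "png", {"title": "Icon"})` would produce `icon.png` next to `icon.xml`, with only the XML file needing any parsing on import.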
I want to add here that it may be very useful not to require that an object have only one serial format separate from the XML default. Specifically, the problem might be easier if the author of the export code for a type of object can dump property "x" as <object-name>-x.<format> and property "y" as <object-name>-y.<format>.
For example, instead of just:
|
-- foo_page.xml
-- foo_page.dtml
it might be useful to have arbitrary other files created when foo_page is dumped:
|
-- foo_page.xml
-- foo_page.dtml
-- foo_page-description.txt
This would imply that the description property is not captured in foo_page.xml, but instead in the easier-to-use text file.
I'm worried about the parsing complexity when trying to build a single "morally plain text" serialized format, and I think that "morally plain" can be applied at a sub-object level to make it easier to work with. The example that comes to mind is Image:
|
-- icon.xml
-- icon.png
-- icon-description.txt
The "morally plain" output here would be binary, not text. A single output file would be hard pressed to allow binary and text editing in the same file.
John
--
. . . . . . . . . . . . . . . . . . . . . . . .
John D. Heintz | Senior Engineer
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com
w w w . d a t a c h a n n e l . c o m