I hadn't thought of the issues you raise. Thanks for mentioning them. "John D. Heintz" wrote in part:
If we standardize "properties" to an XML file, then optionally dump other files to expose specific aspects of an instance for serialized editing, it might not be as big a problem as I was thinking.
I think that is the shared vision. Some aspects of each object could be serialized into a format that is easy to edit. For those aspects we leave it up to the developer of the object to write a serialization method -- we don't try to guess what an "easy to use" format would look like. Other aspects of objects might be impossible to serialize into a meaningful format. For those we have a default like XML pickle -- essentially a black box.
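A minimal sketch of that shared vision, assuming nothing about Zope's real APIs (the class names and dump() method are illustrative, and plain pickle stands in for XML pickle as the black-box fallback):

```python
import pickle

class DTMLMethod:
    """Stand-in for an object whose developer wrote a dump() method."""
    def __init__(self, source):
        self.source = source

    def dump(self):
        # The developer chose plain source text as the easy-to-edit format.
        return self.source.encode("utf-8")

class OpaqueObject:
    """Stand-in for an object with no meaningful text representation."""
    def __init__(self, state):
        self.state = state

def serialize(obj):
    """Prefer the object's own dump(); otherwise pickle it as a black box."""
    if hasattr(obj, "dump"):
        return obj.dump()
    return pickle.dumps(obj)
```

The point of the dispatch is that we never guess an "easy to use" format on the object's behalf; either its author provided one, or it stays opaque.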
I guess I would suggest that the serialized form of a Zope instance by default would be a single XML file, but that arbitrary sections of that XML file could be custom dumped to separate serialized files with similar names. That way authors would have a pretty easy job of overriding sections of the dump process to spit out one or more simple files that have little parsing overhead.
Sounds reasonable.
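One way the "custom dumped to separate files" idea could look, sketched with the standard library (the external="..." attribute is purely a made-up convention, not an existing Zope dump format):

```python
import xml.etree.ElementTree as ET

def split_dump(xml_text):
    """Split sections marked external="name" out of a single XML dump.

    Returns {filename: bytes}; the remaining document is kept under
    "__main__.xml".
    """
    root = ET.fromstring(xml_text)
    files = {}
    for parent in list(root.iter()):
        for child in list(parent):
            name = child.get("external")
            if name:
                # Dump this section to its own similarly named file...
                files[name] = ET.tostring(child)
                # ...and drop it from the main document.
                parent.remove(child)
    files["__main__.xml"] = ET.tostring(root)
    return files
```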
2) A lesser problem is when trying to edit the serialized "files". Because objects are methods and state, how you modify an object can be guided, if not controlled. When we have serialized the objects in a Zope system to files, we have exported only the state of the objects in the ZODB. We then have to live with the ability to foul up invariants across many objects by changing some data in the serialized format. A good example would be ZCatalogs. [...]
Yup... it's probably easiest to make ZCatalogs a black box.
Black box doesn't solve this problem, only the first one. Imagine that I move a serialized version of a Zope object that is indexed by an instance of ZCatalog (or many for that matter). When I move it the ZCatalogs must be notified to handle the change, but only at import time because ZCatalogs are serialized as binary for lots of good reasons.
I see the problem. I think the example you give can be handled adequately at import time. But I can see other examples where allowing edits to the serialized representation could create problems that would be impossible to resolve at import. So it seems like we might want to make some things read only. That is, when you serialize the objects in the Zope ODB to a filesystem, some of those serialized files are read-only "black boxes". A comment in those files could let a developer know that to change the information in that file she needs to do an import, or edit the ODB directly.
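One possible way to enforce the read-only idea, sketched as a digest check at import time (the comment format, notice wording, and choice of SHA-256 are all my own assumptions):

```python
import hashlib

NOTICE = ("<!-- READ-ONLY black box. To change this object, edit it in the\n"
          "     ODB and re-export; hand edits will be rejected at import.\n"
          "     sha256: %s -->\n")

def dump_black_box(payload):
    """Prefix an opaque serialization with a notice and its digest."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return (NOTICE % digest) + payload

def verify_black_box(text):
    """At import time: True only if the payload still matches the digest."""
    header, _, payload = text.partition(" -->\n")
    digest = header.rsplit("sha256: ", 1)[1]
    return hashlib.sha256(payload.encode("utf-8")).hexdigest() == digest
```

The comment doubles as the developer notice and as the tamper check, so an edited black box fails loudly at import instead of silently corrupting invariants.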
When I import the object from the serialized format all I can know is that something changed, but without expensive processing (XML diffing is hard in the general case, though we might be able to limit the structures to a manageable scope) we can't know that the "foo" ZCatalog should be updated instead of the "bar" ZCatalog.
Seems like we will need to consider the import code very carefully. I don't know enough about how ZCatalog works to discuss the options intelligently. But in other indexing systems I have worked with, there have been solutions for reindexing when making updates to the corpus.
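The "which catalog?" question could be answered without any diffing if the export recorded a registry of which paths each catalog indexes, and the importer simply notified the registered catalogs. A toy sketch, with no claim about how ZCatalog actually works internally:

```python
class Catalog:
    """Toy stand-in for a ZCatalog: just a path -> text index."""
    def __init__(self):
        self.index = {}

    def catalog_object(self, path, text):
        self.index[path] = text

# path -> list of catalogs that index it, written out at export time
registry = {}

def register(path, catalog):
    registry.setdefault(path, []).append(catalog)

def on_import(path, text):
    """Notify only the catalogs recorded for this path; no diffing needed."""
    for catalog in registry.get(path, []):
        catalog.catalog_object(path, text)
```

With this, re-importing a document touches the "foo" catalog that indexed it and leaves "bar" alone.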
a) XML is structured enough that it can reliably hold the data from the ZODB. The current XML dump is not useful for this - it would need to create individual files and folders to represent containment.
This is pretty easy right now. Ten lines of recursive code can walk the whole tree if necessary and export only leaf objects.
Great. Maybe I am closer than I realize to the CVS management solution. I need to look more closely at ZCVSmixin to see what it does. But for our immediate need (which is to allow a distributed team of developers to share code and track changes via a central CVS repository), maybe it makes the most sense just to segment the existing XML export into directories and files and enhance the existing import to allow overwriting objects.
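The "ten lines of recursive code" might look roughly like this, with folders modeled as plain dicts rather than real Zope containers:

```python
def export_tree(obj, path, out):
    """Walk a containment tree; record only leaf objects, keyed by path."""
    if isinstance(obj, dict):                   # folder stand-in: recurse
        for name, child in sorted(obj.items()):
            export_tree(child, path + "/" + name, out)
    else:                                       # leaf: export it
        out[path] = obj
    return out
```

Each folder becomes a directory path and each leaf a file, which is exactly the shape CVS wants.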
b) A hybrid XML and custom dump solution. An Image for example could dump out as a binary image file with meta-data in a similarly named XML file.
Yes, each object should make its own policy regarding its body. Its metadata format should be standardized, however.
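The Image example might dump like this, where the filenames and the metadata schema are invented for illustration; only the idea of a binary file plus a similarly named XML sidecar comes from the discussion:

```python
def dump_image(name, data, title, content_type="image/png"):
    """Hybrid dump: raw bytes in one file, metadata XML in a sibling file."""
    meta = ("<metadata><title>%s</title>"
            "<content-type>%s</content-type></metadata>" % (title, content_type))
    return {
        name: data,                    # e.g. "logo.png", untouched binary
        name + ".xml": meta.encode("utf-8"),
    }
```

Keeping the binary body untouched means CVS can still store it (as a binary file), while the standardized metadata sidecar stays diffable.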
I like this idea. After I have the XML export/import working in a way that fits better with CVS (even if the serialized representation is essentially a black box), then I can tackle how each object represents its body in a "morally plain text" serialized format. In other words, first get the default XML representation and export/import working for all objects. Then start with the easiest types of objects to serialize (such as DTML Methods) and create an easy-to-use serialization representation. Then work on the import for that serialized format. I think this approach would be different than FSDump and ZCVSMixin, right? As far as I understand it, FSDump just goes one way (ZODB -> filesystem) and only for certain types of objects. I don't understand what ZCVSMixin does (will need to spend some time looking at it -- unlike FSDump, ZCVSMixin is not obvious from the documentation and a quick review).

Thanks for helping with this project!

Fred
--
Fred Wilson Horch  mailto:fhorch@ecoaccess.org
Executive Director, EcoAccess  http://ecoaccess.org/
P.O. Box 2823, Durham, NC 27715-2823  phone: 919.419-8354