Re: [Zope-dev] FTP interface being worked on?

19 Mar 2001

      Fred Wilson Horch wrote:
...
I hadn't thought of the issues you raise.  Thanks for mentioning them.
These are issues that may very well affect everyone and I'm happy to 
share my thoughts.
...
...
I guess I would suggest that the serialized form of a Zope instance by
default would be a single XML file, but that arbitrary sections of that
XML file could be custom dumped to separate serialized files with
similar names.  That way authors would have a pretty easy job of
overriding sections of the dump process to spit out one or more simple
files that have little parsing overhead.
Sounds reasonable.
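
A minimal Python sketch of that default-plus-overrides idea follows. The names dump_object and overrides are hypothetical and belong to no Zope API, and properties are assumed to be plain strings.

```python
from xml.sax.saxutils import escape, quoteattr


def dump_object(name, props, overrides=None):
    """Serialize one object to a {filename: text} mapping.

    props maps property names to string values.  overrides (a hypothetical
    hook) maps a property name to (extension, dumper); such properties are
    written to "<name>-<property>.<extension>" instead of the XML file.
    """
    overrides = overrides or {}
    files = {}
    lines = ["<object name=%s>" % quoteattr(name)]
    for key in sorted(props):
        if key in overrides:
            extension, dumper = overrides[key]
            files["%s-%s.%s" % (name, key, extension)] = dumper(props[key])
        else:
            lines.append("  <property name=%s>%s</property>"
                         % (quoteattr(key), escape(props[key])))
    lines.append("</object>")
    files[name + ".xml"] = "\n".join(lines) + "\n"
    return files


page = {"title": "Foo", "description": "A page about foo."}
# Default: everything lands in a single XML file.
default_dump = dump_object("foo_page", page)
# Override: the description is split out into its own plain text file.
split_dump = dump_object("foo_page", page, {"description": ("txt", str)})
```

With no overrides the result is the single XML file; each override moves exactly one property into a sibling file with a similar name.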
...
...
...
2) A lesser problem is when trying to edit the serialized "files".
Because objects are both methods and state, how you modify an object
can be guided, if not controlled.  When we have serialized the
objects in a Zope system to files, we have exported only the state
of the objects in the ZODB.  We then have to live with the ability
to foul up invariants across many objects by changing some data in
the serialized format.  A good example would be ZCatalogs. [...]
Yup... it's probably easiest to make ZCatalogs a black box.
Black box doesn't solve this problem, only the first one.  Imagine that
I move a serialized version of a Zope object that is indexed by an
instance of ZCatalog (or many for that matter).  When I move it, the
ZCatalogs must be notified to handle the change, but only at import time
because ZCatalogs are serialized as binary for lots of good reasons.
I see the problem.  I think the example you give can be handled
adequately at import time.
But I can see other examples where allowing edits to the serialized
representation could create problems that would be impossible to resolve
at import.
So it seems like we might want to make some things read only.  That is,
when you serialize the objects in the Zope ODB to a filesystem, some of
those serialized files are read-only "black boxes".  A comment in those
files could let a developer know that to change the information in that
file she needs to do an import, or edit the ODB directly.
I'm not sure that in the most general case this would solve the problem 
either.  :-(  How do we know when the value (or rather the change in 
value) of a property for some Zope object should trigger some method? 
It depends not only on the object itself, but possibly on many other 
objects.  This is the general problem of separating an object's state 
from its methods.  It is also the same problem that triggers and 
referential integrity address in an RDBMS.

A pretty good example of this would be a Zope Product that provided 
Lamps and Switches.  Several lamp instances could be tied to a single 
switch instance.  When the switch is on, the lamps need to be on as 
well.  If I dump this to CVS then I can change the lamps' and switches' 
data separately.  Should all the property values for a lamp be 
read-only?  Even the description property?
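
To make the lamp and switch case concrete, here is a toy sketch in Python. The Lamp and Switch classes and the invariant_holds helper are invented for illustration; they are not taken from any real Zope Product.

```python
class Lamp:
    def __init__(self, description):
        self.description = description
        self.lit = False


class Switch:
    def __init__(self, lamps):
        self.lamps = list(lamps)
        self.on = False

    def turn_on(self):
        # Going through the method keeps the invariant: lamps follow the switch.
        self.on = True
        for lamp in self.lamps:
            lamp.lit = True


def invariant_holds(switch):
    """True when every lamp agrees with the state of its switch."""
    return all(lamp.lit == switch.on for lamp in switch.lamps)


lamps = [Lamp("hall"), Lamp("porch")]
switch = Switch(lamps)
switch.turn_on()
ok_after_method = invariant_holds(switch)

# Simulate hand-editing one lamp's serialized state and importing it:
# only the state changes, no method runs.
lamps[0].lit = False
ok_after_state_edit = invariant_holds(switch)

# Editing the description, by contrast, cannot break the invariant.
lamps[1].description = "front porch"
ok_after_description_edit = invariant_holds(switch) == ok_after_state_edit
```

The description edit is harmless and the lit edit is not, but nothing in the serialized lamp data alone says which is which.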

I understand that the kinds of objects you are working on this for don't 
have many of these problems, and that a very useful system could be 
built given the 80/20 rule.  I'm bringing this up to make sure we know 
what the other 20 means.
...
...
When I import the object
from the serialized format all I can know is that something changed, but
without expensive processing (XML diffing is hard in the general case,
we might be able to limit the structures to a manageable scope though) we
can't know that the "foo" ZCatalog should be updated instead of the
"bar" ZCatalog.
Seems like we will need to consider the import code very carefully.
I don't know enough about how ZCatalog works to discuss the options
intelligently.  But in other indexing systems I have worked with, there
have been solutions for reindexing when making updates to the corpus.
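
If the serialized structures can be limited to flat property sets, working out what changed at import time is much cheaper than general XML diffing. A rough sketch, assuming each object's state is a plain mapping of property names to values (changed_properties is a hypothetical helper):

```python
_MISSING = object()  # sentinel so that a stored None still counts as a value


def changed_properties(old, new):
    """Return the sorted names of properties added, removed, or altered."""
    names = set(old) | set(new)
    return sorted(
        name for name in names
        if old.get(name, _MISSING) != new.get(name, _MISSING)
    )


stored = {"title": "Y", "x": 1, "owner": "fred"}
imported = {"title": "Y", "x": 2, "keywords": "zope"}
changes = changed_properties(stored, imported)
```

This only says which properties of which object changed. It still does not say which catalog, if any, cares about the change.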
As I understand it, the issue with ZCatalog is a good example because of 
the separation of concerns.  There is a Catalog with indexes that contain 
Brains to get to the actual objects, a Controller that calls 
reindex/unindex, and the objects themselves, which don't know they are 
cataloged.  When I'm editing the property "x" of some object "Y" it can 
be very hard to know that it is indexed in some Catalog.  Because it is 
hard to know, I might have a difficult time deciding what should be 
read-only, or realizing when importing "Y" that I need to call update on 
some other controller object to ensure that the indexes get updated.
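
Since the object does not know where it is cataloged, one option is for the import step to ask every catalog. The sketch below uses a toy stand-in class; ToyCatalog, reindex and import_object are made-up names and are not the ZCatalog API.

```python
class ToyCatalog:
    """A stand-in for a catalog; deliberately not the ZCatalog API."""

    def __init__(self, name, indexed_props):
        self.name = name
        self.indexed_props = set(indexed_props)
        self.entries = {}  # object path -> {indexed property: value}

    def reindex(self, path, props):
        self.entries[path] = {
            key: props[key] for key in self.indexed_props if key in props
        }


def import_object(path, props, changed, catalogs):
    """Reindex path in each catalog that tracks it and indexes a changed property."""
    touched = []
    for catalog in catalogs:
        if path in catalog.entries and catalog.indexed_props & set(changed):
            catalog.reindex(path, props)
            touched.append(catalog.name)
    return touched


foo = ToyCatalog("foo", ["x"])
bar = ToyCatalog("bar", ["title"])
for catalog in (foo, bar):
    catalog.reindex("/site/Y", {"x": 1, "title": "Y"})

# Property "x" of object "Y" was edited in the serialized form.
touched = import_object("/site/Y", {"x": 2, "title": "Y"}, ["x"], [foo, bar])
```

Here the import has to be handed the full list of catalogs and the set of changed properties, which is exactly the knowledge that is hard to come by when editing files by hand.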
...
...
...
...
a) XML is structured enough that it can reliably hold the data from the
ZODB.  The current XML dump is not useful for this - it would need to
create individual files and folders to represent containment.
This is pretty easy right now.  Ten lines of recursive code
can walk the whole tree if necessary and export only leaf
objects.
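
As a rough illustration of how short such a recursive walk can be, here is a sketch that uses a nested dict as a stand-in for the object tree. In a real instance the containment would come from the folder objects rather than from dicts, and export_leaves is a hypothetical name.

```python
import os
import tempfile


def export_leaves(node, directory):
    """Write each leaf of a nested dict as a file; folders become directories."""
    written = []
    os.makedirs(directory, exist_ok=True)
    for name, child in sorted(node.items()):
        path = os.path.join(directory, name)
        if isinstance(child, dict):
            written.extend(export_leaves(child, path))
        else:
            with open(path + ".xml", "w") as handle:
                handle.write(child)
            written.append(path + ".xml")
    return written


tree = {"site": {"index_html": "<object/>", "images": {"icon": "<object/>"}}}
root = tempfile.mkdtemp()
paths = export_leaves(tree, root)
relative = sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)
```

Only leaves produce files, so containment is carried entirely by the directory layout.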
Great.  Maybe I am closer than I realize to the CVS management
solution.  I need to look more closely at ZCVSmixin to see what it
does.  But for our immediate need (which is to allow a distributed team
of developers to share code and track changes via a central CVS
repository), maybe it makes the most sense just to segment the existing
XML export into directories and files and enhance the existing import to
allow overwriting objects.
...
...
...
b) A hybrid XML and custom dump solution.  An Image for
example could dump out as a binary image file with meta-data in a
similarly named XML file.
Yes, each object should make its own policy regarding its
body.  Its metadata format should be standardized, however.
I like this idea.
After I have the XML export/import working in a way that fits better
with CVS (even if the serialized representation is essentially a black
box), then I can tackle how each object represents its body in a
"morally plain text" serialized format.
I want to add here that it may be very useful not to require that an 
object have only one serialized format besides the XML default.

Specifically, it might be an easier problem if the author of the 
export code for a type of object can dump property "x" as 
<object-name>-x.<format> and property "y" as <object-name>-y.<format>.

For example instead of just:
|
-- foo_page.xml
-- foo_page.dtml

it might be useful to have arbitrary other files created when foo_page 
is dumped:
|
-- foo_page.xml
-- foo_page.dtml
-- foo_page-description.txt

This would imply that the description property is not captured in 
foo_page.xml, but instead in the easier-to-use text file.

I'm worried about the parsing complexity when trying to build a single 
"morally plain text" serialized format, and I think that "morally plain" 
can be applied at a sub-object level to make it easier to work with.

The example that comes to mind is Image:
|
-- icon.xml
-- icon.png
-- icon-description.txt

The "morally plain" output here would be binary, not text.  A single 
output file would be hard pressed to allow binary and text editing in 
the same file.
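
On the import side, the naming convention above is enough to regroup the files per object. A sketch, assuming the set of object names is already known (a hyphen could also appear inside an object name, so the names cannot be guessed safely from the filenames alone); group_by_object is a hypothetical helper:

```python
import os
from collections import defaultdict


def group_by_object(filenames, object_names):
    """Group dump files under the object each one belongs to.

    Returns {object_name: {part: filename}}, where part is "body:<ext>" for
    "<name>.<ext>" and the property name for "<name>-<property>.<ext>".
    """
    groups = defaultdict(dict)
    # Try longer names first so "foo-bar" wins over "foo" for "foo-bar-x.txt".
    candidates = sorted(object_names, key=len, reverse=True)
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        for name in candidates:
            if stem == name:
                groups[name]["body:" + ext[1:]] = filename
                break
            if stem.startswith(name + "-"):
                groups[name][stem[len(name) + 1:]] = filename
                break
    return dict(groups)


files = ["icon.xml", "icon.png", "icon-description.txt", "foo_page.xml"]
grouped = group_by_object(files, ["icon", "foo_page"])
```

Each part can then be read in whatever mode suits it, binary for icon.png and text for the others.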

John

-- 
. . . . . . . . . . . . . . . . . . . . . . . .

John D. Heintz | Senior Engineer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com

w w w . d a t a c h a n n e l . c o m