Good summary of the current discussion, Shane. At the end, you say that we shouldn't assume you know what you're talking about. My version of that is that I assume I don't know what I'm talking about yet.
Jeremy posted some code that started to look like the right way to create relations in ZODB.
http://mail.zope.org/pipermail/zope3-dev/2003-April/006720.html
Here are the important features that made it interesting:
- You describe relations in the same place you write classes. The great thing about an object-oriented database is that you can get so much done just by writing classes. But in the current ZODB, as soon as you need flexible relationships, you have to move into a totally different sphere, such as creating a ZCatalog or some kind of relationship service. It shouldn't be that way. Python is expressive enough.
This was the primary goal for me. The implementation of a relationship may be complicated, but I think the client code should be kept as simple as possible. "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code - not in reams of trivial code that bores the reader to death." -- Guido van Rossum
- Descriptors elegantly provide custom views on relations. In the example, "zope3.developers" looks like a set of Developer objects.
Descriptors can do anything! I blogged a little about this on Friday: http://www.python.org/~jeremy/weblog/
- All the implementation details, such as use of BTrees, were moved away from the application code. To me, this means that the default relation implementation could be substituted depending on the capabilities of the storage method. Ape, for example, could implement relational queries by translating to SQL.
I'm not quite sure what all the implementation details are. Can you say more about how you would implement relations in Ape? The simple relationship manager I wrote uses a dictionary. I can see wanting some other data structure when the objects aren't hashable in a useful way. And I can see using some BTree data structure when the individual relationships involve many objects. A relational database model seems quite different, because the database stores all the relationships for instances of those classes, rather than a single set of objects.
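For readers following along, a dictionary-backed relationship manager of the kind mentioned here might be sketched roughly as follows. The names (`RelationshipManager`, `add`, `targets`, `sources`) are hypothetical and are not taken from the code in CVS; the sketch also assumes the related objects are hashable, which is exactly the limitation raised above.

```python
class RelationshipManager(object):
    """Hypothetical sketch: a many-to-many relation kept in two dicts."""

    def __init__(self):
        self._forward = {}   # left object  -> set of right objects
        self._backward = {}  # right object -> set of left objects

    def add(self, left, right):
        # Both objects are used as dictionary keys, so they must be hashable.
        self._forward.setdefault(left, set()).add(right)
        self._backward.setdefault(right, set()).add(left)

    def remove(self, left, right):
        self._forward.get(left, set()).discard(right)
        self._backward.get(right, set()).discard(left)

    def targets(self, left):
        # Return a copy so callers cannot mutate the internal state.
        return set(self._forward.get(left, ()))

    def sources(self, right):
        return set(self._backward.get(right, ()))


manager = RelationshipManager()
manager.add("zope3", "jeremy")
manager.add("zope3", "shane")
manager.add("ape", "shane")
```

Swapping the plain dicts for a BTree-like mapping would leave the interface unchanged, which is roughly the substitution the quoted paragraph has in mind.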
Unfortunately, I didn't like Jeremy's revisions quite as much. The revised version creates two Relation objects instead of one. Maybe I just don't understand it yet, but it doesn't fit my brain. ;-)
Perhaps some rationale is in order. A descriptor lives in a class dictionary, so it needs to be declared in the class statement rather than on the instance. It's possible to add the descriptor after the class is created, but I really don't want to do that. I like that the attribute name gets declared as a relationship in the class statement. The chief difference between the vapor version and the implemented version is that the vapor version had a single Relation object that was bound to both instances and the implemented version was two Relation descriptors that get joined together. The latest CVS version looks like this:

    class SoftwareProject(object):

        developers = Relation()

        def __init__(self, name):
            self.name = name

    class Developer(object):

        projects = Relation()

        def __init__(self, name):
            self.name = name

    join(many(SoftwareProject.developers), many(Developer.projects))

Does this version of the API look any better?
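To make the two-descriptor idea concrete, here is a minimal sketch of how `Relation`, `many()` and `join()` could fit together. It is not the CVS implementation: the `RelationView` helper, the per-instance `_rel_` storage key, and the use of `__set_name__` (which only exists in Python 3.6 and later) are all assumptions made for illustration.

```python
class RelationView(object):
    """Hypothetical set-like view of one object's side of a relation."""

    def __init__(self, relation, obj):
        self.relation = relation
        self.obj = obj

    def add(self, other):
        rel = self.relation
        if rel.partner is None:
            raise TypeError("relation has not been joined to a partner")
        mine = rel._members(self.obj)
        if other not in mine:
            mine.append(other)
            # Keep the other end in sync so both views agree.
            rel.partner._members(other).append(self.obj)

    def __iter__(self):
        return iter(list(self.relation._members(self.obj)))

    def __contains__(self, other):
        return other in self.relation._members(self.obj)

    def __len__(self):
        return len(self.relation._members(self.obj))


class Relation(object):
    """Descriptor declared in the class statement (sketch only)."""

    def __init__(self):
        self.name = None     # attribute name, filled in by __set_name__
        self.partner = None  # Relation at the other end, set by join()

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, cls=None):
        if obj is None:
            return self      # class access returns the descriptor itself
        return RelationView(self, obj)

    def _members(self, obj):
        # Stored under a different key so it cannot shadow the descriptor.
        return obj.__dict__.setdefault("_rel_" + self.name, [])


def many(relation):
    # "Many" is the only cardinality this sketch supports, so it is a no-op.
    return relation


def join(left, right):
    left.partner = right
    right.partner = left


class SoftwareProject(object):
    developers = Relation()

    def __init__(self, name):
        self.name = name


class Developer(object):
    projects = Relation()

    def __init__(self, name):
        self.name = name


join(many(SoftwareProject.developers), many(Developer.projects))

zope3 = SoftwareProject("zope3")
jeremy = Developer("Jeremy")
zope3.developers.add(jeremy)
```

In this sketch, adding through either end updates both, so `jeremy.projects` contains `zope3` after the last line. A persistent version would need to store the members somewhere ZODB can see them, which this sketch does not attempt.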
I prefer the notion of having two views on a single relation. I also feel that having a many2many function might be oversimplifying, since I came up with the need for a "many2many2many" function over the weekend. That would be wrong!
I'm not actually sure if we need two different descriptors. I guess we can have one descriptor that dispatches based on the class it was bound to. The separate descriptors may be a result of keeping the implementation simple at the expense of the clients. Regardless of how it's spelled, though, there is a bit of necessary complexity that comes from doing this in class statements. The Relation objects need to be created before the classes are created. There needs to be a call that tells the Relation about all the classes that participate in the Relation. (Maybe it could be done when the classes are created via a custom metaclass, but that seems too messy.) Can you post a simple example of many2many2many? It would surely be simpler to spell with the join() function above.
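As a rough illustration of the single-descriptor alternative, the sketch below shares one object between two classes and dispatches on the class of the instance it is read from. Everything here is hypothetical (`SharedRelation`, `link`, the `_sides` table), and it relies on `__set_name__` from Python 3.6+ to learn which classes participate, standing in for the explicit registration call described above.

```python
class SharedRelation(object):
    """One descriptor shared by both classes (hypothetical sketch)."""

    def __init__(self):
        self._sides = {}  # participating class -> 0 (left) or 1 (right)
        self._pairs = []  # (left_obj, right_obj) tuples

    def __set_name__(self, owner, name):
        # Called once for each class statement that mentions this object.
        # The sketch assumes exactly two classes participate.
        self._sides[owner] = len(self._sides)

    def _side(self, obj):
        # Walk the MRO so subclasses of a participant also dispatch.
        for klass in type(obj).__mro__:
            if klass in self._sides:
                return self._sides[klass]
        raise TypeError("%r does not participate in this relation" % (obj,))

    def link(self, left, right):
        if self._side(left) != 0 or self._side(right) != 1:
            raise TypeError("arguments are on the wrong sides")
        pair = (left, right)
        if pair not in self._pairs:
            self._pairs.append(pair)

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        side = self._side(obj)
        # Return the objects at the opposite end of every matching pair.
        return [p[1 - side] for p in self._pairs if p[side] is obj]


# The relation has to exist before the class statements that use it.
membership = SharedRelation()


class SoftwareProject(object):
    developers = membership


class Developer(object):
    projects = membership


zope3 = SoftwareProject()
jeremy = Developer()
membership.link(zope3, jeremy)
```

Reading `zope3.developers` and `jeremy.projects` then gives two views of the same list of pairs. The cost is that the relation must be created ahead of both classes, which is the ordering constraint noted above.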
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
I don't know much about any of these. From what little I know of RDF, it seems an example to avoid for this work. I've never heard of "topic maps." I know that the ODMG object database standard has binary relationships, that is, relationships between pairs of objects. I don't really understand how an object database extends to relationships among many objects, since a pointer just points to one thing. I'd be quite interested to see how a 3-way relationship worked in ZODB.
Max M: your example is useful and probably more manageable than ZCatalogs. But I think it would be more useful if it provided an easy way to create relations in the code itself. You only have a comment that says the relation already exists. Jeremy's example creates the relation if it doesn't already exist, although it's only a basic relation. Your example would also be enhanced by the use of descriptors.
Not sure if I follow this example completely -- assuming you mean the example code with deposit() and withdraw() methods. I assume the particular example of moving an object would use the object hub in Zope3. But it sounds like this relationship is a transient one; the use case being addressed is just preserving a link when an object changes location.

Jeremy