RFC: RelationAware class for relations between objects
There has been a lot of discussion about the need for a service that manages relations between objects on zope3-dev lately and in the past. I thought this would be a good time to share some code we have written to make relations a bit easier in Zope 2 and to invite some comments on it. The attached module provides a mixin class that collaborates with Max M's mxmRelations product almost like CatalogAwareness collaborates with the ZCatalog. I really hope that this can be an acceptable interim solution until relations are better managed by the ZODB or by some service in Zope 3. One area of contention is the overriding of __of__ to compute relations as attributes on objects. What kind of performance hit will this cause if one has a long chain of relations? I appreciate any comments. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
roche@upfrontsystems.co.za wrote:
There has been a lot of discussion about the need for a service that manages relations between objects on zope3-dev lately and in the past. I thought this would be a good time to share some code we have written to make relations a bit easier in Zope 2 and to invite some comments on it. The attached module provides a mixin class that collaborates with Max M's mxmRelations product almost like CatalogAwareness collaborates with the ZCatalog. I really hope that this can be an acceptable interim solution until relations are better managed by the ZODB or by some service in Zope 3.
I'm going to take this opportunity to move the discussion here instead of zope3-dev.
Relations are important for any sizable application. That means to me that support for relations should not require the Zope application. It seems like relations shouldn't even require acquisition, context wrapping, or a component architecture. Currently, Zope 2 fulfills most needs for relations using ZCatalog, but ZCatalog is a lot to swallow.
Jeremy posted some code that started to look like the right way to create relations in ZODB.
http://mail.zope.org/pipermail/zope3-dev/2003-April/006720.html
Here are the important features that made it interesting:
- You describe relations in the same place you write classes. The great thing about an object-oriented database is that you can get so much done just by writing classes. But in the current ZODB, as soon as you need flexible relationships, you have to move into a totally different sphere, such as creating a ZCatalog or some kind of relationship service. It shouldn't be that way. Python is expressive enough.
- Descriptors elegantly provide custom views on relations. In the example, "zope3.developers" looks like a set of Developer objects.
- All the implementation details, such as use of BTrees, were moved away from the application code. To me, this means that the default relation implementation could be substituted depending on the capabilities of the storage method. Ape, for example, could implement relational queries by translating to SQL.
Unfortunately, I didn't like Jeremy's revisions quite as much. The revised version creates two Relation objects instead of one. Maybe I just don't understand it yet, but it doesn't fit my brain. ;-) I prefer the notion of having two views on a single relation.
I also feel that having a many2many function might be oversimplifying, since I came up with the need for a "many2many2many" function over the weekend. That would be wrong!
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
Max M: your example is useful and probably more manageable than ZCatalogs. But I think it would be more useful if it provided an easy way to create relations in the code itself. You only have a comment that says the relation already exists. Jeremy's example creates the relation if it doesn't already exist, although it's only a basic relation. Your example would also be enhanced by the use of descriptors.
Roche: your example is purely implementation, although the ideas look interesting. Try writing something similar to Jeremy's example to evaluate the simplicity of your approach.
The current Zope 3 plan is to put all relation management in a service. As I see it, that is one possible implementation of Jeremy's relation() function. Also, I would be a little disappointed if creating relations always required writing XML-based configuration documents or visiting something in the management interface. Instead, a little bit of Python code ought to be sufficient.
Now, don't assume I really know what I'm talking about! I'm not a database administrator, and I'm evaluating the proposed ideas both objectively and subjectively. If nobody disagrees with me then I'll be afraid that nobody agrees, either. ;-)
Shane
* Shane Hathaway <shane@zope.com> [2003-04-28 17:09]:
roche@upfrontsystems.co.za wrote:
There has been a lot of discussion about the need for a service that manages relations between objects on zope3-dev lately and in the past. I thought this would be a good time to share some code we have written to make relations a bit easier in Zope 2 and to invite some comments on it. The attached module provides a mixin class that collaborates with Max M's mxmRelations product almost like CatalogAwareness collaborates with the ZCatalog. I really hope that this can be an acceptable interim solution until relations are better managed by the ZODB or by some service in Zope 3.
I'm going to take this opportunity to move the discussion here instead of zope3-dev.
Great! Thanks for engaging this topic.
Relations are important for any sizable application. That means to me that support for relations should not require the Zope application. It seems like relations shouldn't even require acquisition, context wrapping, or a component architecture. Currently, Zope 2 fulfills most needs for relations using ZCatalog, but ZCatalog is a lot to swallow.
Jeremy posted some code that started to look like the right way to create relations in ZODB.
I also think relations shouldn't require Zope and Jeremy sure shows that it can be done in pure python. What I appreciate about this is that it might provide a solution that might be workable with Zope 2 in the interim and Zope 3 later on.
http://mail.zope.org/pipermail/zope3-dev/2003-April/006720.html
Here are the important features that made it interesting:
- You describe relations in the same place you write classes. The great thing about an object-oriented database is that you can get so much done just by writing classes. But in the current ZODB, as soon as you need flexible relationships, you have to move into a totally different sphere, such as creating a ZCatalog or some kind of relationship service. It shouldn't be that way. Python is expressive enough.
Absolutely! It is so natural in Python to assign an object to an attribute of another object and reference that object later like any other attribute. We basically just want to make the underlying storage cope with this. There is one kind of relationship that already has these natural qualities in the ZODB and that is containment. When one object contains other objects it is already possible to say Container.Content.Attribute. I mention this purely because it might prove useful to keep this in mind when we think of other relationships.
- Descriptors elegantly provide custom views on relations. In the example, "zope3.developers" looks like a set of Developer objects.
I haven't worked with descriptors before and forgot what they do. I reread AMK's, "What's new in Python 2.2" to help me understand what it is that Jeremy was doing. I must say that this is the kind of feature that makes me a jubilant python developer.
- All the implementation details, such as use of BTrees, were moved away from the application code. To me, this means that the default relation implementation could be substituted depending on the capabilities of the storage method. Ape, for example, could implement relational queries by translating to SQL.
Unfortunately, I didn't like Jeremy's revisions quite as much. The revised version creates two Relation objects instead of one. Maybe I just don't understand it yet, but it doesn't fit my brain. ;-) I prefer the notion of having two views on a single relation.
I agree. I was reluctant to specify relations between classes in the class itself but the idea is growing on me *because* it is natural in python: objectX.attr = objectY. I don't think that objects on both ends of the relationship should necessarily know that they are related. In Jeremy's example this seems to be a requirement. In objectX.attr = objectY, objectY doesn't know it is related to anything.
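To make the point about one-sided knowledge concrete, here is a minimal sketch in plain Python. The class names `Book` and `Person` and the helper `books_by` are hypothetical and appear nowhere in the thread; they only illustrate that a plain attribute reference can be followed forwards but has to be searched for backwards.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Book:
    def __init__(self, title):
        self.title = title
        self.author = None  # plain attribute, no relation machinery


def books_by(person, all_books):
    """Reverse lookup: without a relation, every candidate must be scanned."""
    return [book for book in all_books if book.author is person]


roche = Person("Roche")
book = Book("Relations in ZODB")
book.author = roche            # objectX.attr = objectY

forward = book.author          # following the reference is trivial
backward = books_by(roche, [book])  # the reverse direction needs a search
```

The person object carries no trace of the book, which is exactly the asymmetry that a two-sided relation is meant to remove.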
I also feel that having a many2many function might be oversimplifying, since I came up with the need for a "many2many2many" function over the weekend. That would be wrong!
Won't this just be a chain of relations? You have to explain what you came up with.
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
I'd also like to know what models exist. I am not absolutely clear on what the problem statement is. It seems that the problem lies with the underlying persistence framework and not with relationships between objects. The reason I say this is because we already know a natural way to "relate" objects in pure python.
Max M: your example is useful and probably more manageable than ZCatalogs. But I think it would be more useful if it provided an easy way to create relations in the code itself. You only have a comment that says the relation already exists. Jeremy's example creates the relation if it doesn't already exist, although it's only a basic relation. Your example would also be enhanced by the use of descriptors.
Roche: your example is purely implementation, although the ideas look interesting. Try writing something similar to Jeremy's example to evaluate the simplicity of your approach.
I have one requirement though and that is that it should work with Zope 2 (not require it). Zope 2 apps are my bread and butter at the moment but for now I am going to assume that Zope 2 works fine with python 2.2 and that a descriptor based solution is the way to go.
The current Zope 3 plan is to put all relation management in a service. As I see it, that is one possible implementation of Jeremy's relation() function. Also, I would be a little disappointed if creating relations always required writing XML-based configuration documents or visiting something in the management interface. Instead, a little bit of Python code ought to be sufficient.
Now, don't assume I really know what I'm talking about!
That makes two of us, but at least if we don't know what we're talking about nobody else will either, and they can't blame us if we mess up, and they might even like us if we come up with something good ;-)
I'm not a database administrator, and I'm evaluating the proposed ideas both objectively and subjectively. If nobody disagrees with me then I'll be afraid that nobody agrees, either. ;-)
I just have this itch and will bet on it that proper handling of relationships will save tons of code. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
roche@upfrontsystems.co.za wrote:
* Shane Hathaway <shane@zope.com> [2003-04-28 17:09]: I also think relations shouldn't require Zope and Jeremy sure shows that it can be done in pure python. What I appreciate about this is that it might provide a solution that might be workable with Zope 2 in the interim and Zope 3 later on.
If we could make this work with Zope 2 and 3 context wrapping somehow--i.e., be able to get a related object in context--then this would be the relationship tool that we need. Unless we can do that, I only see this model applicable in Zope if we are using a repository-type system--not the standard pattern. ...
I have one requirement though and that is that it should work with Zope 2 (not require it). Zope 2 apps are my bread and butter at the moment but for now I am going to assume that Zope 2 works fine with python 2.2 and that a descriptor based solution is the way to go.
I don't think objects with these descriptors can be stored in ZODB 3, though. Right now this is a Zope 3-only (ZODB 4-only) prospect, AFAIK. If anybody can shoot that down, it's Jeremy, though. ;-)
The current Zope 3 plan is to put all relation management in a service. As I see it, that is one possible implementation of Jeremy's relation() function. Also, I would be a little disappointed if creating relations always required writing XML-based configuration documents or visiting something in the management interface. Instead, a little bit of Python code ought to be sufficient.
I agree that having something easy to use is very attractive. :-) I don't know if we're on the same page as far as creating relation types: I think setting up the possibility, say, of an "authored" relation between IBook and IContact should be an explicit configuration step--somewhere, maybe in code but maybe not--that enforces the fact that doing so is a non-trivial operation, data-wise. *Using* that relation should be trivial and easy. But if Jeremy just does what he wants and it works well enough for what we need, hey, I'm happy. Gary
On Mon, 2003-04-28 at 13:44, Gary Poster wrote:
roche@upfrontsystems.co.za wrote:
* Shane Hathaway <shane@zope.com> [2003-04-28 17:09]: I also think relations shouldn't require Zope and Jeremy sure shows that it can be done in pure python. What I appreciate about this is that it might provide a solution that might be workable with Zope 2 in the interim and Zope 3 later on.
If we could make this work with Zope 2 and 3 context wrapping somehow--i.e., be able to get a related object in context--then this would be the relationship tool that we need. Unless we can do that, I only see this model applicable in Zope if we are using a repository-type system--not the standard pattern.
If I understand you, ZODB-level relations are not sufficient. The problem is that Zope doesn't care about relations between objects based on their identity. Zope cares about relations between objects based on the paths used to get to those objects; whatever object you find at a particular path location should be involved in the relation.
I have one requirement though and that is that it should work with Zope 2 (not require it). Zope 2 apps are my bread and butter at the moment but for now I am going to assume that Zope 2 works fine with python 2.2 and that a descriptor based solution is the way to go.
I don't think objects with these descriptors can be stored in ZODB 3, though. Right now this is a Zope 3-only (ZODB 4-only) prospect, AFAIK. If anybody can shoot that down, it's Jeremy, though. ;-)
We'd have to teach ExtensionClass about descriptors. I suppose that's tractable, but hardly worth the effort. Jim suggested rewriting ExtensionClass in Python as a new-style metaclass. That's for some future version of Zope 2, but sounds promising to me. Jeremy
roche@upfrontsystems.co.za wrote:
* Shane Hathaway <shane@zope.com> [2003-04-28 17:09]:
- Descriptors elegantly provide custom views on relations. In the example, "zope3.developers" looks like a set of Developer objects.
I haven't worked with descriptors before and forgot what they do. I reread AMK's, "What's new in Python 2.2" to help me understand what it is that Jeremy was doing. I must say that this is the kind of feature that makes me a jubilant python developer.
Descriptors are like the ExtensionClass computed attribute feature. Like ExtensionClass, a descriptor determines what happens when you get a particular attribute. Unlike ExtensionClass, a descriptor also determines what happens when you set an attribute. It's a nice enhancement.
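As a small illustration of that get/set behaviour, the following sketch defines a data descriptor that records each access. The names `Logged` and `Project` are made up for this example and have nothing to do with Jeremy's code or with ExtensionClass.

```python
class Logged:
    """A data descriptor that records every get and set on one attribute."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: hand back the descriptor.
            return self
        self.log.append(("get", self.name))
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        self.log.append(("set", self.name))
        instance.__dict__[self.name] = value


class Project:
    name = Logged("name")


p = Project()
p.name = "zope3"     # routed through Logged.__set__
value = p.name       # routed through Logged.__get__
```

Because the descriptor lives in the class dictionary, both the assignment and the read go through it even though the value itself is kept in the instance dictionary.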
I also feel that having a many2many function might be oversimplifying, since I came up with the need for a "many2many2many" function over the weekend. That would be wrong!
Won't this just be a chain of relations? You have to explain what you came up with.
Well, I started writing a student registration system. First I wrote it something like this (shortened for simplicity):
    class Student:
        courses = Relation()
    class Course:
        students = Relation()
    rel = many2many(Student.courses, Course.students)
Then I decided that registrations are really only temporary: a student should be assigned to a particular course only in a particular term (a term is a quarter or a semester), not all terms. I tried something like the following. As you can probably guess, I got lost. ;-)
    class Term:
        courses = Relation()
        students = Relation()
    class Student:
        current_courses = Relation()
        course_history = Relation()
    class Course:
        current_students = Relation()
        student_history = Relation()
    rel1 = many2many(Student.course_history, Course.student_history)
    rel2 = many2many(Student.current_courses, Course.current_students,
                     Term.courses, Term.students)
rel1 makes sense, but I don't really know what the statement that constructs rel2 does. For example, what kind of structure does it create? How will it know that "Term.students" should return Student objects? And what if I want to ask in which terms is a certain course available? I'd like to just ask "Course.terms", but it's hard to guess how to spell that.
I'm going to suggest a solution in my reply to Jeremy.
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
I'd also like to know what models exist. I am not absolutely clear on what the problem statement is. It seems that the problem lies with the underlying persistence framework and not with relationships between objects. The reason I say this is because we already know a natural way to "relate" objects in pure python.
I would state the problem this way: ZODB needs a reusable model for maintaining complex relationships between database objects. It also needs a basic implementation of that model.
Max M: your example is useful and probably more manageable than ZCatalogs. But I think it would be more useful if it provided an easy way to create relations in the code itself. You only have a comment that says the relation already exists. Jeremy's example creates the relation if it doesn't already exist, although it's only a basic relation. Your example would also be enhanced by the use of descriptors.
Roche: your example is purely implementation, although the ideas look interesting. Try writing something similar to Jeremy's example to evaluate the simplicity of your approach.
I have one requirement though and that is that it should work with Zope 2 (not require it). Zope 2 apps are my bread and butter at the moment but for now I am going to assume that Zope 2 works fine with python 2.2 and that a descriptor based solution is the way to go.
The descriptors are icing on the cake, IMHO. You could still use ZODB-based relations without descriptors.
I just have this itch and will bet on it that proper handling of relationships will save tons of code.
+1 Relations will also make ZODB usable for more applications. Currently, a lot of people reject object-oriented databases as a way to build applications because of the difficulty of expressing complex relationships. Shane
Good summary of the current discussion, Shane. At the end, you say that we shouldn't assume you know what you're talking about. My version of that is that I assume I don't know what I'm talking about yet.
Jeremy posted some code that started to look like the right way to create relations in ZODB.
http://mail.zope.org/pipermail/zope3-dev/2003-April/006720.html
Here are the important features that made it interesting:
- You describe relations in the same place you write classes. The great thing about an object-oriented database is that you can get so much done just by writing classes. But in the current ZODB, as soon as you need flexible relationships, you have to move into a totally different sphere, such as creating a ZCatalog or some kind of relationship service. It shouldn't be that way. Python is expressive enough.
This was the primary goal for me. The implementation of a relationship may be complicated, but I think the client code should be kept as simple as possible. The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code - not in reams of trivial code that bores the reader to death. -- Guido van Rossum
- Descriptors elegantly provide custom views on relations. In the example, "zope3.developers" looks like a set of Developer objects.
Descriptors can do anything! I blogged a little about this on Friday: http://www.python.org/~jeremy/weblog/
- All the implementation details, such as use of BTrees, were moved away from the application code. To me, this means that the default relation implementation could be substituted depending on the capabilities of the storage method. Ape, for example, could implement relational queries by translating to SQL.
I'm not quite sure what all the implementation details are. Can you say more about how you would implement relations in Ape? The simple relationship manager I wrote uses a dictionary. I can see wanting some other data structure when the objects aren't hashable in a useful way. And I can see using some BTree data structure when the individual relationships involve many objects. A relational database model seems quite different, because the database stores all the relationships for instances of those classes, rather than a single set of objects.
Unfortunately, I didn't like Jeremy's revisions quite as much. The revised version creates two Relation objects instead of one. Maybe I just don't understand it yet, but it doesn't fit my brain. ;-)
Perhaps some rationale is in order. A descriptor lives in a class dictionary, so it needs to be declared in the class statement rather than on the instance. It's possible to add the descriptor after the class is created, but I really don't want to do that. I like that the attribute name gets declared as a relationship in the class statement.
The chief difference between the vapor version and the implemented version is that the vapor version had a single Relation object that was bound to both instances and the implemented version was two Relation descriptors that get joined together. The latest CVS version looks like this:
    class SoftwareProject(object):
        developers = Relation()
        def __init__(self, name):
            self.name = name
    class Developer(object):
        projects = Relation()
        def __init__(self, name):
            self.name = name
    join(many(SoftwareProject.developers), many(Developer.projects))
Does this version of the API look any better?
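For readers who want to experiment with this spelling, here is a minimal, non-persistent sketch of how `Relation`, `many()` and `join()` could behave. It is not Jeremy's CVS code: the internals (one dictionary per descriptor, a `RelationView` helper, the `add()` method on the view) are assumptions made for illustration, guided only by his remark that his simple relationship manager uses a dictionary.

```python
class Relation:
    """One side of a two-sided relationship (hypothetical internals)."""

    def __init__(self):
        self.links = {}     # instance -> set of related instances
        self.other = None   # the Relation on the opposite class, set by join()
        self.many = False

    def __get__(self, instance, owner=None):
        if instance is None:
            return self     # class access, e.g. SoftwareProject.developers
        return RelationView(self, instance)


class RelationView:
    """Set-like view of the objects related to one instance."""

    def __init__(self, relation, instance):
        self.relation = relation
        self.instance = instance

    def _members(self):
        return self.relation.links.get(self.instance, set())

    def add(self, obj):
        rel = self.relation
        if rel.other is None:
            raise TypeError("relation has not been joined")
        # Record the link on both sides so each view stays consistent.
        rel.links.setdefault(self.instance, set()).add(obj)
        rel.other.links.setdefault(obj, set()).add(self.instance)

    def __contains__(self, obj):
        return obj in self._members()

    def __iter__(self):
        return iter(self._members())

    def __len__(self):
        return len(self._members())


def many(relation):
    relation.many = True    # cardinality is only recorded in this sketch
    return relation


def join(left, right):
    left.other, right.other = right, left


class SoftwareProject(object):
    developers = Relation()

    def __init__(self, name):
        self.name = name


class Developer(object):
    projects = Relation()

    def __init__(self, name):
        self.name = name


join(many(SoftwareProject.developers), many(Developer.projects))

zope3 = SoftwareProject("zope3")
jeremy = Developer("Jeremy")
zope3.developers.add(jeremy)
```

After the last line, `jeremy in zope3.developers` and `zope3 in jeremy.projects` both hold. The dictionaries keep strong references and require hashable instances, which is the limitation Jeremy mentions; a persistent version would presumably swap in BTrees or another storage-specific structure.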
I prefer the notion of having two views on a single relation. I also feel that having a many2many function might be oversimplifying, since I came up with the need for a "many2many2many" function over the weekend. That would be wrong!
I'm not actually sure if we need two different descriptors. I guess we can have one descriptor that dispatches based on the class it was bound to. The separate descriptors may be a result of keeping the implementation simple at the expense of the clients. Regardless of how it's spelled, though, there is a bit of necessary complexity that comes from doing this in class statements. The Relation objects need to be created before the classes are created. There needs to be a call that tells the Relation about all the classes that participate in the Relation. (Maybe it could be done when the classes are created via a custom metaclass, but that seems too messy.) Can you post a simple example of many2many2many? It would surely be simpler to spell with the join() function above.
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
I don't know much about any of these. From what little I know of RDF, it seems an example to avoid for this work. I've never heard of "topic maps." I know that the ODMG object database standard has binary relationships, that is relationships between pairs of objects. I don't really understand how an object database extends to relationships among many objects, since a pointer just points to one thing. I'd be quite interested to see how a 3-way relationship worked in ZODB.
Max M: your example is useful and probably more manageable than ZCatalogs. But I think it would be more useful if it provided an easy way to create relations in the code itself. You only have a comment that says the relation already exists. Jeremy's example creates the relation if it doesn't already exist, although it's only a basic relation. Your example would also be enhanced by the use of descriptors.
Not sure if I follow this example completely -- assuming you mean the example code with deposit() and withdraw() methods. I assume the particular example of moving an object would use the object hub in Zope3. But it sounds like this relationship is a transient one; the use case being addressed is just preserving a link when an object changes location. Jeremy
* Jeremy Hylton <jeremy@zope.com> [2003-04-28 19:58]:
The chief difference between the vapor version and the implemented version is that the vapor version had a single Relation object that was bound to both instances and the implemented version was two Relation descriptors that get joined together. The latest CVS version looks like this:
    class SoftwareProject(object):
        developers = Relation()
        def __init__(self, name):
            self.name = name
    class Developer(object):
        projects = Relation()
        def __init__(self, name):
            self.name = name
    join(many(SoftwareProject.developers), many(Developer.projects))
Does this version of the API look any better?
So far I really, really like the simplicity of it. Here are some questions I have regarding the current API:
1. How important is it that relations should be between specific class types? Can't one get away with just saying that a relation has a given cardinality? Why can't one relate instances of arbitrary classes to each other? Why worry about the class type if one has to be explicit when one relates objects? Here is an example of how a type constraint could be problematic:
    class Organisation(object):
        pass
    class Customer(Organisation):
        pass
    class Reseller(Organisation):
        pass
    class ContactPerson(object):
        Organisation = OneRelation()
    ZopeCorp = Customer()
    pete = ContactPerson()
    pete.Organisation.add(ZopeCorp)
2. When time permits would you fill in the detail for a one2one/one2many relation? I am interested to see if your API would allow direct assignment to a descriptor when one object only relates to one other object, i.e.
    class Developer:
        projects = Relation()
        team = Relation()
    class Team:
        developers = Relation()
    join(one(Developer.team), many(Team.developers))
    stevea = Developer("Steve Alexander")
    ateam = Team("The A Team")
    stevea.team = ateam
    assert stevea in ateam.developers
Btw, the state of the RelationDict is corrupted when one defines more than one Relation on a class. I know we are only talking about the API now, just thought I'd mention it.
-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
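The direct-assignment question in point 2 can be explored with a short sketch. Everything here is hypothetical: the descriptor classes `OneSide` and `ManySide` stand in for the `one()` and `many()` wrappers in the question, and the two-argument `join()` is simplified. The sketch only shows that a descriptor with `__set__` makes `stevea.team = ateam` workable, and it imposes no constraint on the classes involved.

```python
class ManySide:
    """Read-only 'many' end: returns the set of objects pointing here."""

    def __init__(self):
        self.links = {}     # instance -> set of related instances
        self.other = None

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return frozenset(self.links.get(instance, ()))


class OneSide:
    """Assignable 'one' end: plain assignment updates both ends."""

    def __init__(self):
        self.links = {}     # instance -> the single related instance
        self.other = None

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self.links.get(instance)

    def __set__(self, instance, value):
        old = self.links.get(instance)
        if old is not None:
            # Drop the stale back-reference before re-linking.
            self.other.links[old].discard(instance)
        if value is None:
            self.links.pop(instance, None)
        else:
            self.links[instance] = value
            self.other.links.setdefault(value, set()).add(instance)


def join(one_side, many_side):
    one_side.other, many_side.other = many_side, one_side


class Developer:
    team = OneSide()

    def __init__(self, name):
        self.name = name


class Team:
    developers = ManySide()

    def __init__(self, name):
        self.name = name


join(Developer.team, Team.developers)

stevea = Developer("Steve Alexander")
ateam = Team("The A Team")
stevea.team = ateam
```

Reassigning `stevea.team` to another team removes him from `ateam.developers`, because `__set__` clears the old back-reference first.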
The Relation objects need to be created before the classes are created.
Not any more!
There needs to be a call that tells the Relation about all the classes that participate in the Relation. (Maybe it could be done when the classes are created via a custom metaclass, but that seems to messy.)
There's a way to allow a descriptor (or other thing defined inside a class suite) to do things to the class immediately after the class is created. Phillip Eby has checked in code inspired by my prototype, in turn inspired by a conversation I had with GvR. (That's the credits out of the way :-) ) It is available from Phillip's PEAK project, and will probably be checked into Zope 3 soon also.
http://cvs.eby-sarna.com/PEAK/src/peak/util/advice.py?rev=HEAD&content-type=...
And here are some unit tests for it:
http://cvs.eby-sarna.com/PEAK/src/peak/util/tests/advice.py?rev=HEAD&content...
Using the API looks like this:
    def implements(*interfaces):
        def callback(klass):
            klass.__implements__ = interfaces
            return klass
        addClassAdvisor(callback)
    class Foo:
        implements(IFoo)
This adds an __implements__ attribute to the class just after it has been created, but before any other code gets to see it.
-- Steve Alexander
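The class-advisor mechanism itself is not reproduced here. As a point of comparison only, later Python versions (3.6 and newer) give descriptors a built-in hook, `__set_name__`, that the interpreter calls once the owning class has been created. The sketch below uses that hook; the `REGISTRY` list and the attributes stored on the descriptor are assumptions for illustration and are unrelated to PEAK's `addClassAdvisor`.

```python
REGISTRY = []  # hypothetical module-level record of declared relations


class Relation:
    """Descriptor-like object that learns its owner when the class is built."""

    def __set_name__(self, owner, name):
        # Called automatically during class creation, once per attribute.
        self.owner = owner
        self.name = name
        REGISTRY.append((owner.__name__, name))


class SoftwareProject:
    developers = Relation()


class Developer:
    projects = Relation()


relation = SoftwareProject.__dict__["developers"]
```

With this hook the Relation no longer needs a separate call to learn which class and attribute name it was bound to, although joining the two sides still needs some explicit statement.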
On Tue, 2003-04-29 at 09:59, Steve Alexander wrote:
There's a way to allow a descriptor (or other thing defined inside a class suite) to do things to the class immediately after the class is created.
Boy, that's a lot of mechanism. It would be nice if the relation support could be kept simple. Jeremy
Jeremy Hylton wrote:
- All the implementation details, such as use of BTrees, were moved away from the application code. To me, this means that the default relation implementation could be substituted depending on the capabilities of the storage method. Ape, for example, could implement relational queries by translating to SQL.
I'm not quite sure what all the implementation details are. Can you say more about how you would implement relations in Ape?
Basically, Ape would replace the implementation of the relation constructor and wire up relational queries directly to the database. Ape would probably not use BTrees at all. I should mention that Ape is the reason I'm especially interested in relations at the moment. We need a simple model that Ape can transparently replace with its own implementation.
    class SoftwareProject(object):
        developers = Relation()
        def __init__(self, name):
            self.name = name
    class Developer(object):
        projects = Relation()
        def __init__(self, name):
            self.name = name
    join(many(SoftwareProject.developers), many(Developer.projects))
Does this version of the API look any better?
Okay, I see what you're saying now. There really are two relations. Adding to one relation causes a new entry in the other relation. What I see lacking is a clear way to expand this model. If I could map this onto a relational table, I'd know how to build indexes, create more complex relationships, etc. But this model looks unfamiliar, so I think we'll have to invent our own ways of expanding it. Maybe I just haven't thought about it long enough.
Can you post a simple example of many2many2many? It would surely be simpler to spell with the join() function above.
I posted my floundering attempt at many2many2many this morning in my reply to Roche. I tried to write a student registration system, where a student is registered in a particular course for a particular term/quarter/semester. It's hard to view such a system in terms of only Student->Course and Course->Student relationships. Instead, I view it as a database of Registration objects. Here is a sketch of the way I'd see it:
    class IRegistration (Interface):
        student = Attribute('student', 'A Student object')
        course = Attribute('course', 'A Course object')
        term = Attribute('term', 'A Term object')
    registrations = RelationBuilder(IRegistration)
    class Registration (Persistent):
        implements(IRegistration)
        def __init__(self, student, course, term):
            self.student = student
            self.course = course
            self.term = term
    class Term (Persistent):
        def __init__(self, name):
            self.name = name
    class Student (Persistent):
        current_courses = registrations.view(
            select='course', term='current')
        all_courses = registrations.view(
            select='course')
    class Course (Persistent):
        current_students = registrations.view(
            select='student', term='current')
        all_students = registrations.view(
            select='student')
IRegistration would be better represented as a schema, but I don't have a schema syntax reference card handy. ;-) RelationBuilder creates a database table that looks quite similar to a table in a relational database, except that it refers to persistent objects rather than storing just strings and numbers.
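To see whether this table-of-registrations idea hangs together, here is a small in-memory sketch. It departs from the sketch above in several labeled ways: columns are given as plain strings instead of an interface, rows are dictionaries, there is no persistence, and `view()` takes an extra `where` argument (an assumption) naming the column that must match the instance the view is read from. The `add()` method and the `current_term` attribute are also invented for the example.

```python
class RelationBuilder:
    """An in-memory table of rows that refer to objects (hypothetical API)."""

    def __init__(self, *columns):
        self.columns = columns
        self.rows = []
        self.current_term = None  # what the placeholder 'current' resolves to

    def add(self, **values):
        if set(values) != set(self.columns):
            raise TypeError("expected exactly the columns %r" % (self.columns,))
        self.rows.append(values)

    def resolve(self, value):
        return self.current_term if value == "current" else value

    def view(self, select, where, **criteria):
        return TableView(self, select, where, criteria)


class TableView:
    """Descriptor that selects one column from the rows matching an instance."""

    def __init__(self, table, select, where, criteria):
        self.table, self.select = table, select
        self.where, self.criteria = where, criteria

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        selected = []
        for row in self.table.rows:
            if row[self.where] is not instance:
                continue
            if any(row[col] is not self.table.resolve(val)
                   for col, val in self.criteria.items()):
                continue
            if row[self.select] not in selected:   # drop duplicates
                selected.append(row[self.select])
        return selected


registrations = RelationBuilder("student", "course", "term")


class Named:
    def __init__(self, name):
        self.name = name


class Term(Named):
    pass


class Student(Named):
    current_courses = registrations.view(
        select="course", where="student", term="current")
    all_courses = registrations.view(select="course", where="student")


class Course(Named):
    current_students = registrations.view(
        select="student", where="course", term="current")
    terms = registrations.view(select="term", where="course")


spring, fall = Term("spring"), Term("fall")
alice, algebra, logic = Student("Alice"), Course("Algebra"), Course("Logic")
registrations.add(student=alice, course=algebra, term=spring)
registrations.add(student=alice, course=logic, term=fall)
registrations.current_term = fall
```

With this data, `alice.current_courses` is `[logic]`, `alice.all_courses` is `[algebra, logic]`, and the earlier question about spelling "Course.terms" becomes one more view over the same table. A storage layer such as Ape could, in principle, replace the linear scan in `__get__` with a query.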
We need to make sure the interface fits an existing, well-researched model for relationships. I only know about relational tables, topic maps, and RDF.
I don't know much about any of these. From what little I know of RDF, it seems an example to avoid for this work. I've never heard of "topic maps."
It's not directly relevant to this discussion, but here is where I learned about the relationship between relational databases, RDF, and topic maps: http://www.w3.org/DesignIssues/RDB-RDF.html http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html FWIW, I just discovered that W3C invented a way to map between topic maps and RDF, so that means I don't have to learn about topic maps. :-) http://www.w3.org/2002/06/09-RDF-topic-maps/
I know that the ODMG object database standard has binary relationships, that is relationships between pairs of objects. I don't really understand how an object database extends to relationships among many objects, since a pointer just points to one thing. I'd be quite interested to see how a 3-way relationship worked in ZODB.
The example I gave above hopefully shows how n-way relationships might work. If we implemented relationships that way, we could also provide shortcuts so that people can build binary relationships without having to write an interface. Shane
Hi guys, Just to get the terminology straight. This thread (and the one on zope3-dev) seems to be using the terms relation and relationship interchangeably, but they're not. A RELATIONSHIP is what we are talking about in this thread: a way to record (and eventually add data to) a link between two potentially different objects, whereas a RELATION is a set of similar objects. Mapping to RDB, a RELATION is a table, whereas a RELATIONSHIP is usually expressed as a join between two tables by the primary key in one of them and a foreign key in the other. In Zope and in Python we already have RELATIONs, by putting a bunch of similar objects in a list or as children of a Folder or ObjectManager. We still don't have a convenient way to express RELATIONSHIPs, and this thread looks very promising in this regard, but the APIs being discussed here are already calling RELATIONSHIPs RELATIONs and this worries me a little. I-only-know-enough-about-this-topic-to-annoy-people'ly yours Leo. -- Ideas don't stay in some minds very long because they don't like solitary confinement.
Leonardo Rochael Almeida wrote:
In Zope and in Python we already have RELATIONs, by putting a bunch of similar objects in a list or as children of a Folder or ObjectManager. We still don't have a convenient way to express RELATIONSHIPs, and this thread looks very promising in this regard, but the APIs being discussed here are already calling RELATIONSHIPs RELATIONs and this worries me a little.
I like that distinction. We should stick to it. To clarify, the API used by the example I posted formalizes relations with the intent of making relationships easy to create and discover. A goal that's important to me is the ability to infer new relationships from an existing data set, which is relatively easy to do with formal relations. Here is the example with some revisions to demonstrate more of what I have in mind:

    class IRegistration (Interface):
        student = Attribute('student', 'A Student object')
        course = Attribute('course', 'A Course object')
        term = Attribute('term', 'A Term object')

    registrations = RelationBuilder(IRegistration)
    current_registrations = registrations.view(filter=CurrentTermFilter())

    class Registration (Persistent):
        implements(IRegistration)
        def __init__(self, student, course, term):
            self.student = student
            self.course = course
            self.term = term

    class Term (Persistent):
        def __init__(self, name):
            self.name = name

    class Student (Persistent):
        current_courses = current_registrations.view(
            match='student', select='course')
        all_courses = registrations.view(
            match='student', select='course')

    class Course (Persistent):
        current_students = current_registrations.view(
            match='course', select='student')
        all_students = registrations.view(
            match='course', select='student')

*shrug* I know how to implement this, but I don't know if it's a good idea. Thoughts? Shane
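For readers who want to see how the revised example could behave, here is a minimal in-memory sketch. It is not Shane's implementation: RelationBuilder here takes plain attribute names instead of an interface, rows are added with a hypothetical add() method, the CurrentTermFilter is replaced by a lambda that checks an assumed "current" flag on Term, and nothing is persistent. Views double as descriptors, so a class attribute such as Student.current_courses is computed from the shared relation on access.

```python
class RelationView:
    """A filtered view of a relation; also works as a class-level descriptor."""

    def __init__(self, relation, match=None, select=None, filter=None):
        self.relation = relation
        self.match = match      # row attribute that must be the owning instance
        self.select = select    # row attribute returned for each matching row
        self.filter = filter    # optional predicate applied to every row

    def rows(self):
        for row in self.relation.rows():
            if self.filter is None or self.filter(row):
                yield row

    def view(self, match=None, select=None, filter=None):
        return RelationView(self, match, select, filter)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return [getattr(row, self.select) for row in self.rows()
                if getattr(row, self.match) is instance]


class RelationBuilder:
    """Hypothetical stand-in: a list of row objects with named attributes."""

    def __init__(self, *attributes):
        self.attributes = attributes
        self._rows = []

    def add(self, row):             # assumed API, not from the thread
        self._rows.append(row)

    def rows(self):
        return iter(self._rows)

    def view(self, match=None, select=None, filter=None):
        return RelationView(self, match, select, filter)


registrations = RelationBuilder('student', 'course', 'term')
current_registrations = registrations.view(filter=lambda row: row.term.current)


class Registration:
    def __init__(self, student, course, term):
        self.student = student
        self.course = course
        self.term = term


class Term:
    def __init__(self, name, current=False):   # 'current' flag is an assumption
        self.name = name
        self.current = current


class Student:
    current_courses = current_registrations.view(match='student', select='course')
    all_courses = registrations.view(match='student', select='course')


class Course:
    current_students = current_registrations.view(match='course', select='student')
    all_students = registrations.view(match='course', select='student')

    def __init__(self, name):
        self.name = name


spring = Term('Spring 2003', current=True)
fall = Term('Fall 2002')
me = Student()
zope101 = Course('Introduction to Zope')
python101 = Course('Introduction to Python')
registrations.add(Registration(me, zope101, spring))
registrations.add(Registration(me, python101, fall))
```

Because both classes read from the same relation, adding one Registration row updates Student.current_courses and Course.current_students at once, which is the "two views on a single relation" idea.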
On Tue, 29 Apr 2003 14:28:10 -0400 Shane Hathaway <shane@zope.com> wrote:
Leonardo Rochael Almeida wrote:
In Zope and in Python we already have RELATIONs, by putting a bunch of similar objects in a list or as children of a Folder or ObjectManager. We still don't have a convenient way to express RELATIONSHIPs, and this thread looks very promising in this regard, but the APIs being discussed here are already calling RELATIONSHIPs RELATIONs and this worries me a little.
I like that distinction. We should stick to it.
To clarify, the API used by the example I posted formalizes relations with the intent of making relationships easy to create and discover. A goal that's important to me is the ability to infer new relationships from an existing data set, which is relatively easy to do with formal relations. Here is the example with some revisions to demonstrate more of what I have in mind:
    class IRegistration (Interface):
        student = Attribute('student', 'A Student object')
        course = Attribute('course', 'A Course object')
        term = Attribute('term', 'A Term object')

    registrations = RelationBuilder(IRegistration)
    current_registrations = registrations.view(filter=CurrentTermFilter())

    class Registration (Persistent):
        implements(IRegistration)
        def __init__(self, student, course, term):
            self.student = student
            self.course = course
            self.term = term

    class Term (Persistent):
        def __init__(self, name):
            self.name = name

    class Student (Persistent):
        current_courses = current_registrations.view(
            match='student', select='course')
        all_courses = registrations.view(
            match='student', select='course')

    class Course (Persistent):
        current_students = current_registrations.view(
            match='course', select='student')
        all_students = registrations.view(
            match='course', select='student')
*shrug* I know how to implement this, but I don't know if it's a good idea. Thoughts?
Just to help me understand, I am going to first say what I see here. In this API, relations are defined outside classes (registrations) whereas the relationships are defined in the class (Course.current_students, etc). Jeremy's API does not formalise relations, it only provides methods to build relationships between objects. Ok, I think I understand the difference between relations and relationships now ;-)

When one formalises a relation, i.e. that of 'registrations', it does not require any defined classes. This is a big plus in my mind because:

- relations are described in a way that does not require knowledge of any classes or how classes will use them to build relationships.
- many *relationships* (current_courses, all_courses) can be specified on a single class in the form of static attributes using different views of a single *relation* (registrations).
- one class can have a relationship with another class without the other class knowing about it. This is useful if you want to relate to classes that are not under your control.

Shane, show us more :-) You've *defined* certain relationships but have not shown how to *create* them and how to use them. Will this be possible:

    me = Student()
    zope101 = Course("Introduction to Zope")
    zope101.current_students.add(me)
    assert me.current_courses.zope101.name == "Introduction to Zope"

I think we should make a list of requirements and use cases - at the moment it seems as if we are all pointing at a mountain but not the same one. It is difficult to evaluate APIs if we don't have requirements. I'll try to come up with a list this afternoon. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote:
Shane, show us more :-) You've *defined* certain relationships but have not shown how to *create* them and how to use them. Will this be possible:
me = Student()
zope101 = Course("Introduction to Zope")
zope101.current_students.add(me)
assert me.current_courses.zope101.name == "Introduction to Zope"
That's the kind of thing I was hoping to achieve, yes.
I think we should make a list of requirements and use cases - at the moment it seems as if we are all pointing at a mountain but not the same one. It is difficult to evaluate API's if we don't have requirements. I'll try to come up with a list this afternoon.
Now that we've thrown in a few ideas, I think I know what I'm looking for. The goal is this: ZODB needs a reusable model for maintaining complex relationships between database objects. ZODB also needs a basic implementation of that model.

The solution must provide developers with ways to create relationships using only a few lines of Python code. The fewer lines the better (but not too few.) Visiting a management interface or writing a configuration file should not be necessary for creating most relationships. This requirement is especially designed to meet the needs of developers inexperienced with ZODB who need an easy way to maintain many-to-many relationships.

The solution must provide a very Pythonic way to access and maintain relationships. If you have a SoftwareProject object, for example, you should be able to ask the SoftwareProject for the set of all Developer objects who are working on that project, even though the SoftwareProject does not "contain" the Developers. You should be able to add a Developer to that set and expect any inverse relationships to be updated automatically.

The solution must provide a way to make indirect references rather than direct references to ZODB objects. Maintaining complex relationships is important for both simple ZODB applications and Zope applications. In standard ZODB it is often preferable to make direct references to related objects, while Zope 3 requires the ability to maintain relationships using an object hub rather than direct references.

It must be possible to implement the solution using either BTrees in a standard ZODB or a relational database. This is designed to allow users to develop applications using a simple FileStorage and deploy using a relational backend such as Ape.

The solution must allow complex relationships. A relationship may include metadata. Sean Upton talked about this.

The solution must be built on a well-known model for maintaining relationships. If we invent a new model that is hard to map onto any existing model, it will be difficult to know the right way to expand the model as requirements grow.

I *think* the solution should allow new relationships to be inferred from existing relationships. For example, if I build an application that assigns students to courses using this solution, I might later need to ask for all courses assigned to a student. Ideally, adding that functionality should not require a database migration step.

Do you agree with these requirements and minimal use cases? Is there anything I need to clarify? Shane
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
What about making relationships among pre-existing objects that were not designed with relationships in mind? -- Steve Alexander
Steve Alexander wrote:
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
What about making relationships among pre-existing objects that were not designed with relationships in mind?
Ah yes, that is important for Zope, especially Zope 3. Let me see if I can invent a use case. You have a bunch of Job objects in a JobBoard. As it stands, Jobs have no relationships with Users, but now you want to relate Jobs to Users. You want to ask some component for all Users who have indicated interest in a particular Job. You also want to ask some component for all Jobs that a particular User is interested in. You want to do this without modifying the code for Job or User, since you do not have control over either codebase. Does that use case capture your intent? Shane
Shane Hathaway wrote:
Steve Alexander wrote:
What about making relationships among pre-existing objects that were not designed with relationships in mind?
As it stands, Jobs have no relationships with Users, but now you want to relate Jobs to Users.
I'd say that this could be a fairly common use case. Here's my take on what Relationships should be, using ERM jargon (see: http://www.cs.jcu.edu.au/ftp/web/teaching/Subjects/cp1500/1998/Lecture_Notes...) A relation is a mapping from roles to entities. These entities don't need to know anything about the relation in order to be a part of it. The relation may also have descriptive data attached to it. A Relationship is an object that contains a set of relations of the same type, meaning that each relation is based on the same set of roles. It provides methods for searching and modifying this set. The Relationship also imposes constraints on and among the relations that it contains. A role does not impose class or other restrictions on the entities that may fill it, although the Relationship may. It is possible to derive a "view" of a Relationship, such that it contains the subset of relations in which one or more roles contain specified entities. Any relation added to the view must map these roles to these entities. In concrete terms:
    >>> r = Relationship(roles=[['group'], ['member']])
    >>> r.add(group='men', member='fred')
    <relation group='men' member='fred'>
    >>> r.add('men', 'barney') # implicit use of role order?
    <relation group='men' member='barney'>
    >>> fred_groups = r.view(member='fred')
    >>> fred_groups.add(group='water buffalo')
    >>> for rel in fred_groups.get(): print rel.group
    'men'
    'water buffalo'
    >>> print fred_groups.get(group='men')
    <relation group='men' member='fred'>
    >>> len(r), len(fred_groups)
    3, 2
    >>> r.remove(member='fred')
    >>> len(r), len(fred_groups)
    1, 0
The various common sorts of cardinality (one-to-one, etc) constraints can be expressed by list nesting. The example above creates a many-to-many relationship.

    # one-to-one
    Relationship(['person', 'ssn'])
    # one-to-many
    Relationship(['boss', ['employee']])
    # one-to-many-to-many
    Relationship(['book', ['edition'], ['editor']])
    # one-to-one-to-many
    Relationship(['mother', 'father', ['child']])
    # one-to-(one-to-many)
    Relationship(['superclass', ['class', ['instance']]])

Notice the subtle but important difference between the last two examples. A child must have exactly one (mother, father) pair, and an instance must have exactly one (superclass, class) pair. Also, a class must have exactly one superclass (we're talking single-inheritance, here), but a father may have children with more than one mother and vice-versa.

Attaching descriptive data might look like this:

    r = Relationship(['invoice', 'payment'])
    link = r.add(invoice=inv1, payment=p1)
    link['amount'] = 50
    link = r.add(invoice=inv2, payment=p1)
    link['amount'] = 75

This facility could also be used to "annotate" objects, with a single-role relation:

    member_data = Relationship(['member'])
    member_data.add(current_user)['email'] = 'guy@example.com'

It would probably be valuable to allow arbitrary constraint objects that are notified of attempts to add and remove relations. Cheers, Evan @ 4-am
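To make the proposal concrete, here is a rough in-memory sketch of the add/view/get/remove behaviour from the group/member session. It is only an illustration under stated assumptions: roles are given as a flat list of names, the nested-list cardinality constraints are not enforced at all, and the class names Relation and RelationshipView are invented for this sketch.

```python
class Relation:
    """One link: a mapping from role names to entities, plus descriptive data."""

    def __init__(self, **roles):
        self.roles = roles
        self.data = {}

    def __getattr__(self, name):
        # Only called when normal lookup fails; expose roles as attributes.
        roles = self.__dict__.get('roles', {})
        if name in roles:
            return roles[name]
        raise AttributeError(name)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def matches(self, criteria):
        return all(self.roles.get(role) == entity
                   for role, entity in criteria.items())


class Relationship:
    """A set of relations that all share the same roles (no constraints)."""

    def __init__(self, roles):
        self.roles = tuple(roles)
        self._relations = []

    def add(self, *args, **kwargs):
        # Positional arguments are matched to roles in declaration order.
        values = dict(zip(self.roles, args))
        values.update(kwargs)
        if set(values) != set(self.roles):
            raise ValueError('every role needs exactly one entity')
        relation = Relation(**values)
        self._relations.append(relation)
        return relation

    def get(self, **criteria):
        return [r for r in self._relations if r.matches(criteria)]

    def remove(self, **criteria):
        self._relations = [r for r in self._relations
                           if not r.matches(criteria)]

    def view(self, **fixed):
        return RelationshipView(self, fixed)

    def __len__(self):
        return len(self._relations)


class RelationshipView:
    """The subset of a Relationship in which some roles are pinned."""

    def __init__(self, base, fixed):
        self.base = base
        self.fixed = fixed

    def add(self, **kwargs):
        # Pinned roles always win over caller-supplied values.
        return self.base.add(**{**kwargs, **self.fixed})

    def get(self, **criteria):
        return self.base.get(**{**criteria, **self.fixed})

    def __len__(self):
        return len(self.get())


r = Relationship(['group', 'member'])
r.add(group='men', member='fred')
r.add('men', 'barney')
fred_groups = r.view(member='fred')
fred_groups.add(group='water buffalo')
```

The view holds no data of its own; it re-queries the base Relationship every time, so removing relations from the base is immediately visible through the view.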
On Wed, 30 Apr 2003, Evan Simpson wrote:
Shane Hathaway wrote:
Steve Alexander wrote:
What about making relationships among pre-existing objects that were not designed with relationships in mind?
As it stands, Jobs have no relationships with Users, but now you want to relate Jobs to Users.
I'd say that this could be a fairly common use case. Here's my take on what Relationships should be, using ERM jargon (see: http://www.cs.jcu.edu.au/ftp/web/teaching/Subjects/cp1500/1998/Lecture_Notes...)
A relation is a mapping from roles to entities. These entities don't need to know anything about the relation in order to be a part of it. The relation may also have descriptive data attached to it.
A Relationship is an object that contains a set of relations of the same type, meaning that each relation is based on the same set of roles. It provides methods for searching and modifying this set. The Relationship also imposes constraints on and among the relations that it contains.
Well, Evan, I'm having a hard time interpreting the paper you referenced that way. The paper seems to use "relation" and "relationship" interchangeably. What it describes are relationships and relationship sets. A relationship set is a set of relationships of the same type. I just found a more thorough description of the Entity Relationship Model. http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter2/node1.html This one seems to use only the terms "relationship" and "relationship set". In fact, the next chapter of the course introduces relations and the relational algebra, which are clearly distinct concepts from ERM. The class notes come from the book _Database_System_Concepts_: http://db-book.com I'll ponder the rest of your email once we've agreed on common definitions. Shane
Shane Hathaway wrote:
Well, Evan, I'm having a hard time interpreting the paper you referenced that way. The paper seems to use "relation" and "relationship" interchangeably. What it describes are relationships and relationship sets. A relationship set is a set of relationships of the same type.
Yup. Since Relationship seems to be a natural class name for these thingies, I wanted to avoid calling individual things that they contain relationships.
This one seems to use only the terms "relationship" and "relationship set". In fact, the next chapter of the course introduces relations and the relational algebra, which are clearly distinct concepts from ERM.
Actually the next chapter starts off with "A row in a table represents a relationship among a set of values. Thus a table represents a collection of relationships." :-) Of course, this is consistent with what you state above. I'm happy to call them whatever, I just wanted to try to avoid making up terminology when some exists.
I'll ponder the rest of your email once we've agreed on common definitions.
It basically boils down to defining a relationship by listing the roles of entities that it relates. I also suggest declaring one-many type constraints among entities in a relationship using a simple nested list notation. No involvement of the entities' classes is required, and in fact the relationship definition can use strings for role names and avoid caring what class/type/kind of entity will fill that role altogether. That might be problematic for APE, though, which I imagine would be happiest knowing exactly what class of object will fill each role. Cheers, Evan @ 4-am
Steve Alexander wrote:
What about making relationships among pre-existing objects that were not designed with relationships in mind?
Exactly! Without thinking I sent this to Zope-dev. But here it goes again:

If an object needs to be related to another object, there is no need for any of the objects to know they are related. If all the relations are external to the objects you can relate objects that are not programmed to be related. Keeping all the relation code loosely coupled from the objects.

Making the objects hold the responsibility for the relations means that all objects have to take it into account. There is no reason for that. To take a simplified example from Plone.

    relations/
        members_groups
    Members/
    Groups/

In this case I can make the "members_groups" relationship manager take care of the relations, and Members and Groups would not need to know they can be related. I don't have to change the code in Members at all. Thus leaving Plone as it is, while I can add group functionality.

    maxm_groups = context.relations.member_groups.get(context.Members.maxm)

The other way around I would need to change the Member product, adding group relations, and I can risk all hell breaking loose when Plone is updated the next time.

I wonder if any of you guys reinventing relationships has even bothered to try out the mxmRelations product? You are taking the long and winding road to get to the same conclusions I did a year ago ... regards Max M
On Wed, Apr 30, 2003 at 09:40:46PM +0200, Max M wrote:
Steve Alexander wrote:
What about making relationships among pre-existing objects that were not designed with relationships in mind?
Exactly!
Without thinking I sent this to Zope-dev. But here it goes again:
If an object needs to be related to another object, there is no need for any of the objects to know they are related. If all the relations are external to the objects you can relate objects that are not programmed to be related. Keeping all the relation code loosely coupled from the objects.
Making the objects hold the responsibility for the relations means that all objects have to take it into account. There is no reason for that.
well, *something* still has to worry about the objects moving. AFAICT in zope2 with mxmRelations, you either have to modify the objects so they notify the relations object when they move, or modify the containers to do so whenever something within them is moved. To me it looks like TANSTAAFL. -- Paul Winkler home: http://www.slinkp.com "Muppet Labs, where the future is made - today!"
Paul Winkler wrote:
Making the objects hold the responsibility for the relations means that all objects have to take it into account. There is no reason for that.
well, *something* still has to worry about the objects moving. AFAICT in zope2 with mxmRelations, you either have to modify the objects so they notify the relations object when they move, or modify the containers to do so whenever something within them is moved. To me it looks like TANSTAAFL.
That is right. Which is one of the reasons why an ObjectHub is a good idea. Currently there is no way around this in Zope 2 :-( regards Max M
Just to be sure I understand, I think you meant to use the word "relationship" instead of "relation" throughout this message. Max M wrote:
If an object needs to be related to another object, there is no need for any of the objects to know they are related. If all the relations are external to the objects you can relate objects that are not programmed to be related. Keeping all the relation code loosely coupled from the objects.
Making the objects hold the responsibility for the relations means that all objects have to take it into account. There is no reason for that. To take a simplified example from Plone.
    relations/
        members_groups
    Members/
    Groups/
In this case I can make the "members_groups" relationship manager take care of the relations, and Members and Groups would not need to know they can be related. I don't have to change the code in Members at all. Thus leaving Plone as it is, while I can add group functionality.
maxm_groups = context.relations.member_groups.get(context.Members.maxm)
The other way around I would need to change the Member product, adding group relations, and I can risk all hell breaking loose when Plone is updated the next time.
I wonder if any of you guys reinventing relationships has even bothered to try out the mxmRelations product? You are taking the long and winding road to get to the same conclusions I did a year ago ...
I don't know about the others participating in this thread, but I'm reaching quite different conclusions than you did. For example:

- The solution should not require Zope.
- It should not be necessary to visit the management interface to set up relationships.
- Developers should have the *option* of using relationships directly in their classes, with a Pythonic API.

I listed other requirements this morning, and as far as I can tell, few are fulfilled by mxmRelations. On the other hand, maybe I misunderstand mxmRelations. Does it fulfill the requirements I listed? I'm sure there's nothing wrong with mxmRelations, but it seems like we're targeting different audiences. Shane
Shane Hathaway wrote:
I don't know about the others participating in this thread, but I'm reaching quite different conclusions than you did. For example:
- The solution should not require Zope.
It is implemented as a Zope product, but the base class that implements the relationships is actually pure Python. The latest version is based on an OOBTree to handle the relations.
- It should not be necessary to visit the management interface to set up relationships.
No.
- Developers should have the *option* of using relationships directly in their classes, with a Pythonic API.
I think that should be working on top of a general relationship service.
I listed other requirements this morning, and as far as I can tell, few are fulfilled by mxmRelations. On the other hand, maybe I misunderstand mxmRelations. Does it fulfill the requirements I listed? I'm sure there's nothing wrong with mxmRelations, but it seems like we're targeting different audiences.
Let me say that I am not married to mxmRelations, and I am not trying to push it as a solution. But I think it has a sound model for relationships. One that is very general and expandable. Here is a not even half baked pseudo example of how it could be implemented, which uses the same idea as mxmRelations:

    class ManyToMany: # similar to a mxmRelations class
        "A many to many Relationships, it can also hold meta data attrs"

        def __init__(self, name):
            self.name = name
            self.__relationship_graph = {}

        def relate(self, obj1, obj2):
            "relates 2 objects"

        def unrelate(self, obj1, obj2):
            "unrelates 2 objects"

        def delete(self, obj): # 'del' is a reserved word in Python
            "Delete all relations to and from obj"

        def get(self, obj):
            "returns all relations to object"

    class RelationshipsManager:
        "Folder-like class to hold Relationships instances"

        def addRelationsships(self, name, relationsshipsInstance):
            "Adds a Relationships"

        def getRelationsships(self, name):
            "returns a Relationships by name"

        def delRelationsships(self, name):
            "Delete a relationships"

    # perhaps a method on ZODB to return the RelationshipsManager ?
    rm = ZODB().getRelationshipsManager()
    rm.addRelationsships('members_groups', ManyToMany('members_groups'))
    r = rm.getRelationsships('members_groups')
    r.relate(someNewMember, someGroup)
    groups = r.get(someMember)
    # returns [someGroup]
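The method stubs above could be filled in along the following lines. This is a guess at the behaviour, not code from mxmRelations: it assumes related objects are hashable, keeps a plain dict of sets in memory rather than an OOBTree, treats a relation as symmetric, and returns a set instead of a list from get(). The 'members_groups' registry at the end is an ordinary dict standing in for the RelationshipsManager.

```python
class ManyToMany:
    """In-memory many-to-many relationship, stored outside the related objects."""

    def __init__(self, name):
        self.name = name
        self._graph = {}   # object -> set of objects it is related to

    def relate(self, obj1, obj2):
        # Record the link in both directions so either side can be queried.
        self._graph.setdefault(obj1, set()).add(obj2)
        self._graph.setdefault(obj2, set()).add(obj1)

    def unrelate(self, obj1, obj2):
        self._graph.get(obj1, set()).discard(obj2)
        self._graph.get(obj2, set()).discard(obj1)

    def delete(self, obj):
        # Drop obj and every link that points back at it.
        for other in self._graph.pop(obj, set()):
            self._graph.get(other, set()).discard(obj)

    def get(self, obj):
        # Return a copy so callers cannot mutate the internal graph.
        return set(self._graph.get(obj, ()))


relationships = {'members_groups': ManyToMany('members_groups')}
r = relationships['members_groups']
r.relate('maxm', 'developers')
r.relate('maxm', 'reviewers')
r.relate('shane', 'developers')
```

Neither the member nor the group objects carry any relationship state here, which is the loose coupling argued for above; the cost is that something must call delete() when an object goes away.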
Max M wrote:
- Developers should have the *option* of using relationships directly in their classes, with a Pythonic API.
I think that should be working on top of a general relationship service.
I agree. It would be helpful, though, to identify a use case that describes why both you and I believe that relationships must be stored in a centralized location. I'd like this to justify a particular implementation I'm considering (see below.)
I listed other requirements this morning, and as far as I can tell, few are fulfilled by mxmRelations. On the other hand, maybe I misunderstand mxmRelations. Does it fulfill the requirements I listed? I'm sure there's nothing wrong with mxmRelations, but it seems like we're targeting different audiences.
Let me say that I am not married to mxmRelations, and I am not trying to push it as a solution. But I think it has a sound model for relationships. One that is very general and expandable.
Ok, I'll take a look.
    # perhaps a method on ZODB to return the RelationshipsManager ?
    rm = ZODB().getRelationshipsManager()
    rm.addRelationsships('members_groups', ManyToMany('members_groups'))
    r = rm.getRelationsships('members_groups')
    r.relate(someNewMember, someGroup)
    groups = r.get(someMember)
    # returns [someGroup]
We shouldn't add a getRelationshipsManager() method to any ZODB classes. In pure-ZODB mode, a possible way to store relationships is the use of a special branch of the ZODB root object. In Zope, there are normally two branches of the root object, "Application" and "ZGlobals". Relationship management could use a private branch, perhaps called "Relationships". But the application should not have to know that at all; it should never access self._p_jar.root()['Relationships']. That should be abstracted away. Shane
It would be helpful, though, to identify a use case that describes why both you and I believe that relationships must be stored in a centralized location.
Perhaps so that we can answer questions like "list all the grandmothers". If we're storing mother<->child relationships, and the knowledge of the existence of a particular relationship resides only in the objects involved in the relationship, then we have to examine all of the objects that may be part of a relationship. If the relationships are stored centrally, then we can start our search with the relationships and work from there. -- Steve Alexander
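A small illustration of the "list all the grandmothers" point, under the assumption that the central store is nothing more than a set of (mother, child) pairs with made-up names. The query is answered from the pairs alone, without visiting any of the person objects.

```python
# Hypothetical central store of mother<->child relationships.
mother_child = {
    ('ann', 'beth'),
    ('beth', 'cara'),
    ('beth', 'dina'),
    ('eve', 'fay'),
}


def grandmothers(pairs):
    """Return every mother who has a child that is itself a mother."""
    mothers = {mother for mother, _child in pairs}
    return {mother for mother, child in pairs if child in mothers}


result = grandmothers(mother_child)
```

If the same facts lived only as attributes on each person, answering this would mean loading and inspecting every object that might be a mother.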
Shane Hathaway wrote:
Max M wrote:
- Developers should have the *option* of using relationships directly in their classes, with a Pythonic API.
I think that should be working on top of a general relationship service.
I agree. It would be helpful, though, to identify a use case that describes why both you and I believe that relationships must be stored in a centralized location. I'd like this to justify a particular implementation I'm considering (see below.)
To add some perspective from the "outside", i.e. a Zope 2 product developer, I really can't see why relationships should be stored this way.

First, it feels so "un-objectoriented". If I want to know from an object which other objects it relates to (I am implying here that relationships are directed), I want to ask the object, not some centralized location.

Second, there's the point of scaling. Let's say I have an online shop with 100 categories, and 10000 items for sale. Say I want to place items in more than one category. I'd use relations (category -> item). But now the central "relationship" database will hold up to one million items (100 categories x 10000 items), while storing the relationships in the categories will only go up to 10000. And that is IMO the more typical use case.

While you can search the central location for relationships, the same could also be achieved by combining the local storage of relationships with maybe a special index for (Z)Catalog.

I also don't see how one could get an (A,B) relationship implemented in Zope 2 without B being RelationshipAware (hah ;)), no matter where the relationships are stored. Depending on the key used for B, a lot of problems can occur. Since the problem with "moving" objects was mentioned, I assume the path to an object could be used as a key. This gets interesting when one deletes B and adds another B, for instance.

Another question: what happens if I export a subtree with objects to another Zope? With the relationships stored as attributes of the objects, I'm able to at least keep some of the relationship information; maybe I just have to verify which relations are still valid and delete the ones which point to non-existing objects. What would I do in the centralized case?

In short, as a developer for Zope 2, I'd like to see relations implemented in their most basic form, that is as a possibility to do a 1:n mapping between one object and others. cheers, oliver
Oliver Bleutgen wrote:
Shane Hathaway wrote:
I agree. It would be helpful, though, to identify a use case that describes why both you and I believe that relationships must be stored in a centralized location. I'd like this to justify a particular implementation I'm considering (see below.)
To add some perspective from the "outside", i.e. a Zope 2 product developer, I really can't see why relationships should be stored this way. First, it feels so "un-objectoriented". If I want to know from an object to which other objects it relates to (I am implying here that relationships are directed), I want to ask the object, not some centralized location.
Thank you for your perspective. This gives me an opportunity to elaborate more. As Steve said, it is advantageous to store relationships in a central location because it enables you to infer new relationships without traversing the object system. But like you say, you need an object-oriented view of relationships. That is why descriptors (or in ZODB3, ComputedAttributes) are an essential requirement for the right relationship management solution. In a student registration system, to find out the current course registrations for a student, you'd like to simply use the expression "student.current_registrations". To achieve this, current_registrations should be a descriptor or a computed attribute. That way, it can transparently know where to find the relationship repository and which student to look for. Thus relationship storage can be centralized without cluttering the application or burdening developers.
Second, there's the point of scaling. Let's say I have an online shop with 100 categories, and 10000 items for sale. Say I want to place items in more than one category. I'd use relations (category -> item). But now the central "relationship" database will hold up to one million items, while storing the relationships in the categories will only go up to 10000. And that is IMO the more typical use case.
We haven't chosen a relationship storage model yet. That's the subject of my next email.
While you can search the central location for relationships, the same could also be achieved by combining the local storage of relationships with maybe a special index for (Z)Catalog.
I also don't see how one could get an (A,B) relationship implemented in Zope 2 without B being RelationshipAware (hah ;)), no matter where the relationships are stored. Depending on the key used for B, a lot of problems can occur. Since the problem with "moving" objects was discussed, I assume the path to an object could be used as a key. This gets interesting when one deletes B and adds another B, for instance.
This is why it's important to support (but not require) indirect references to objects. In a pure ZODB application, it's probably simpler for relationships to make direct references to related objects. In a Zope application (with context wrapping and a filesystem-like structure), it's better to make indirect references with the help of an object hub or similar.
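As an illustration of indirect references, here is a small sketch of a hub that hands out stable ids and maps them to paths. The ObjectHub class and its method names are hypothetical and are not the Zope 3 object hub API; the point is only that relationships keyed by hub id survive a move, and that a new object created at an old path gets a different id.

```python
import itertools


class ObjectHub:
    """Hypothetical registry mapping stable hub ids to current object paths."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._paths = {}  # hub id -> current path

    def register(self, path):
        hub_id = next(self._ids)
        self._paths[hub_id] = path
        return hub_id

    def moved(self, hub_id, new_path):
        # Only the hub is updated; stored relationships are untouched.
        self._paths[hub_id] = new_path

    def path_of(self, hub_id):
        return self._paths[hub_id]


hub = ObjectHub()
category = hub.register("/shop/categories/books")
item = hub.register("/shop/items/zope-book")

# Relationships are stored as pairs of hub ids, never as paths.
relationships = {(category, item)}

hub.moved(item, "/shop/archive/zope-book")
resolved = [(hub.path_of(a), hub.path_of(b)) for a, b in relationships]

# A different object added later at the old path receives a new id.
replacement = hub.register("/shop/items/zope-book")
```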
Another question: what happens if I export a subtree with objects to another Zope? With the relationships stored as attributes on the objects, I'm able to at least keep some of the relationship information; maybe I just have to verify which relations are still valid and delete the ones which point to non-existing objects. What would I do in the centralized case?
You would just export/import your relationships also. If you use indirect references, all the relationships will remain intact.
In short, as a developer for zope 2, I'd like to see relations implemented in their most basic form, that is as a possibility to do a 1:n mapping between one object and others.
Note that you already can. ZODB is mature enough that it allows multiple references to a single object, so you could just tie two objects together in a relationship. But there are numerous problems with doing that. The requirements / use cases I listed earlier describe some of the problems that need to be solved. Shane
* Max M <maxm@mxm.dk> [2003-04-30 21:42]:
Steve Alexander wrote:
What about making relationships among pre-existing objects that were not designed with relationships in mind?
Exactly!
I think Shane's API can cope with this - it will have to ;-)
Without thinking I sent this to Zope-dev, but here it goes again:
If an object needs to be related to another object, there is no need for any of the objects to know they are related. If all the relations are external to the objects you can relate objects that are not programmed to be related. Keeping all the relation code loosely coupled from the objects.
Objects *do* need to know they are related or have some mechanism to infer a relationship if: - they want to access and manipulate related objects as attributes. - a persistence framework needs to discover relationships
Making the objects hold the responsibility for the relations means that all objects have to take it into account. There is no reason for that. To take a simplified example from Plone:
    relations/
        members_groups
    Members/
    Groups/
In this case I can make the "members_groups" relationship manager take care of the relations, and Members and Groups would not need to know they can be related. I don't have to change the code in Members at all. Thus leaving Plone as it is, while I can add group functionality.
maxm_groups = context.relations.members_groups.get(context.Members.maxm)
The other way around I would need to change the Member product, adding group relations, and I can risk all hell breaking loose when Plone is updated the next time.
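The external-manager idea in the Plone example above can be sketched as follows. This is not the mxmRelations API; RelationshipManager and its relate/unrelate/get methods are hypothetical names, and plain string ids stand in for the member and group objects, which never need to know they are related.

```python
from collections import defaultdict


class RelationshipManager:
    """Hypothetical external store; related objects carry no relation code."""

    def __init__(self):
        self._forward = defaultdict(set)
        self._backward = defaultdict(set)

    def relate(self, a, b):
        self._forward[a].add(b)
        self._backward[b].add(a)

    def unrelate(self, a, b):
        self._forward[a].discard(b)
        self._backward[b].discard(a)

    def get(self, obj):
        # Answer from either side, so the relationship is navigable both ways.
        return self._forward.get(obj, set()) | self._backward.get(obj, set())


members_groups = RelationshipManager()
members_groups.relate("maxm", "editors")
members_groups.relate("maxm", "reviewers")
members_groups.relate("roche", "editors")

maxm_groups = members_groups.get("maxm")
editors = members_groups.get("editors")
```

Because lookups go through both the forward and the backward index, member ids and group ids are assumed not to collide in this sketch.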
I wonder if any of you guys reinventing relationships has even bothered to try out the mxmRelations product? You are taking the long and winding road to get to the same conclusions I did a year ago ...
As you know, I know mxmRelations well and am currently using it to store relationships between objects in Zope 2. But mxmRelations only solves part of the problem: storing and retrieving relationships between objects. Additionally one needs a simple way to teach an object how to compute relationships as attributes. This is not easy in Zope, and writing a method and ComputedAttribute for each relationship per class is too much work. Given this, it excites me that there is so much talk about doing relationships at the ZODB level. For me there is a lot of overlap between the ideas about Relationships in this discussion, mxmRelations and ObjectHub, i.e. a repository type service for relationships. The significant differences are that any API coming forth should make relationships discoverable on an object and that it should not be Zope specific. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
* Shane Hathaway <shane@zope.com> [2003-04-30 16:55]:
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
This sounds good. Now what is the next step? I'd like to help build on the API or aspects of it like making it work in Zope 2 for instance. I like the idea of having RelationShips as a separate branch in the ZODB that you posted earlier. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote:
* Shane Hathaway <shane@zope.com> [2003-04-30 16:55]:
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
This sounds good. Now what is the next step? I'd like to help build on the API or aspects of it like making it work in Zope 2 for instance. I like the idea of having RelationShips as a separate branch in the ZODB that you posted earlier.
Well, I think we should look at the way relationships get stored. Right now I can think of two possible models:

- Store and index relationship objects. This is perhaps the most obvious way to do it.

- Use relations (database tables) and infer relationships rather than store them. This could allow us to take advantage of relational theory.

The advantage of storing relationship objects directly is that it's obvious. When you want to relate object A to object B, you just create a Relationship object that links A and B and store it. You can store extra metadata by creating your own relationship classes. Relationship objects need only implement a minimal interface.

The disadvantage of storing relationship objects is that every relationship is stored as a separate object. But in a complex application, aren't there potentially far too many relationships to enumerate? How would we manage this? And would it be possible to infer relationships?

The advantage of using relations is that it gives us the benefits of relational theory. Relational theory provides a clear way to store and query relationships efficiently. It lets you infer relationships based on explicit relationships.

The disadvantage of using relations is that relationships have to be decomposed for storage. You can't just add an attribute to a Relationship object and expect it to be stored; you also have to arrange for the corresponding relation to store another column. (Although you might automate the process by telling a relation to keep its schema in sync with a particular class.)

Either solution provides a way to store and retrieve simple relationships. The difference is in the way they can expand. I like to imagine that the relational model was developed for the purpose of scaling the entity relationship model, but that's a wild guess.

I suppose mxmRelations stores relationship objects directly. A ZCatalog instance, on the other hand, is much like a relation, although it doesn't implement all the same operations and provides some extra operations. Shane
Shane Hathaway wrote:
Roché Compaan wrote:
* Shane Hathaway <shane@zope.com> [2003-04-30 16:55]:
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
This sounds good. Now what is the next step? I'd like to help build on the API or aspects of it like making it work in Zope 2 for instance. I like the idea of having RelationShips as a separate branch in the ZODB that you posted earlier.
Well, I think we should look at the way relationships get stored. Right now I can think of two possible models:
- Store and index relationship objects. This is perhaps the most obvious way to do it.
- Use relations (database tables) and infer relationships rather than store them. This could allow us to take advantage of relational theory.
The advantage of storing relationship objects directly is that it's obvious. When you want to relate object A to object B, you just create a Relationship object that links A and B and store it. You can store extra metadata by creating your own relationship classes. Relationship objects need only implement a minimal interface.
The disadvantage of storing relationship objects is that every relationship is stored as a separate object. But in a complex application, aren't there potentially far too many relationships to enumerate? How would we manage this? And would it be possible to infer relationships?
The advantage of using relations is that it gives us the benefits of relational theory. Relational theory provides a clear way to store and query relationships efficiently. It lets you infer relationships based on explicit relationships.
The disadvantage of using relations is that relationships have to be decomposed for storage. You can't just add an attribute to a Relationship object and expect it to be stored; you also have to arrange for the corresponding relation to store another column. (Although you might automate the process by telling a relation to keep its schema in sync with a particular class.)
Either solution provides a way to store and retrieve simple relationships. The difference is in the way they can expand. I like to imagine that the relational model was developed for the purpose of scaling the entity relationship model, but that's a wild guess.
I suppose mxmRelations stores relationship objects directly. A ZCatalog instance, on the other hand, is much like a relation, although it doesn't implement all the same operations and provides some extra operations.
Shane
We (@zzict) implemented something that was "RelationAware" some time ago, and found it to be non-trivial, and a pest for the ZODB. What happens is this. First some notation: A, B, ... are objects; R(A,B) is an object associating A and B, basically holding a reference to A and to B. If A or B moves (cut and paste, for example) you should update R as well, but if you install backpointers, then you end up with a spaghetti of entangled objects, so that if one is "pumped up" from storage, the whole mess gets pumped up... and cached. You can try to have a name-based binding to decouple things, but that is far easier said than done, especially with cloning etc. Anyway, we ended up moving our whole object model to an RDB (also for scalability and other reasons). As a consequence, designed relations are now trivial, but ad hoc relations are impossible, which is OK for us, but might not be for you. Hope this helps your decisions. Romain.
* Shane Hathaway <shane@zope.com> [2003-05-01 19:36]:
Roché Compaan wrote:
* Shane Hathaway <shane@zope.com> [2003-04-30 16:55]:
Do you agree with these requirements and minimal use cases? Is there anything I need to clarify?
This sounds good. Now what is the next step? I'd like to help build on the API or aspects of it like making it work in Zope 2 for instance. I like the idea of having RelationShips as a separate branch in the ZODB that you posted earlier.
Well, I think we should look at the way relationships get stored. Right now I can think of two possible models:
- Store and index relationship objects. This is perhaps the most obvious way to do it.
- Use relations (database tables) and infer relationships rather than store them. This could allow us to take advantage of relational theory.
The advantage of storing relationship objects directly is that it's obvious. When you want to relate object A to object B, you just create a Relationship object that links A and B and store it. You can store extra metadata by creating your own relationship classes. Relationship objects need only implement a minimal interface.
The disadvantage of storing relationship objects is that every relationship is stored as a separate object. But in a complex application, aren't there potentially far too many relationships to enumerate? How would we manage this? And would it be possible to infer relationships?
When you use database tables you have the same problem, but it's not a problem IMHO. It is common practice to have tables that only store relationships - lots of them. In my db design for student registrations I would have the following tables:

    Student           StudentCourses          Course
    --------------    --------------------    ---------------
    ID | Name         StudentID | CourseID    ID | Name

    Term              TermCourses
    --------------    --------------------
    ID | Name         TermID | CourseID

IMO this is the same as having 3 relation objects in which object relationships are stored. I don't think this is difficult to manage either - mxmRelations does this pretty well. For the above I will have 3 BTrees which should be able to store many objects and scale well. There is also no need to index them - you can do lookups directly on the relation.
The advantage of using relations is that it gives us the benefits of relational theory. Relational theory provides a clear way to store and query relationships efficiently. It lets you infer relationships based on explicit relationships.
I *think* we are already applying some relational theory by:

- Creating a relation object and storing relationships in it (vs a normalised table representing the relation that stores relationships as records)
- We use unique ids/paths to relate objects and do not duplicate information about the objects in the relationship.
- We normalise classes just like we normalise tables.
- We can still infer relationships (just like sql views)

So we don't have to give up the benefits ;-) To illustrate I'll compare a sql join on the db above with the python way to infer the relation of registrations for the current term.

SQL:

    select Student.Name, Course.Name
    from Student, Course, Term, StudentCourses, TermCourses
    where Student.ID = StudentCourses.StudentID
      and Course.ID = StudentCourses.CourseID
      and Course.ID = TermCourses.CourseID
      and Term.ID = TermCourses.TermID
      and Term.ID = CurrentTermID;

Python:

    courses = term_courses.get(current_term)
    students = student_courses.get(courses)
    # by now we already have the objects we want
    # but let's display the result too.
    for student in students:
        print(student.Name)
        for course in student_courses.get(student):
            print(course.Name)
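For readers who want to run the comparison, here is a self-contained sketch of the same inference. The Relation class below is a hypothetical stand-in for the mxmRelations Relations class, not its real API, and string ids replace the student, course and term objects. One assumption differs from the snippet above: the final step keeps only courses that belong to the current term, which is what the SQL join returns.

```python
from collections import defaultdict


class Relation:
    """Hypothetical symmetric relation store (stand-in for mxmRelations)."""

    def __init__(self):
        self._related = defaultdict(set)

    def relate(self, a, b):
        self._related[a].add(b)
        self._related[b].add(a)

    def get(self, keys):
        # Accept a single id or a collection of ids, as the snippet above does.
        if isinstance(keys, str):
            keys = [keys]
        found = set()
        for key in keys:
            found |= self._related.get(key, set())
        return found


student_courses = Relation()
term_courses = Relation()

student_courses.relate("john", "python101")
student_courses.relate("peter", "python101")
student_courses.relate("peter", "zope101")
student_courses.relate("mary", "zope101")
term_courses.relate("2003-1", "python101")

current_term = "2003-1"
courses = term_courses.get(current_term)
students = student_courses.get(courses)

# Inferred relation: (student, course) pairs for the current term only.
registrations = sorted(
    (student, course)
    for student in students
    for course in student_courses.get(student)
    if course in courses
)
```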
The disadvantage of using relations is that relationships have to be decomposed for storage. You can't just add an attribute to a Relationship object and expect it to be stored; you also have to arrange for the corresponding relation to store another column. (Although you might automate the process by telling a relation to keep its schema in sync with a particular class.)
Either solution provides a way to store and retrieve simple relationships. The difference is in the way they can expand. I like to imagine that the relational model was developed for the purpose of scaling the entity relationship model, but that's a wild guess.
I suppose mxmRelations stores relationship objects directly. A ZCatalog instance, on the other hand, is much like a relation, although it doesn't implement all the same operations and provides some extra operations.
To me a ZCatalog is much more like an index on a database table than a relation in that it does not know both ends of a relationship. I am not too worried about how we will store relationships. I will go for the most obvious way: store relationship objects. Here the mxmRelations API is a good starting point and one can expand the API to store metadata as well. Am I correct in saying that Ape wouldn't need modification either since a relationship object will be handled just like any other object?

Ok, so far we can "list all grandmothers" ;-) The difficult part in my mind is to give objects insight into their relationships. I think this issue is separate from *storing* relationships. The implementation might be tricky, but the requirements are simple:

- One should be able to ask an object what relationships it has. A Person object might answer: "With my employer, my wife and friends".

      class Person:
          Employer = VoodooRelationshipThingy()
          MyWife = VoodooRelationshipThingy()
          Friends = VoodooRelationshipThingy()

  or in the case where Person is an object in another Product that I don't want to mess with (hopefully subclassing is not the only way to do this):

      class MyPerson(Person):
          Employer = VoodooRelationshipThingy()
          MyWife = VoodooRelationshipThingy()
          Friends = VoodooRelationshipThingy()

- Relationships should be accessible as attributes. This makes it possible to ask the Person object what the name of its employer is, tell it that it has a new employer, and a new friend.

      pete = Person()
      assert pete.Employer.Name == "SomeCompany"
      # Pete has a new employer
      pete.Employer = TheOtherCompany
      assert pete.Employer.Name == "TheOtherCompany"
      assert pete.Friends.John.Surname == "Smith"
      # Pete has a new friend
      pete.Friends.add(mary)

Something like ComputedAttribute or descriptors should make it possible. Hmm, I might just have thought of a way to do this with ComputedAttribute which I'll try tomorrow. But ComputedAttribute is Zope2 specific isn't it? Darn ...
It's quite late now so I hope I made sense. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
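One way the two requirements in the message above (relationships declared on the class, and readable and assignable as attributes) could be met with a plain Python descriptor is sketched below. Everything here is hypothetical: the Relationship descriptor, the attach helper and the STORE dictionary are illustration names, the store is an in-memory dict rather than anything persistent, and only one direction of each relationship is recorded.

```python
class Relationship:
    """Hypothetical descriptor backed by an external store (a plain dict)."""

    def __init__(self, store, cardinality="single"):
        self.store = store
        self.cardinality = cardinality
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        targets = self.store.setdefault((instance, self.name), [])
        if self.cardinality == "single":
            return targets[0] if targets else None
        return targets  # live list: append() updates the store

    def __set__(self, instance, value):
        if self.cardinality != "single":
            raise AttributeError("use append() on a multiple relationship")
        self.store[(instance, self.name)] = [value]


def attach(cls, name, relationship):
    """Teach an existing class about a relationship without subclassing it."""
    relationship.__set_name__(cls, name)
    setattr(cls, name, relationship)


STORE = {}


class Company:
    def __init__(self, name):
        self.Name = name


class Person:
    Employer = Relationship(STORE, "single")
    Friends = Relationship(STORE, "multiple")


class ThirdPartyPerson:  # stands in for a class from another product
    pass


attach(ThirdPartyPerson, "Employer", Relationship(STORE, "single"))

pete, mary = Person(), Person()
pete.Employer = Company("SomeCompany")
first_employer = pete.Employer.Name
pete.Employer = Company("TheOtherCompany")
pete.Friends.append(mary)
```

The attach helper is one possible answer to the worry about subclassing: the descriptor is installed on the existing class, so its instances gain the attribute without any change to the original code.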
On Thu, 1 May 2003 23:30:04 +0200 Roché Compaan <roche@upfrontsystems.co.za> wrote:
Something like ComputedAttribute or descriptors should make it possible. Hmm, I might just have thought of a way to do this with ComputedAttribute which I'll try tomorrow. But ComputedAttribute is Zope2 specific isn't it? Darn ...
It looks like ComputedAttribute has no dependencies on Zope 2 code so I had a go at another API addressing specifically the way in which objects use relationships. I used the Relations class in mxmRelations as is since it has no Zope dependencies. How this will be made accessible as a different branch in the ZODB (iow where it will be stored and how you will locate it) and how the Relations API must be extended still needs to be addressed. I assume all objects have ids. This is not a requirement and if we drop this assumption very little in the implementation has to change. I do not define Relationships as static attributes since we need a handle on the class instance to relate.

    from Relations import Relations  # This is the Relations class in mxmRelations
    import ExtensionClass
    from ComputedAttribute import ComputedAttribute

    class Relationship(ExtensionClass.Base):

        def __init__(self, ob, relation, cardinality):
            self.ob = ob
            self.relation = relation
            self.cardinality = cardinality

        _r_ = ComputedAttribute(lambda self: self.relation._get(self.ob))

        def __getattr__(self, name):
            l = self._r_
            if self.cardinality == 'single':
                return getattr(l[0], name)
            else:
                # This can be optimised: Relations can return a BTree that we
                # can subscript
                for ob in l:
                    if ob.id == name:
                        return ob

        def add(self, other):
            # TODO: test if 'other' is not a sequence if our cardinality is
            # single
            self.relation._relate(self.ob, other)

        def remove(self, other):
            self.relation._unrelate(self.ob, other)

    student_courses = Relations()
    term_courses = Relations()
    # this relation will be between an object that knows its relationships and
    # instances of third party objects that don't
    school_courses = Relations()

    class Student:
        def __init__(self, id, name):
            self.id = id
            self.name = name
            self.courses = Relationship(self, student_courses, 'multiple')

    class Course:
        def __init__(self, id, name):
            self.id = id
            self.name = name
            self.students = Relationship(self, student_courses, 'multiple')
            self.school = Relationship(self, school_courses, 'single')

    class Term:
        def __init__(self, id, name):
            self.id = id
            self.name = name
            self.courses = Relationship(self, term_courses, 'multiple')

    # This is a third party class
    class School:
        def __init__(self, id, name):
            self.id = id
            self.name = name

    john = Student('john', 'John Smith')
    peter = Student('peter', 'Peter Pan')
    mary = Student('mary', 'Mary Scary')
    susan = Student('susan', 'Susan')
    python101 = Course('python101', 'Python 101')
    zope101 = Course('zope101', 'Zope 101')
    law = School('law', 'Law')
    compsci = School('compsci', 'Computer Science')

    python101.students.add(john)
    python101.students.add(peter)
    python101.school.add(compsci)
    zope101.students.add(mary)
    zope101.students.add(peter)
    zope101.school.add(compsci)

    assert john.courses.python101.name == "Python 101"
    assert mary.courses.zope101.name == "Zope 101"
    assert python101.school.name == "Computer Science"

    # Teach compsci about relationships
    compsci.courses = Relationship(compsci, school_courses, 'multiple')
    assert compsci.courses.python101.name == "Python 101"

    # Changing attributes on related objects
    python101.school.name = "Computer Science School"
    zope101.students.mary.name = "Mary Airy"

-- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote:
On Thu, 1 May 2003 23:30:04 +0200 Roché Compaan <roche@upfrontsystems.co.za> wrote:
Something like ComputedAttribute or descriptors should make it possible. Hmm, I might just have thought of a way to do this with ComputedAttribute which I'll try tomorrow. But ComputedAttribute is Zope2 specific isn't it? Darn ...
It looks like ComputedAttribute has no dependencies on Zope 2 code so I had a go at another API addressing specifically the way in which objects use relationships.
The work you did looks good, but at this point I'd still like to call the work you did a proof of concept. The next step is to write a proposal that others can comment on. A long discussion like this is hard for others to follow, but a proposal sums up everything we learned.
I used the Relations class in mxmRelations as is since it has no Zope dependencies. How this will be made accessible as a different branch in the ZODB (iow where it will be stored and how you will locate it) and how the Relations API must be extended still needs to be addressed.
One request: the word "relation" should not appear anywhere in the API. We should use "relationship" consistently. I'm pretty sure that relations are only one possible implementation of relationship storage.
I assume all objects have ids. This is not a requirement and if we drop this assumption very little in the implementation has to change.
We can't make the assumption that all objects have IDs. One of the requirements is that either direct or indirect references are possible. All of the requirements should be listed on the proposal.
I do not define Relationships as static attributes since we need a handle on the class instance to relate.
That's what computed attributes and descriptors are for.
    from Relations import Relations  # This is the Relations class in mxmRelations
    import ExtensionClass
    from ComputedAttribute import ComputedAttribute

    class Relationship(ExtensionClass.Base):

        def __init__(self, ob, relation, cardinality):
            self.ob = ob
            self.relation = relation
            self.cardinality = cardinality

        _r_ = ComputedAttribute(lambda self: self.relation._get(self.ob))
Actually, I meant for this class to be a ComputedAttribute/descriptor. This class does not need to use computed attributes.
        def __getattr__(self, name):
            l = self._r_
            if self.cardinality == 'single':
                return getattr(l[0], name)
            else:
                # This can be optimised: Relations can return a BTree that we
                # can subscript
                for ob in l:
                    if ob.id == name:
                        return ob

        def add(self, other):
            # TODO: test if 'other' is not a sequence if our cardinality is
            # single
            self.relation._relate(self.ob, other)

        def remove(self, other):
            self.relation._unrelate(self.ob, other)

    student_courses = Relations()
    term_courses = Relations()
    # this relation will be between an object that knows its relationships and
    # instances of third party objects that don't
    school_courses = Relations()
Note that you did not arrange for the relations to be stored in ZODB at all. I'll help you deal with that once we've prepared an API. If you have some time, I'd appreciate it if you started a proposal on a wiki page. Then we'll come up with an API. Once we're satisfied with the API, we'll ask for comments. Shane
On Fri, 02 May 2003 09:47:37 -0400 Shane Hathaway <shane@zope.com> wrote:
Roché Compaan wrote:
On Thu, 1 May 2003 23:30:04 +0200 Roché Compaan <roche@upfrontsystems.co.za> wrote:
Something like ComputedAttribute or descriptors should make it possible. Hmm, I might just have thought of a way to do this with ComputedAttribute which I'll try tomorrow. But ComputedAttribute is Zope2 specific isn't it? Darn ...
It looks like ComputedAttribute has no dependencies on Zope 2 code so I had a go at another API addressing specifically the way in which objects use relationships.
The work you did looks good, but at this point I'd still like to call the work you did a proof of concept.
Absolutely proof of concept. I wrote it with far too few hours of sleep, mainly because I lay awake thinking about it ;-)
The next step is to write a proposal that others can comment on. A long discussion like this is hard for others to follow, but a proposal sums up everything we learned.
I used the Relations class in mxmRelations as is since it has no Zope dependencies. How this will be made accessible as a different branch in the ZODB (iow where it will be stored and how you will locate it) and how the Relations API must be extended still needs to be addressed.
One request: the word "relation" should not appear anywhere in the API. We should use "relationship" consistently. I'm pretty sure that relations are only one possible implementation of relationship storage.
If you are that sure then the distinction is not that clear to me. I understood "relation" as the *definition* of a logical association between two objects and relationship as the association itself. Marriage is the relation, Peter's marriage to Susan is the relationship. Sure, there is some overlap with "relation" in relational theory but we describe the same thing: a logical association. When you use the word "relation" you are referring to it as a specific storage implementation: a table. But to stop the confusion we can talk about Relationship and RelationshipStorage or just use the collective Relationships for the storage location.
I assume all objects have ids. This is not a requirement and if we drop this assumption very little in the implementation has to change.
We can't make the assumption that all objects have IDs. One of the requirements is that either direct or indirect references are possible. All of the requirements should be listed on the proposal.
Just to be clear, can you give me an example of a direct and an indirect reference?
I do not define Relationships as static attributes since we need a handle on the class instance to relate.
That's what computed attributes and descriptors are for.
    from Relations import Relations  # This is the Relations class in mxmRelations
    import ExtensionClass
    from ComputedAttribute import ComputedAttribute

    class Relationship(ExtensionClass.Base):

        def __init__(self, ob, relation, cardinality):
            self.ob = ob
            self.relation = relation
            self.cardinality = cardinality

        _r_ = ComputedAttribute(lambda self: self.relation._get(self.ob))
Actually, I meant for this class to be a ComputedAttribute/descriptor. This class does not need to use computed attributes.
Iow, Relationship should subclass ComputedAttribute? I tried something like that but ran into some problems and just wanted to get something out there for comment.
        def __getattr__(self, name):
            l = self._r_
            if self.cardinality == 'single':
                return getattr(l[0], name)
            else:
                # This can be optimised: Relations can return a BTree that we
                # can subscript
                for ob in l:
                    if ob.id == name:
                        return ob

        def add(self, other):
            # TODO: test if 'other' is not a sequence if our cardinality is
            # single
            self.relation._relate(self.ob, other)

        def remove(self, other):
            self.relation._unrelate(self.ob, other)

    student_courses = Relations()
    term_courses = Relations()
    # this relation will be between an object that knows its relationships and
    # instances of third party objects that don't
    school_courses = Relations()
Note that you did not arrange for the relations to be stored in ZODB at all. I'll help you deal with that once we've prepared an API.
If you have some time, I'd appreciate it if you started a proposal on a wiki page. Then we'll come up with an API. Once we're satisfied with the API, we'll ask for comments.
I'll ask Jean to help me with this - he formulates much better than I do and has a much better grip on the subtleties of language. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Roché Compaan wrote:
On Fri, 02 May 2003 09:47:37 -0400 Shane Hathaway <shane@zope.com> wrote:
One request: the word "relation" should not appear anywhere in the API. We should use "relationship" consistently. I'm pretty sure that relations are only one possible implementation of relationship storage.
If you are that sure then the distinction is not that clear to me. I understood "relation" as the *definition* of a logical association between two objects and relationship as the association itself. Marriage is the relation, Peter's marriage to Susan is the relationship. Sure, there is some overlap with "relation" in relational theory but we describe the same thing: a logical association. When you use the word "relation" you are referring to it as a specific storage implementation: a table.
Well, perhaps I am incorrect in my thinking, but every discussion I can Google (heh, I finally used Google as a verb) that mentions both the entity-relationship model and the relational model explains it differently. First they introduce ERM then explain how to implement it using relations in a relational database. ERM seems to be described as the underlying model for relations. There are a few places that mix the use of relation and relationship. In some contexts, the two words are synonyms, and it seems like some authors use both words loosely. I'd be happy to be proven wrong (so I can agree with you.) Is there a model that defines "relationship" as the concrete form of "relation", as you explained it? It seems to be the other way around. :-) In my search, I found a very nice introduction to ERM: http://www.cs.nyu.edu/cs/dept_info/course_home_pages/spr97/V22.0444/unit02/p...
I assume all objects have ids. This is not a requirement and if we drop this assumption very little in the implementation has to change.
We can't make the assumption that all objects have IDs. One of the requirements is that either direct or indirect references are possible. All of the requirements should be listed on the proposal.
Just to be clear, can you give me an example of a direct and an indirect reference?
Direct references are what ZODB does by default. Containment is the simplest kind of direct reference. Indirect references, on the other hand, use paths or hub IDs to find objects.
Actually, I meant for this class to be a ComputedAttribute/descriptor. This class does not need to use computed attributes.
Iow, Relationship should subclass ComputedAttribute? I tried something like that but ran into some problems and just wanted to get something out there for comment.
Ok.
If you have some time, I'd appreciate it if you started a proposal on a wiki page. Then we'll come up with an API. Once we're satisfied with the API, we'll ask for comments.
I'll ask Jean to help me with this - he formulates much better than I do and has a much better grip on the subtleties of language.
I wouldn't worry about making it formal, just complete (yet concise.) Shane
Shane Hathaway wrote:
Roché Compaan wrote:
If you are that sure then the distinction is not that clear to me. I understood "relation" as the *definition* of a logical association between two objects and relationship as the association itself. Marriage is the relation, Peter's marriage to Susan is the relationship. Sure, there is some overlap with "relation" in relational theory but we describe the same thing: a logical association. When you use the word "relation" you are referring to it as a specific storage implementation: a table.
Well, perhaps I am incorrect in my thinking, but every discussion I can Google (heh, I finally used Google as a verb) that mentions both the entity-relationship model and the relational model explains it differently. First they introduce ERM then explain how to implement it using relations in a relational database. ERM seems to be described as the underlying model for relations.
FWIW... Wikipedia helped. http://www.wikipedia.org/wiki/ER_diagram http://www.wikipedia.org/wiki/Mathematical_relation http://www.wikipedia.org/wiki/Relational_model These pages don't mix up the two terms. From what I understand now, relation is simply the mathematical term for relationship. Most people think in terms of relationships since it's a useful, fuzzy term. For greater precision, mathematicians perform manipulations using relations. Also, I was incorrect in mixing "relation" and "table". The relational data model seems to define relation a little differently than the mathematical term, so it's better to just use the word "table" when that's what I mean. So here's what I was really trying to ask earlier: should we base the API on the entity-relationship model or on the relational model? The entity-relationship model is quite simple, allowing you to translate a drawing directly into code, but it limits the kinds of queries you can make. The relational model lets you formulate much more complex queries, but at a higher buy-in cost. I'm leaning toward the entity-relationship model, since one of the primary goals is simplicity. Besides, assuming we don't have to change ZODB in order to get either feature, we could have both someday if we needed them. By the way, I just learned that Ted Codd, the father of relational database technology, passed away two weeks ago. May he rest in peace. http://www.intelligententerprise.com/online_only/features/030425.shtml Shane
* Shane Hathaway <shane@zope.com> [2003-05-02 20:48]:
http://www.wikipedia.org/wiki/ER_diagram http://www.wikipedia.org/wiki/Mathematical_relation http://www.wikipedia.org/wiki/Relational_model
These pages don't mix up the two terms. From what I understand now, relation is simply the mathematical term for relationship. Most people think in terms of relationships since it's a useful, fuzzy term. For greater precision, mathematicians perform manipulations using relations.
Also, I was incorrect in mixing "relation" and "table". The relational data model seems to define relation a little differently than the mathematical term, so it's better to just use the word "table" when that's what I mean.
So here's what I was really trying to ask earlier: should we base the API on the entity-relationship model or on the relational model? The entity-relationship model is quite simple, allowing you to translate a drawing directly into code, but it limits the kinds of queries you can make. The relational model lets you formulate much more complex queries, but at a higher buy-in cost.
I am not quite sure how either model can be the basis for the API? Both models are conceptual tools for visualising data, relationships, etc. Interestingly enough, in one of the links you posted (the notes of Osmar Zaiane) another data model is the object oriented data model. Isn't this the most applicable model, especially since we are working with the ZODB? Both the entity-relationship model and the relational model have one serious constraint. Entities in the e-r model and relations in the relational model have fixed schemas, or fixed sets of attributes. Although this is true most of the time when dealing with objects, one can easily imagine cases where it is not, like adding a property to an instance of a folder. Reading through the notes mentioned above, two other terms struck a chord, especially since they are mentioned in the context of object databases. In the notes on Chapter 8, "Object oriented databases", what we call a relationship seems to be called a "reference" or persistent pointer. http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter8/node16.html The other term is "association". Both terms are common and used in many contexts and won't necessarily dry up the confusion, but I thought I'd throw them in the hat. I am inclined toward "reference", since in an object oriented world it makes much more sense to say attribute x of object y is a reference to object z than attribute x of object y is a relationship to object z. Coming back to the problem statement, shouldn't it be as simple as "The ZODB should be able to persist references?" or did I just take the wrong turn? -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
On Sat, 3 May 2003, [iso-8859-1] Roché Compaan wrote:
I am not quite sure how either model can be the basis for the API? Both models are conceptual tools for visualising data, relationships, etc.
SQL is based on the relational model. The APIs of most of the "relationship" Zope products I've heard about are based on the entity-relationship model. I hope that gives you a better idea of what I mean. It seems pretty clear we're going with something like the E-R model; we just have to acknowledge that it will create limitations later on.
Interestingly enough in one of the links you posted (the notes of Osmar Zaiane) another data model is the object oriented data model. Isn't this the most applicable model especially since we are working with the ZODB?
ZODB is built on the OO data model. What we're talking about is allowing applications to use another model at the same time, because the OO data model (or perhaps the way ZODB interprets it) has limitations.
Both the entity-relationship model and the relational model have one serious constraint. Entities in the e-r model and relations in the relational model have fixed schemas, or fixed sets of attributes. Although this is true most of the time when dealing with objects, one can easily imagine cases where it is not, like adding a property to an instance of a folder.
The E-R model can support non-fixed schemas. It's much looser.
Reading through the notes mentioned above, two other terms struck a chord, especially since they are mentioned in the context of object databases. In the notes on Chapter 8, "Object oriented databases", what we call a relationship seems to be called a "reference" or persistent pointer.
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter8/node16.html
The other term is "association". Both terms are common and used in many contexts and won't necessarily dry up the confusion, but I thought I'd throw them in the hat. I am inclined toward "reference", since in an object oriented world it makes much more sense to say attribute x of object y is a reference to object z than attribute x of object y is a relationship to object z.
ZODB already uses persistent references. That's what OIDs are for. But ZODB references are not bidirectional.
Coming back to the problem statement, shouldn't it be as simple as "The ZODB should be able to persist references?" or did I just take the wrong turn?
I'm afraid you did. :-) The inverse relationship mentioned in the page you linked is talking about adding an integrity constraint, not about creating bidirectional references. Shane
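As an illustration of what "bidirectional" means here, the following is a minimal sketch of a single relationship object that can be read from either side. It is plain Python with ordinary dicts and sets; the `Relationship` class and its method names are assumptions made for this example, not an existing ZODB or Zope API. A persistent version would presumably use persistent containers such as BTrees instead.

```python
class Relationship:
    """One object holding both directions of a many-to-many association."""

    def __init__(self):
        self._forward = {}   # source -> set of targets
        self._backward = {}  # target -> set of sources

    def add(self, source, target):
        # Record the link in both indexes so either side can be queried.
        self._forward.setdefault(source, set()).add(target)
        self._backward.setdefault(target, set()).add(source)

    def remove(self, source, target):
        # Removing a link that does not exist is silently ignored.
        self._forward.get(source, set()).discard(target)
        self._backward.get(target, set()).discard(source)

    def targets(self, source):
        return frozenset(self._forward.get(source, ()))

    def sources(self, target):
        return frozenset(self._backward.get(target, ()))


works_on = Relationship()
works_on.add("shane", "zope3")
works_on.add("shane", "ape")
works_on.add("jeremy", "zope3")

projects_of_shane = works_on.targets("shane")
developers_of_zope3 = works_on.sources("zope3")
```

An ordinary attribute reference only answers the first question (which projects does shane point to); the second index is what lets the relationship answer the reverse question without scanning every object.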
On Sat, 3 May 2003 12:00:15 -0400 (EDT) Shane Hathaway <shane@zope.com> wrote:
ZODB already uses persistent references. That's what OIDs are for. But ZODB references are not bidirectional.
Coming back to the problem statement, shouldn't it be as simple as "The ZODB should be able to persist references?" or did I just take the wrong turn?
I'm afraid you did. :-) The inverse relationship mentioned in the page you linked is talking about adding an integrity constraint, not about creating bidirectional references.
I am glad I did - you've explained why so well :-) And you made relationships in the ZODB sound revolutionary, which I think they are, considering that the OO data model doesn't address them. -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
Shane Hathaway wrote:
ZODB is built on the OO data model. What we're talking about is allowing applications to use another model at the same time, because the OO data model (or perhaps the way ZODB interprets it)
Is this the bit that effectively says "every object must have one, and only one, canonical traversal path to access it"? cheers, Chris
Chris Withers wrote:
Shane Hathaway wrote:
ZODB is built on the OO data model. What we're talking about is allowing applications to use another model at the same time, because the OO data model (or perhaps the way ZODB interprets it)
Is this the bit that effectively says "every object must have one, and only one, canonical traversal path to access it"?
No, that's the filesystem model as interpreted by Zope. The OO model simply says that the objects are the database. You might say the OO model is a low-level model that doesn't intrinsically provide high-level operations like two-way relationships. Shane
Shane Hathaway wrote:
Chris Withers wrote:
Shane Hathaway wrote:
ZODB is built on the OO data model. What we're talking about is allowing applications to use another model at the same time, because the OO data model (or perhaps the way ZODB interprets it)
No,
Ok, so what did you mean by the way ZODB interprets it? :-) cheers, Chris
Chris Withers wrote:
Shane Hathaway wrote:
Chris Withers wrote:
Shane Hathaway wrote:
ZODB is built on the OO data model. What we're talking about is allowing applications to use another model at the same time, because the OO data model (or perhaps the way ZODB interprets it)
Ok, so what did you mean by the way ZODB interprets it? :-)
Just that other OO databases or O-R mechanisms might have a solution for relationships that involves only the OO model. I haven't found anything specific, though. Shane
Shane Hathaway wrote:
ZODB already uses persistent references. That's what OIDs are for. But ZODB references are not bidirectional.
I was under the impression that the Objecthub is Zope specific. Is that so? Or is it a plain ZODB thing? regards Max M
Max M wrote:
Shane Hathaway wrote:
ZODB already uses persistent references. That's what OIDs are for. But ZODB references are not bidirectional.
I was under the impression that the Objecthub is Zope specific. Is that so? Or is it a plain ZODB thing?
It's Zope specific. But in plain ZODB, you can achieve pretty much the same thing ObjectHub achieves using simple OIDs. So the relationship code should allow, but not require, direct references using OIDs. That might make you wonder why we need an object hub at all. The difference between Zope and an average ZODB application is that Zope lets you create a large multiuser system with many security contexts. An object hub assists the process by restoring references in context (i.e. with the correct context wrappers.) A simple ZODB application generally has only one security context, so it doesn't need the extra complexity. Shane
Shane Hathaway wrote:
I was under the impression that the Objecthub is Zope specific. Is that so? Or is it a plain ZODB thing?
It's Zope specific. But in plain ZODB, you can achieve pretty much the same thing ObjectHub achieves using simple OIDs. So the relationship code should allow, but not require, direct references using OIDs.
That might make you wonder why we need an object hub at all. The difference between Zope and an average ZODB application is that Zope lets you create a large multiuser system with many security contexts. An object hub assists the process by restoring references in context (i.e. with the correct context wrappers.) A simple ZODB application generally has only one security context, so it doesn't need the extra complexity.
Then it starts to get amusing. One of the primary reasons for the objecthub was to enable relations. So if the relations get implemented in ZODB but need some functionality, will it not end up as a duplication of efforts? regards max M
* Shane Hathaway <shane@zope.com> [2003-05-02 18:47]:
Well, perhaps I am incorrect in my thinking, but every discussion I can Google (heh, I finally used Google as a verb) that mentions both the entity-relationship model and the relational model explains it differently. First they introduce ERM then explain how to implement it using relations in a relational database. ERM seems to be described as the underlying model for relations.
There are a few places that mix the use of relation and relationship. In some contexts, the two words are synonyms, and it seems like some authors use both words loosely. I'd be happy to be proven wrong (so I can agree with you.) Is there a model that defines "relationship" as the concrete form of "relation", as you explained it? It seems to be the other way around. :-)
I don't think there is a model that defines relationship as I've explained it; I was merely thinking of the definition of the words "relation" and "relationship". I am happy to stick with "relationship" if "relation" is confused with its use in existing theory.
Iow, Relationship should subclass ComputedAttribute? I tried something like that but ran into some problems and just wanted to get something out there for comment.
Ok.
As a matter of interest, how will you get a handle on the instance if you subclass ComputedAttribute and don't initialise it with a function of the class where the Relationship is created? -- Roché Compaan Upfront Systems http://www.upfrontsystems.co.za
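One possible answer to the question above, sketched with the plain Python descriptor protocol rather than ComputedAttribute: Python passes the instance to the descriptor's `__get__` method, so the descriptor does not need to be initialised with a function from the owning class. The `RelatedTo` class and its `relate` method are hypothetical names for this example only, and storing the links on the descriptor itself is a simplification that ignores persistence.

```python
class RelatedTo:
    """Descriptor exposing the objects related to the owning instance."""

    def __init__(self):
        self._links = {}  # instance -> list of related objects

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: return the descriptor itself.
            return self
        # Accessed on an instance: Python hands us that instance here.
        return tuple(self._links.get(instance, ()))

    def relate(self, instance, other):
        self._links.setdefault(instance, []).append(other)


class Developer:
    def __init__(self, name):
        self.name = name


class Project:
    developers = RelatedTo()

    def __init__(self, name):
        self.name = name


zope3 = Project("zope3")
shane = Developer("shane")
Project.developers.relate(zope3, shane)

related = zope3.developers  # the descriptor receives zope3 as `instance`
```

With this arrangement `zope3.developers` reads like an ordinary attribute, while the lookup logic lives in one place next to the class definition.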
participants (13)
- Chris Withers
- Evan Simpson
- Gary Poster
- Jeremy Hylton
- Leonardo Rochael Almeida
- Max M
- Oliver Bleutgen
- Paul Winkler
- roche@upfrontsystems.co.za
- Roché Compaan
- Romain Slootmaekers
- Shane Hathaway
- Steve Alexander