[Zope-dev] RFC: RelationAware class for relations between objects

Roché Compaan roche@upfrontsystems.co.za
Thu, 1 May 2003 23:30:04 +0200


* Shane Hathaway <shane@zope.com> [2003-05-01 19:36]:
> Roché Compaan wrote:
> >* Shane Hathaway <shane@zope.com> [2003-04-30 16:55]:
> >
> >>Do you agree with these requirements and minimal use cases?  Is there 
> >>anything I need to clarify?
> >
> >
> >This sounds good. Now what is the next step? I'd like to help build on
> >the API or aspects of it like making it work in Zope 2 for instance. I
> >like the idea of having RelationShips as a separate branch in the ZODB
> >that you posted earlier.
> 
> Well, I think we should look at the way relationships get stored.  Right 
> now I can think of two possible models:
> 
> - Store and index relationship objects.  This is perhaps the most 
> obvious way to do it.
> 
> - Use relations (database tables) and infer relationships rather than 
> store them.  This could allow us to take advantage of relational theory.
> 
> The advantage of storing relationship objects directly is that it's 
> obvious.  When you want to relate object A to object B, you just create 
> a Relationship object that links A and B and store it.  You can store 
> extra metadata by creating your own relationship classes.  Relationship 
> objects need only implement a minimal interface.
> 
> The disadvantage of storing relationship objects is that every 
> relationship is stored as a separate object.  But in a complex 
> application, aren't there potentially far too many relationships to 
> enumerate?  How would we manage this?  And would it be possible to infer 
> relationships?

When you use database tables you have the same problem, but it's not a
problem IMHO. It is common practice to have tables that only store
relationships - lots of them. In my db design for student registrations
I would have the following tables:

    Student             StudentCourses           Course
    --------------      --------------------     ---------------
    ID  | Name          StudentID | CourseID     ID  | Name

    Term                TermCourses
    --------------      --------------------
    ID  | Name          TermID    | CourseID


IMO this is the same as having 3 relation objects in which object
relationships are stored. I don't think this is difficult to manage
either - mxmRelations does this pretty well. For the above I will have 
3 BTrees which should be able to store many objects and scale well.
There is also no need to index them - you can do lookups directly on the
relation.
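
To make that concrete, here is a rough sketch of a relation object that
can be looked up from either end without a separate index. This is only
my illustration: the Relation class and its method names are made up
(it is not the mxmRelations API), and plain dicts stand in for the
BTrees one would use in the ZODB.

```python
class Relation:
    """Hypothetical two-way relation; plain dicts stand in for BTrees."""

    def __init__(self):
        self._forward = {}   # left id  -> set of right ids
        self._backward = {}  # right id -> set of left ids

    def relate(self, left, right):
        """Record that `left` is related to `right`."""
        self._forward.setdefault(left, set()).add(right)
        self._backward.setdefault(right, set()).add(left)

    def unrelate(self, left, right):
        """Forget the relationship, if it was recorded."""
        self._forward.get(left, set()).discard(right)
        self._backward.get(right, set()).discard(left)

    def rights(self, left):
        """Everything `left` is related to (a copy, safe to modify)."""
        return set(self._forward.get(left, ()))

    def lefts(self, right):
        """Everything related to `right` (a copy, safe to modify)."""
        return set(self._backward.get(right, ()))


# Illustrative ids only; real code would use unique ids or paths.
student_courses = Relation()
student_courses.relate("student-1", "course-a")
student_courses.relate("student-1", "course-b")
student_courses.relate("student-2", "course-a")
```

Because both directions are kept up to date on every relate() call, the
lookup "which students take course-a" is as direct as "which courses
does student-1 take".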

> The advantage of using relations is that it gives us the benefits of 
> relational theory.  Relational theory provides a clear way to store and 
> query relationships efficiently.  It lets you infer relationships based 
> on explicit relationships.

I *think* we are already applying some relational theory by:

    - Creating a relation object and storing relationships in it (vs a
      normalised table representing the relation that stores
      relationships as records)

    - We use unique ids/paths to relate objects and do not duplicate
      information about the objects in the relationship.

    - We normalise classes just like we normalise tables.

    - We can still infer relationships (just like sql views)

So we don't have to give up the benefits ;-) To illustrate I'll compare
a sql join on the db above with the python way to infer the relation of
registrations for the current term.

SQL:
    select
        Student.Name, Course.Name
    from
        Student, Course, Term, 
        StudentCourses, TermCourses
    where
        Student.ID = StudentCourses.StudentID and
        Course.ID  = StudentCourses.CourseID and
        Course.ID  = TermCourses.CourseID and
        Term.ID    = TermCourses.TermID and
        Term.ID    = CurrentTermID;
        
Python:
    courses  = term_courses.get(current_term)
    students = student_courses.get(courses)
    # by now we already have the objects we want
    # but let's display the result too.
    for student in students:
        print(student.Name)
        for course in student_courses.get(student):
            print(course.Name)
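
For anyone who wants to run the comparison, below is a self-contained
sketch of the same inference. It assumes each relation is simply a set
of (left, right) id pairs; the sample data and the registrations()
helper are invented for illustration and are not part of any existing
product.

```python
# Hypothetical sample data: each relation is a set of (left, right) pairs.
student_courses = {
    ("pete", "maths"),
    ("pete", "physics"),
    ("mary", "maths"),
    ("john", "history"),
}
term_courses = {
    ("2003-1", "maths"),
    ("2003-1", "physics"),
    ("2003-2", "history"),
}
current_term = "2003-1"


def registrations(student_courses, term_courses, term):
    """Infer (student, course) pairs for one term, like the SQL join."""
    # Courses offered in the given term.
    courses = {course for (t, course) in term_courses if t == term}
    # Keep only the registrations for those courses.
    return {(s, c) for (s, c) in student_courses if c in courses}


result = registrations(student_courses, term_courses, current_term)
for student, course in sorted(result):
    print(student, course)
```

The inferred relation is never stored; it is computed from the two
explicit relations on demand, which is roughly what a SQL view does.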

> The disadvantage of using relations is that relationships have to be 
> decomposed for storage.  You can't just add an attribute to a 
> Relationship object and expect it to be stored; you also have to arrange 
> for the corresponding relation to store another column.  (Although you 
> might automate the process by telling a relation to keep its schema in 
> sync with a particular class.)
> 
> Either solution provides a way to store and retrieve simple 
> relationships.  The difference is in the way they can expand.  I like to 
> imagine that the relational model was developed for the purpose of 
> scaling the entity relationship model, but that's a wild guess.
> 
> I suppose mxmRelations stores relationship objects directly.  A ZCatalog 
> instance, on the other hand, is much like a relation, although it 
> doesn't implement all the same operations and provides some extra 
> operations.

To me a ZCatalog is much more like an index on a database table than a
relation in that it does not know both ends of a relationship.

I am not too worried about how we will store relationships. I will go
for the most obvious way: store relationship objects. Here the
mxmRelations API is a good starting point and one can expand the API to
store metadata as well. Am I correct in saying that Ape wouldn't need
modification either since a relationship object will be handled just
like any other object?

Ok, so far we can "list all grandmothers" ;-)

The difficult part in my mind is to give objects insight into their
relationships. I think this issue is separate from *storing*
relationships. The implementation might be tricky, but the requirements
are simple: 
    
    - One should be able to ask an object what relationships it has. A
      Person object might answer: "With my employer, my wife and
      friends".

        class Person:
            Employer = VoodooRelationshipThingy()
            MyWife   = VoodooRelationshipThingy()
            Friends  = VoodooRelationshipThingy()

        or in the case where Person is an object in another Product that
        I don't want to mess with (hopefully subclassing is not the only
        way to do this):

        class MyPerson(Person):
            Employer = VoodooRelationshipThingy()
            MyWife   = VoodooRelationshipThingy()
            Friends  = VoodooRelationshipThingy()

    - Relationships should be accessible as attributes. This makes it
      possible to ask the Person object what the name of its employer
      is, tell it that it has a new employer, and a new friend.

        pete = Person()
        assert pete.Employer.Name == "SomeCompany"

        # Pete has a new employer
        pete.Employer = TheOtherCompany
        assert pete.Employer.Name == "TheOtherCompany"

        assert pete.Friends.John.Surname == "Smith"
        # Pete has a new friend
        pete.Friends.add(mary)

Something like ComputedAttribute or descriptors should make it possible.
Hmm, I might just have thought of a way to do this with
ComputedAttribute which I'll try tomorrow. But ComputedAttribute is
Zope 2 specific, isn't it? Darn ...
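
Here is a rough sketch of the descriptor route, using only plain Python
and no ComputedAttribute. Everything in it is an assumption made for
illustration: the Relationship descriptor, the module-level STORE, the
relationships() helper and the uid attribute are all hypothetical names,
and an in-memory dict stands in for relationship objects stored in the
ZODB. It also uses a plain list for Friends, so a friend is added with
append() rather than the add() shown above.

```python
class RelationshipStore:
    """Hypothetical store kept apart from the related objects."""

    def __init__(self):
        self._links = {}  # (source uid, relationship name) -> list of targets

    def targets(self, uid, name):
        return self._links.setdefault((uid, name), [])

    def replace(self, uid, name, targets):
        self._links[(uid, name)] = list(targets)


STORE = RelationshipStore()


class Relationship:
    """Descriptor that exposes stored relationships as an attribute."""

    def __init__(self, many=False):
        self.many = many

    def __set_name__(self, owner, name):
        self.name = name  # the attribute name doubles as the relationship name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor
        targets = STORE.targets(obj.uid, self.name)
        if self.many:
            return targets  # live list, so callers can append to it
        return targets[0] if targets else None

    def __set__(self, obj, value):
        STORE.replace(obj.uid, self.name, value if self.many else [value])


def relationships(cls):
    """Answer 'what relationships do you have?' for a class."""
    names = set()
    for klass in cls.__mro__:  # include relationships from base classes
        names.update(n for n, v in vars(klass).items()
                     if isinstance(v, Relationship))
    return sorted(names)


class Company:
    def __init__(self, name):
        self.Name = name


class Person:
    Employer = Relationship()
    Friends = Relationship(many=True)

    def __init__(self, uid, name):
        self.uid = uid  # assumed unique id or path
        self.Name = name


pete = Person("people/pete", "Pete")
mary = Person("people/mary", "Mary")
pete.Employer = Company("SomeCompany")
pete.Friends.append(mary)
```

The point of the sketch is that the Person instance holds nothing about
its relationships; the descriptor looks them up in the store each time,
so storage and attribute access stay separate concerns.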

It's quite late now so I hope I made sense.

-- 
Roché Compaan
Upfront Systems                 http://www.upfrontsystems.co.za