After studying Jeffrey Shell's ZLDAP package, and the current ZODB system, in the light of recent conversations with Jim Fulton, a few lightbulbs went on with respect to the usefulness of multi-database Zopelications. For example, wouldn't it be keen if regular Zope objects could 'store' object attributes that were actually LDAP entries? Or SQL database records? That would be pretty awesome.

The cool thing about the ZLDAP stuff is that the LDAP Connection object, itself a database, is actually a persistent object stored in the regular ZODB. That suggests a clean and sensible way to integrate multi-database Zopes: any given Zope installation must store connections to other databases as persistent objects within its "root" database. That is, any Persistent object in a particular Zopelication should have a _p_jar attribute which either points to the REQUEST-owned Connection, or to a jar which meets this criterion (recursively). This means that one could, at least in theory, reach any database in a multi-database system by following an ever-expanding tree of database references.

Now, if this property holds, then it is possible for an object in any database to refer to any object which is located "downstream" in the tree. That is, an object O1 in database DB1 can reference object O2 in database DB2 so long as DB2 is reached by way of a persistent object stored in DB1, or in a database thus referenced by DB1, recursively. (Upstream references are not possible without a global database naming system; however, there is nothing about my suggested implementation that prevents a global naming scheme from later being used either together with, or in place of, this model.)

Due to this tree-oriented nature, this multi-database model is most appropriate to Zopelications which provide for local needs in a local database, but need to reference other, shared databases or legacy systems.
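To make the "downstream only" rule concrete, here is a minimal, runnable sketch (not Zope code — the Jar and Obj classes are invented stand-ins) that models databases whose Connection objects may themselves be persistent objects stored in another database, and checks reachability by climbing the chain of _p_jar attributes:

```python
class Jar:
    """Stand-in for a database Connection.  If this Connection is itself a
    persistent object stored in another database, _p_jar points there."""
    def __init__(self, name, stored_in=None):
        self.name = name
        self._p_jar = stored_in  # the jar this Connection is stored in, if any

class Obj:
    """Stand-in for a persistent object living in some jar."""
    def __init__(self, jar):
        self._p_jar = jar

def can_reference(source, target):
    """True if target's database is source's own database, or is reachable
    "downstream" from it via persistent Connection objects (recursively)."""
    jar = target._p_jar
    while jar is not None:
        if jar is source._p_jar:
            return True
        jar = jar._p_jar  # climb toward the root of the database tree
    return False

root = Jar('root')                  # the REQUEST-owned root database
ldap = Jar('ldap', stored_in=root)  # e.g. an LDAP Connection stored in root
o1 = Obj(root)
o2 = Obj(ldap)
print(can_reference(o1, o2))  # True: o2's database is downstream of o1's
print(can_reference(o2, o1))  # False: upstream references are not possible
```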
(Note: this model does not support multi-database undo, which requires a global naming mechanism for databases and transactions, so that data integrity can be maintained by refusing to undo a transaction unless all databases which were involved can undo it.)

Anyway... making references work. Initially, to test this concept, a simple way to implement it at the application level is to provide a getReference function. If I want to store an object in one of my attributes, I would say:

    self.attribute = getReference(self, object)

The function would look something like:

    def getReference(source, target):
        tgt_jar = target._p_jar  # note: fails if we can't make reference
        if tgt_jar is source._p_jar or tgt_jar is None:
            return target
        return RemoteReference(getReference(source, tgt_jar), target._p_oid)

This recursively builds a dereferencing object which, when retrieved from my self.attribute later, will return the object from the correct database. The RemoteReference class is as follows (or equivalent in C):

    class RemoteReference(ExtensionClass.Base):

        def __init__(self, jar, oid):
            self.jar, self.oid = jar, oid

        def __of__(self, parent):
            object = self.jar[self.oid]
            if hasattr(object, '__of__'):
                return object.__of__(parent)
            return object

The RemoteReference class simply refers to a jar (which must be a persistent object) and an oid to be retrieved from the jar. When a RemoteReference is retrieved from an object, it replaces itself with the result of retrieving that oid from that jar, calling __of__ on the result. (Note that the jar itself can be referenced by a RemoteReference, and it will be unpacked when we access self.jar to use it. Thus, a reference "two databases deep" (or more) will be properly unpacked.) Notice that this works even if a portion of a database tree is isolated and used as a root unto itself, since anything stored in a given database can only reference objects in itself, or in databases referenced from it.
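For the curious, here is a self-contained demonstration of the dereferencing recursion, using plain dict-based jars as stand-ins; ExtensionClass and the acquisition machinery are omitted (MockJar and Thing are invented for illustration), so this shows only the one-level lookup path:

```python
class MockJar:
    """Stand-in for a Connection: a persistent mapping of oid -> object.
    _p_jar/_p_oid record where this jar itself is stored, if anywhere."""
    def __init__(self):
        self._p_oid = self._p_jar = None
        self._data = {}
    def add(self, oid, obj):
        obj._p_oid, obj._p_jar = oid, self
        self._data[oid] = obj
    def __getitem__(self, oid):
        return self._data[oid]

class Thing:
    _p_jar = _p_oid = None  # a trivial "persistent" object

class RemoteReference:
    def __init__(self, jar, oid):
        self.jar, self.oid = jar, oid
    def __of__(self, parent):
        obj = self.jar[self.oid]
        if hasattr(obj, '__of__'):
            return obj.__of__(parent)
        return obj

def getReference(source, target):
    tgt_jar = target._p_jar
    if tgt_jar is source._p_jar or tgt_jar is None:
        return target
    return RemoteReference(getReference(source, tgt_jar), target._p_oid)

root = MockJar()
sub = MockJar()
root.add('sub', sub)      # the sub-database's Connection lives in root
local = Thing()
root.add('local', local)
remote = Thing()
sub.add('remote', remote)

ref = getReference(local, remote)       # crosses a database boundary
print(type(ref).__name__)               # RemoteReference
print(ref.__of__(local) is remote)      # True: dereferenced via sub's jar
```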
In order for this protocol to work, one need only do two things:

* Any database which wishes to be referenceable must be able to have Connection-like objects stored as Persistent objects.

* When storing a reference to another object, one must call getReference(self, object) and store the result, AND self must already have been assigned a _p_jar.

The first requirement burdens database implementors, but it is not that far out of the question. It merely requires a Persistent object which can delegate its behavior to a "real" Connection object of some kind.

The second requirement burdens those who would store foreign references, and seems a bit more severe, although often one will know when one is trying to do this. This application-level restriction could be eased by extending databases' persistent_id mechanism (used w/cPickle) to return a RemoteReference as the oid of an object stored in a foreign jar. When asked for an object whose oid is a RemoteReference, a database can simply return the RemoteReference itself, or automatically dereference it. The latter has the potential problem of unnecessarily waking up currently dormant databases, but I suspect this is unlikely to be a real problem in practice. (Note that any such waking-up will be bounded by the depth of the database tree currently in use, and also that this mechanism does not preclude the future use of RemoteReferences based on a global naming scheme, or of cyclical references under such a scheme.)

To sum up, this seems like a reasonably workable approach to cross-database references in Zope where such references proceed from private "roots" to shared "leaves" of a database tree. It is incrementally implementable, and does not initially require changing any part of the existing Zope framework. But, with additional effort, it can be scaled up to provide better ease of use and generality.
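The persistent_id easing can be sketched with the standard pickle module's persistent_id/persistent_load hooks (the same mechanism cPickle exposes); everything here — the Foreign class, the HOME_JAR token, the reference tuple — is an illustrative stand-in, not ZODB code:

```python
import io
import pickle

class Foreign:
    """An object that 'lives' in some other database."""
    def __init__(self, jar, oid):
        self._p_jar, self._p_oid = jar, oid

HOME_JAR = 'home'  # pretend this pickler belongs to the 'home' database

class CrossDBPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Objects from a foreign jar are stored as a (jar, oid) token --
        # morally a RemoteReference -- instead of being pickled inline.
        if getattr(obj, '_p_jar', None) not in (None, HOME_JAR):
            return ('remote', obj._p_jar, obj._p_oid)
        return None  # pickle everything else normally

class CrossDBUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Hand back the reference token itself; a real system could instead
        # dereference it automatically, at the cost of possibly waking
        # currently dormant databases.
        return pid

buf = io.BytesIO()
CrossDBPickler(buf).dump({'x': Foreign('ldap', 42)})
buf.seek(0)
loaded = CrossDBUnpickler(buf).load()
print(loaded['x'])  # ('remote', 'ldap', 42)
```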
Creating a "database" that can take advantage of the protocol could be almost as simple as making a Persistent object whose __getitem__ method calls an SQL method to retrieve something from a database, then sets object._p_jar = self and object._p_oid = retrieval_key. Presto: you now have an SQL record which can be "pointed to" by ZODB objects, which need not concern themselves with the SQL details involved.

At this point, all sorts of application ideas begin bubbling through my head, ranging from having counter-type objects stored in suitable storages, to having "storage-managed object pools", a concept Ty and I have been batting around for some time as a means of reducing certain types of write-contention in large applications, and for taking advantage of BerkeleyDB and other databases' native indexing facilities. Anyway, further applications are left as an exercise for the reader. :) Comments?
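As a rough sketch of that idea, here is a jar-like object whose __getitem__ fetches a row and brands it with _p_jar/_p_oid; sqlite3 stands in for the SQL method, and the SQLJar and Record classes and the schema are invented for illustration:

```python
import sqlite3

class Record:
    """A plain object reconstituted from one SQL row."""
    def __init__(self, name, value):
        self.name, self.value = name, value

class SQLJar:
    """Jar-like wrapper: __getitem__ maps a retrieval key to a Record."""
    def __init__(self, conn):
        self.conn = conn
    def __getitem__(self, key):
        row = self.conn.execute(
            'SELECT name, value FROM records WHERE id = ?', (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        obj = Record(*row)
        obj._p_jar, obj._p_oid = self, key  # brand the record as "ours"
        return obj

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (id INTEGER PRIMARY KEY,'
             ' name TEXT, value INTEGER)')
conn.execute("INSERT INTO records VALUES (1, 'counter', 7)")

jar = SQLJar(conn)
rec = jar[1]
print(rec.name, rec.value)  # counter 7
print(rec._p_jar is jar)    # True: referenceable like any other jar's object
```

A RemoteReference holding (jar, 1) would now dereference to this record like any ZODB object, with the SQL hidden behind __getitem__.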