[Phillip] http://dev.zope.org/Wikis/DevSite/Proposals/ORMappingDB Comments encouraged!

[Albert] I've added some there. Jim highlighted a project risk there: "Updates to RDBMS data outside of the OR mapping could cause cached data to be inconsistent." This strikes me as rather fundamental. Unless the goals include actual *sharing* of RDBMS data with other applications completely independent of Zope, I doubt that the most important benefits of an OR mapping could be achieved. Essentially, SQL RDBMSs are *about* sharing data among applications. When "customers want SQL", that is often what they actually want. An SQL RDBMS can be overkill for other purposes, which may be just as well served by an embedded ODBMS like ZODB, an SQL file system like MySQL, or an LDAP directory.

Alternative goals of *exporting* ZODB data to an RDBMS continuously, *importing* data from an RDBMS at regular intervals, and *embedding* an RDBMS database for exclusive use by Zope (with no write access for other applications) could all be met more easily. There is certainly no major difficulty on the RDBMS side in giving a Zope instance control over a set of tables for its own use, and in providing append-only and read-only access to export and import tables or views for regular or continuous replication. But the combination of all three (which could be delivered incrementally in any order) is *not* the same as *sharing*.

As I understand it, Zope's approach to caching inherently prevents support for the Isolation part of ACID. Conflicting writes to the same object are detected by version stamps, but the objects used by a transaction in one thread may have been inconsistently changed by transactions in other threads. This will not be detected unless the transaction also changes those objects. Similar problems are inherent in LDAP directories, which are also designed for relatively static data with a low rate of updates. This is acceptable for many applications.
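The isolation gap can be made concrete with a toy version-stamp store. This is purely illustrative (the class, names, and error are made up, not ZODB's actual API): conflicts are checked only on objects a transaction *writes*, so a commit based on a stale *read* slips through.

```python
class Store:
    """Toy object store with version stamps checked only on write."""

    def __init__(self):
        self.data = {}  # name -> (value, stamp)

    def read(self, name):
        return self.data[name]

    def write(self, name, value, stamp_seen):
        value_now, stamp_now = self.data[name]
        if stamp_now != stamp_seen:
            # Only writes are stamp-checked, like a ConflictError.
            raise RuntimeError("conflict on " + name)
        self.data[name] = (value, stamp_now + 1)

store = Store()
store.data = {"A": (100, 1), "B": (0, 1)}

# Transaction T1 reads A, intending to derive B from it.
a_val, a_stamp = store.read("A")

# Meanwhile a transaction in another thread changes A.
store.write("A", 50, 1)

# T1 now commits B using the stale value of A. Only B's stamp is
# checked, so no conflict is raised and B is based on stale data.
b_val, b_stamp = store.read("B")
store.write("B", a_val, b_stamp)
print(store.read("B")[0])  # 100, even though A is now 50
```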
Scope can and should be limited to sharing that works with optimistic checkout and does not require pessimistic locking. It is common for an "Enterprise Object" to be read from an RDBMS with its stamp noted, modified independently by an application, and then updated iff the stamp has not changed. Only the simultaneous checking of the stamp and update of the object needs to be wrapped in a short ACID RDBMS transaction. For example, ACS 4 maintains a timestamp on every object which can be used for this purpose. This is similar to the ZODB approach. Note however that:

1) The application must be prepared to deal with an exception that cannot just be handled as a lower-layer "ConflictError" by retrying.

2) The object will often be a composite - e.g. an order header *and* all its line items and fulfilments. Entanglement with other objects such as products (for pricing) is avoided by specific application programming (which may also be done in stored procedures within the DBMS).

3) This does not support *any* caching of objects outside of a transaction. The RDBMS itself provides internal caching (often of the entire database, for efficient queries with web applications). This leads to the ACS paradigm of "the web server is the database client", which is actually rather similar to Zope's "Zserver is the ZODB client". Both ACS and Zope involve complex issues for database client-side caching.

Items 1 and 2 completely preclude any possibility of the same level of "transparency" as for ZODB, while in no way hindering use of "pythonic" syntax. For most Zope web object publishing purposes, cached objects just need to be kept reasonably up to date rather than synchronized with RDBMS transactions. The only viable mechanism I can think of for dealing with item 3 in a Zope context would involve the RDBMS maintaining a "Changes" table, which it appends to whenever any object that has a special "ZeoItem" column is changed without the value of "ZeoItem" also being changed.
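The stamp-checked update can be sketched in Python, with SQLite standing in for the RDBMS; the table, column, and exception names are invented for illustration. The point is that the stamp check and the update are a single statement, so they sit inside one short ACID transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
             " total REAL, stamp INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 10.0, 1)")
conn.commit()

class StaleStampError(Exception):
    """The stamp changed since checkout. Per point 1 above, the
    application itself must decide how to recover; this is not a
    lower-layer ConflictError that can simply be retried."""

def update_order(conn, order_id, new_total, stamp_at_checkout):
    # Check-and-update in one statement: the UPDATE only takes
    # effect if nobody else bumped the stamp in the meantime.
    cur = conn.execute(
        "UPDATE orders SET total = ?, stamp = stamp + 1 "
        "WHERE id = ? AND stamp = ?",
        (new_total, order_id, stamp_at_checkout))
    conn.commit()
    if cur.rowcount == 0:
        raise StaleStampError(order_id)

# Checkout: read the object and note its stamp.
total, stamp = conn.execute(
    "SELECT total, stamp FROM orders WHERE id = 1").fetchone()

update_order(conn, 1, total + 5.0, stamp)   # succeeds, stamp becomes 2
try:
    update_order(conn, 1, 99.0, stamp)      # stale stamp: rejected
except StaleStampError:
    print("conflict detected")
```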
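The "Changes" table idea can also be sketched with a trigger. Here SQLite stands in for the RDBMS (a PostgreSQL version would additionally issue NOTIFY from the trigger); the schema is made up for illustration. The WHEN clause captures the rule above: log a change only when "ZeoItem" was *not* also changed, i.e. the write came from an application other than Zope:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (id INTEGER PRIMARY KEY, data TEXT, ZeoItem INTEGER);
CREATE TABLE changes (obj_id INTEGER);

-- Log only "foreign" writes: ones that leave ZeoItem untouched.
CREATE TRIGGER log_foreign_change AFTER UPDATE ON objects
WHEN NEW.ZeoItem = OLD.ZeoItem
BEGIN
    INSERT INTO changes(obj_id) VALUES (NEW.id);
END;
""")

conn.execute("INSERT INTO objects VALUES (1, 'spam', 0)")

# A foreign application updates the row without touching ZeoItem:
conn.execute("UPDATE objects SET data = 'eggs' WHERE id = 1")

# Zope itself updates the row and bumps ZeoItem: nothing is logged.
conn.execute(
    "UPDATE objects SET data = 'ham', ZeoItem = ZeoItem + 1 WHERE id = 1")

# ZEO polls the changes table and tells its Zope instances which
# cached items to invalidate:
stale = [row[0] for row in conn.execute("SELECT obj_id FROM changes")]
print(stale)  # [1] - only the foreign change was recorded
```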
(ACS does not do this, and I'm not sure what it does do.) ZEO would monitor that table, either by regular polling or continuously (e.g. with PostgreSQL, as a LISTENer responding to NOTIFY commands issued automatically whenever the triggers append to the Changes table). For each change, ZEO would notify its Zope instances to invalidate their caches for that item. I'm not familiar enough with Zope caching internals to know whether some other approach is feasible. Requiring such changes in a shared database is certainly undesirable.

Q1. Could somebody please post specific URLs for relevant documentation of Zope caching?

Q2. I have a related question about the Zope design overall. As far as I can make out, Zope actually keeps separate copies of persistent objects in RAM for each thread, and relies on the fact that there is a tree structure corresponding to the URL paths which ensures that objects from which attributes will be acquired tend to already be in RAM when the acquisition occurs. I assume this is trading off the horrendous inefficiency of multiple (inconsistent) copies of the same persistent object in valuable RAM against the more horrendous alternative of having to do Python thread switches on attribute lookups. I'd like to understand the reasoning behind this design, given that Medusa, from which Zserver is derived, strongly recommends a single thread as giving higher performance as well as being simpler with the "Reactor" pattern it uses. My guess is that the reason has something to do with:

1) The original file storage blocks in the kernel, so multiple threads are needed to avoid this blocking.

2) Some external adaptors are synchronous and would likewise block the thread for a long external request.

3) Some web hits do long, complex work that should not be allowed to delay other hits unfairly.

If so, I'm wondering whether it might be time to review this design and consider whether it is feasible to use a single thread instead.
Both Windows and Unix kernels now have async interfaces, with facilities like FreeBSD's kqueue for non-blocking Medusa-style selects on file descriptors as well as sockets. That should take care of 1). I don't know whether BerkeleyDB has an async interface, but async adaptors are available for databases like Oracle and PostgreSQL, and LDAP is inherently async, so it should be possible to take care of 2). Item 3) is more complex, but I would have thought that anything that runs *too* long needs to be aborted anyway, which might be achieved by monitoring and signals from a separate process whenever it fails to receive some sort of "heartbeat" sent from the Reactor main loop. If there is no such mechanism at present, it might be needed anyway to deal with other problems that result from not having it, which may currently be perceived as just mysterious flakiness. Jerkiness from long hits that stay within such an overall limit should be tolerable if the long hits themselves are tolerable, since an awful lot of buffering and caching generally occurs between the server and the end user anyway.

Can anyone either explain what I have misunderstood, or point me to relevant docs or threads about this?
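A minimal sketch of the heartbeat idea, under stated assumptions: the class and its names are invented, and an in-process watchdog thread stands in for the separate monitoring process suggested above (a real deployment would signal the stalled process from outside rather than just set a flag):

```python
import threading
import time

class Watchdog:
    """Detect a stalled reactor loop via missed heartbeats.

    Illustrative only: a production version would run in a separate
    process and send a signal (e.g. os.kill) instead of setting a flag.
    """

    def __init__(self, limit):
        self.limit = limit                      # max seconds between beats
        self.last_beat = time.monotonic()
        self.stalled = False
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def beat(self):
        # Called once per pass through the reactor main loop.
        self.last_beat = time.monotonic()

    def _watch(self):
        # Poll well below the limit so a stall is caught promptly.
        while not self._stop.wait(self.limit / 10):
            if time.monotonic() - self.last_beat > self.limit:
                self.stalled = True             # real version: signal/abort

    def stop(self):
        self._stop.set()
        self._thread.join()

dog = Watchdog(limit=0.2)
for _ in range(3):              # healthy reactor iterations keep beating
    dog.beat()
    time.sleep(0.05)
assert not dog.stalled

time.sleep(0.5)                 # a hit that runs *too* long stalls the loop
dog.stop()
print(dog.stalled)              # True
```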