http://dev.zope.org/Wikis/DevSite/Proposals/ORMappingDB Comments encouraged! Shane
At 09:37 AM 5/15/01 -0400, Shane Hathaway wrote:
http://dev.zope.org/Wikis/DevSite/Proposals/ORMappingDB
Comments encouraged!
I see two themes here: an implementation-independent data management API, and metadata. May I suggest that perhaps a design effort focused specifically on these two things, rather than on an object-relational effort per se, might be more valuable to the community as a whole? SmartObjects, ZPatterns, and TransWarp could *all* benefit from community-wide standards for an implementation-neutral data management API and a metadata system. Right now, these efforts are fragmented and focused on different parts of the puzzle: SmartObjects is focused on permissions and UI aspects, TransWarp is aimed more at structural metadata, and ZPatterns specializes in implementation glue.

If we had a standardized manipulation API or set of idioms (like JavaBeans) for application objects, then having lots of ways to *implement* storage would be a good thing. Different products and offerings could co-exist and compete in the storage implementation space, and users would have the benefit of being able to choose their back-ends and app frameworks without being locked into one framework's API.

Perhaps I should offer up a counter-proposal, focused on establishing a common API and proposing some of the requirements for same? Presumably we are all agreed that it should be as Pythonic as possible, but no more so. :) Also, "API" is perhaps not the right word; it is more about access and manipulation idioms. It needs to deal explicitly with the notion of relationships as well as "attributes" in the sense of data fields. And it needs to deal with how you determine what classes should be used for what things, and how to get at those classes (since they may need to be backend-specific).

These are issues, by the way, which the current ZODB API dodges, and that is why I've been saying that doing O-R mapping in ZODB doesn't help the key issue of database independence. You *still* have to code to a style that is compatible with changing back-ends.
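As one illustration of the kind of backend-neutral idiom being discussed, here is a minimal sketch. None of these names come from SmartObjects, ZPatterns, or TransWarp; they are invented to show the two points above: declaring relationships as well as plain attributes as metadata, and looking up possibly backend-specific classes.

```python
# Hypothetical sketch of a backend-neutral access/manipulation idiom.
# All names here are made up for illustration; the idea is that the
# schema is *data*, so a relational, ZODB, or LDAP back-end could each
# map it without the application code changing.

class Field:
    """A plain data attribute, declared as metadata."""
    def __init__(self, name, type_):
        self.name, self.type = name, type_

class Relationship:
    """A link to other application objects, declared as metadata."""
    def __init__(self, name, target, many=False):
        self.name, self.target, self.many = name, target, many

class Invoice:
    __schema__ = (
        Field('number', int),
        Field('customer_name', str),
        Relationship('line_items', 'LineItem', many=True),
    )

def implementation_for(cls, backend, registry):
    """Return the (possibly backend-specific) class to instantiate,
    falling back to the generic one if no override is registered."""
    return registry.get((cls.__name__, backend), cls)
```

The point is not this particular spelling, but that application code written against such declarations would survive a change of storage back-end.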
I think it might be helpful if we all got on the same page about what that style should be, and then all these efforts could go forward knowing that in the Zope application space, users will only need to learn one such style at the Python level, and any education efforts about that style can be leveraged across many possible implementation approaches. [sent to list and Wiki]
[Phillip] http://dev.zope.org/Wikis/DevSite/Proposals/ORMappingDB
Comments encouraged!

[Albert] I've added some there.

Jim highlighted a project Risk there: "Updates to RDBMS data outside of the OR mapping could cause cached data to be inconsistent."

This strikes me as rather fundamental. Unless the goals include actual *sharing* of RDBMS data with other applications completely independent of Zope, I doubt that the most important benefits of an OR mapping could be achieved. Essentially, SQL RDBMSs are *about* sharing data among applications. When "customers want SQL", that is often what they actually want. An SQL RDBMS can be overkill for other purposes, which may be just as well achieved by an embedded ODBMS like ZODB, an SQL file system like MySQL, or an LDAP directory.

Alternative goals of *exporting* ZODB data to an RDBMS continuously, *importing* data from an RDBMS at regular intervals, and *embedding* an RDBMS database for exclusive use by Zope with no write access for other applications could all be met more easily. There is certainly no major difficulty on the RDBMS side in giving a Zope instance control over a set of tables for its own use, and providing append-only and read-only access to export and import tables or views for regular or continuous replication. But the combination of all three (which could be delivered incrementally in any order) is *not* the same as *sharing*.

As I understand it, Zope's approach to caching inherently prevents support for the Isolation part of ACID. Conflicting writes to the same object are detected by version stamps, but the objects used by a transaction in one thread may have been inconsistently changed by transactions in other threads. This will not be detected unless those objects are also changed. Similar problems are inherent in LDAP directories, which are also designed for relatively static data with a low rate of updates. This is acceptable for many applications.
Scope can and should be limited to sharing that works with optimistic checkout and does not require pessimistic locking. It is common for an "Enterprise Object" to be read from an RDBMS with its stamp noted, modified independently by an application, and then updated iff the stamp has not changed. Only the simultaneous check of the stamp and update of the object needs to be wrapped within a short ACID RDBMS transaction. For example, ACS 4 maintains a timestamp on every object which can be used for this purpose. This is similar to the ZODB approach. Note however that:

1) The application must be prepared to deal with an exception that cannot just be handled as a lower-layer "ConflictError" by retrying.

2) The object will often be a composite, e.g. an order header *and* all its line items and fulfilments. Entanglement with other objects such as products (for pricing) is avoided by specific application programming (which may also be done in stored procedures within the DBMS).

3) This does not support *any* caching of objects outside of a transaction. The RDBMS itself provides internal caching (often of the entire database, for efficient queries with web applications). This leads to the ACS paradigm of "the web server is the database client", which is actually rather similar to Zope's "ZServer is the ZODB client". Both ACS and Zope involve complex issues for database client-side caching.

Points 1 and 2 completely preclude any possibility of the same level of "transparency" as for ZODB, while in no way hindering use of "pythonic" syntax. For most Zope web object publishing purposes, cached objects just need to be kept reasonably up to date rather than synchronized with RDBMS transactions.

The only viable mechanism I can think of for dealing with item 3 in a Zope context would involve the RDBMS maintaining a "Changes" table to which it appends whenever any object that has a special column for "ZeoItem" is changed without also changing the value of "ZeoItem".
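The optimistic-checkout pattern described above can be sketched in Python against the DB-API, using sqlite-style "?" placeholders. The table and column names (orders, stamp, status) are hypothetical; ACS 4 uses a timestamp column where this sketch uses an integer stamp, but the shape of the transaction is the same.

```python
# Sketch of optimistic checkout: read the object with its stamp noted,
# modify it outside any database transaction, then update iff the
# stamp is unchanged. Only the combined stamp-check-and-update needs
# to run inside a short ACID transaction.

class StaleObjectError(Exception):
    """The stamp changed under us. As noted in point 1 above, the
    application must decide how to recover; a blind lower-layer
    retry is usually not enough."""

def update_order(conn, order_id, old_stamp, new_status):
    cur = conn.cursor()
    # Stamp check and update in one statement: either we were the
    # only writer since our read, or zero rows match and we fail.
    cur.execute(
        "UPDATE orders SET status = ?, stamp = stamp + 1 "
        "WHERE id = ? AND stamp = ?",
        (new_status, order_id, old_stamp))
    if cur.rowcount != 1:
        conn.rollback()
        raise StaleObjectError(order_id)
    conn.commit()
```

A second writer holding the old stamp gets StaleObjectError rather than silently overwriting the first writer's change.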
(ACS does not do this, and I'm not sure what it does do.) ZEO would monitor that table, either by regular polling or continuously (e.g. with PostgreSQL as a LISTENer responding to NOTIFY commands issued automatically whenever the triggers append to the Changes table). For each change, ZEO would notify its Zope instances to invalidate their caches for that item. I'm not familiar enough with Zope caching internals to know whether some other approach is feasible. Requiring such changes in a shared database is certainly undesirable.

Q1. Could somebody please post specific URLs for relevant documentation of Zope caching?

Q2. I have a related question about the Zope design overall. As far as I can make out, Zope actually keeps separate copies of persistent objects in RAM for each thread, and relies on the fact that there is a tree structure corresponding to the URL paths, which ensures that objects from which attributes will be acquired tend already to be in RAM when the acquisition occurs. I assume this is trading off the horrendous inefficiency of multiple (inconsistent) copies of the same persistent object in valuable RAM against the more horrendous alternative of having to do Python thread switches on attribute lookups.

I'd like to understand the reasoning behind this design, given that Medusa, from which ZServer is derived, strongly recommends a single thread as giving higher performance as well as being simpler, given the "Reactor" pattern it is using. My guess is that the reason has something to do with:

1) The original file storage blocks in the kernel, so multiple threads are needed to avoid this blocking.

2) Some external adaptors are synchronous and would likewise block the thread for a long external request.

3) Some web hits do long, complex stuff that should not be allowed to delay other hits unfairly.

If so, I'm wondering whether it might be time to review this design and consider whether it is feasible to use a single thread instead.
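The polling side of the Changes-table idea can be sketched as follows. This assumes a trigger in the RDBMS (not shown) appends a (serial, item_id) row whenever an object is modified without its "ZeoItem" column changing; the monitor reads past its last-seen serial and invalidates each changed item. All table, column, and function names are invented for illustration; with PostgreSQL the same loop could instead LISTEN for a NOTIFY fired by the trigger rather than poll.

```python
# Sketch of ZEO-side monitoring of a hypothetical "changes" table.
# The RDBMS appends (serial, item_id) on each relevant write; we poll
# forward from the last serial we processed and invalidate caches.

def poll_changes(conn, last_serial, invalidate):
    """Invalidate every item recorded since last_serial.
    Returns the new high-water mark to pass to the next poll."""
    cur = conn.cursor()
    cur.execute(
        "SELECT serial, item_id FROM changes WHERE serial > ? "
        "ORDER BY serial",
        (last_serial,))
    for serial, item_id in cur.fetchall():
        invalidate(item_id)   # e.g. tell Zope instances to drop the item
        last_serial = serial
    return last_serial
```

Because invalidation is driven by a monotonic serial, a missed poll is harmless: the next poll picks up from the same high-water mark.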
Both Windows and Unix kernels now have async interfaces, with facilities like FreeBSD's kqueue for non-blocking, Medusa-style selects on file descriptors as well as sockets. That should take care of 1). I don't know whether BerkeleyDB has an async interface, but async adaptors are available for databases like Oracle and PostgreSQL, and LDAP is inherently async. So it should be possible to take care of 2).

Item 3) is more complex, but I would have thought that anything that runs *too* long needs to be aborted anyway, which might be achieved by monitoring and signals from a separate process whenever it fails to receive some sort of "heartbeat" done in the Reactor main loop. If there is no such mechanism at present, it might be needed anyway to deal with other problems that result from not having it, which may currently be perceived as just mysterious flakiness. Jerkiness from long hits that are within such an overall limit should be tolerable if the long hits themselves are tolerable, since an awful lot of buffering and caching generally occurs between the server and the end user anyway.

Can anyone either explain what I have misunderstood, or point me to relevant docs or threads about this?
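The heartbeat idea for item 3) can be sketched like this: the reactor main loop touches a timestamp on each iteration, and a separate monitor decides whether the single-threaded server has been stuck in one hit for too long. The class name and the 30-second limit are made up for illustration; the signalling and process-killing machinery is omitted.

```python
# Sketch of a heartbeat for a single-threaded reactor. beat() runs in
# the reactor loop between events; is_stuck() runs in a monitor (in a
# separate process, in the scheme described above) and reports whether
# the loop has failed to run for longer than the limit.

import time

class Heartbeat:
    def __init__(self, limit=30.0, clock=time.monotonic):
        self.limit = limit
        self.clock = clock       # injectable for testing
        self.last_beat = clock()

    def beat(self):
        """Record that the reactor loop is still making progress."""
        self.last_beat = self.clock()

    def is_stuck(self):
        """True once no beat has arrived within the limit; the monitor
        would then abort the offending hit, e.g. via a signal."""
        return self.clock() - self.last_beat > self.limit
```

A hit that blocks the loop past the limit would be detected even though the loop itself cannot report it, which is the point of putting the monitor in a separate process.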
Albert Langer wrote:
[Phillip] http://dev.zope.org/Wikis/DevSite/Proposals/ORMappingDB
Comments encouraged!
[Albert] I've added some there.
Jim highlighted a project Risk there:
"Updates to RDBMS data outside of the OR mapping could cause cached data to be inconsistent."
I agree! In fact, Jim and others here at DC have been suggesting this idea for a long time now, but I've always resisted because of this issue. But now I see two workable approaches to solving the problem:

1) Create an invalidation protocol where other applications are required to update a special table every time they make changes to the database. Zope checks this table at the beginning of each transaction. Databases that have strong support for triggers would be able to do this at the database level.

2) Some kinds of objects will stay in memory only for the duration of a transaction. PJE hinted at this and I like it. Some people may decide that *all* relational objects should behave this way, in which case the decreased performance would still be equal to or slightly better than that of competing projects, AFAICT (since this proposal has the advantage of some of the logic being implemented in C).

Thanks for your comments in the wiki. After talking with others here at DC, it's clear I should have provided a description of the possible solutions for some of the major issues. We're all still learning how the process is supposed to work. The next step in the process is to get the proposal reviewed, after which we can turn this into a project. Your comments will become a part of the project.

Shane
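Approach 2) above, objects that live only for the duration of a transaction, can be sketched as a cache that is emptied on commit or abort. The class and method names are invented for illustration and are not from the proposal.

```python
# Sketch of transaction-scoped caching: within one transaction,
# repeated lookups of the same oid hit the cache; finishing the
# transaction (commit or abort) clears it, so nothing stale can
# survive into the next transaction.

class TransactionCache:
    def __init__(self, load):
        self._load = load      # fetches fresh state from the RDBMS
        self._objects = {}

    def get(self, oid):
        if oid not in self._objects:
            self._objects[oid] = self._load(oid)
        return self._objects[oid]

    def finish(self):
        """Called at transaction boundary (commit or abort)."""
        self._objects.clear()
```

The trade-off is exactly the one named above: every transaction re-reads from the database, buying consistency at the cost of cache hits across transactions.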
participants (3)
- Albert Langer
- Phillip J. Eby
- Shane Hathaway