Hi,

In ERP5 [1], which is CMF based, we have a number of strategies for high performance and scalability. One is that we use ZSQLCatalog extensively. The other is that we defer potentially expensive operations (like indexing) to background execution.

For the latter, we store the information about the background tasks to be executed (path to the affected object, method to call, serialization and ordering tags) in an SQL table. Background requests (clock-server) then look up the activity table to either distribute the tasks between the nodes of a ZEO cluster or to execute a previously distributed task.

At one specific client with a very high volume of transactions, we were experiencing failures in these background executions. We traced them, among other things, to the ordering of connections during commit. Here is what happened:

1. An object in the ZODB is created or modified.
2. The .reindexObject() method of this object schedules a task for the real indexing in the background processes, using a ZMySQLDA connector.
3. The transaction machinery performs the commit, ordering the connections according to the .sortKey() method of each connection:
   3.1. All ZMySQLDA connectors involved, since their .sortKey() returns the integer 1 (see Shared.DC.ZRDB.TM.TM.sortKey()).
   3.2. All mounted ClientStorages or FileStorages involved, whose .sortKey()s are strings, which sort after integers.

If, between 3.1 and 3.2, a background process tries to execute the scheduled activity committed in 3.1, it will see the new information in the 'background-tasks' table, but the object to be indexed will not yet be in the ZODB, causing the activity to fail.

The solution we found involves changing the result of .sortKey() for the transaction manager of the database connection. But we can't do this globally for all connectors; otherwise the connector for the SQL-based catalog could be committed after the connector for the background tasks, and we would end up with a similar error situation. The adapter for the background tasks must necessarily commit after all data needed by the background tasks has been committed.

By the way, this issue is completely separate from the two-phase-commit discussion that we had recently, since all the connectors involved here are fully transactional.

At Nexedi, we concluded that we might need to be able to customize the sortKey() per database-adapter instance in Zope, since different adapters might need to be committed in a different order. Unfortunately, it looks like the connection sorting machinery was intended only to obtain a consistent ordering to avoid deadlocks between competing clients, rather than to establish dependency relationships between the connectors, since there seems to be no standard for what the sorting keys should be (they're integers for Shared.DC.ZRDB.TM.TM and strings for ZODB storages).

To make this easier without requiring reimplementation of the .sortKey() method in all database connectors, I took the liberty of creating a branch of Zope 2.12 [2] that adds a .setSortKey() method to Shared.DC.ZRDB.TM.TM, and I'd welcome opinions.

In any case, we were left wondering whether others have faced similar issues with the commit order and whether others have any opinions on this problem.

Cheers,

Leo

[1] http://erp5.com/
[2] http://svn.zope.org/repos/main/Zope/branches/rochael-TM_sortKey/
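
To illustrate the per-instance sort key described above, here is a minimal, self-contained sketch. It does not use Zope; SketchTM and SketchStorage are hypothetical stand-ins, and this is not the code from the branch. It also assumes all sort keys are strings, so they stay mutually comparable on any Python version.

```python
class SketchTM:
    """Hypothetical stand-in for a ZRDB-style transaction manager."""

    _sort_key = "1"  # default key: sorts before the storage keys used below

    def __init__(self, name):
        self.name = name

    def sortKey(self):
        return self._sort_key

    def setSortKey(self, sort_key):
        # Stored per instance, as a string, so keys remain comparable.
        self._sort_key = str(sort_key)


class SketchStorage:
    """Hypothetical stand-in for a ZODB storage with a string sort key."""

    def __init__(self, name):
        self.name = name

    def sortKey(self):
        return "storage:" + self.name


def commit_order(resources):
    """Return resource names in the order a sortKey-based commit visits them."""
    return [r.name for r in sorted(resources, key=lambda r: r.sortKey())]


catalog = SketchTM("catalog_sql")
activities = SketchTM("activity_sql")
zodb = SketchStorage("main")

# Default keys: both SQL connectors are visited before the storage.
before = commit_order([activities, zodb, catalog])

# Only the activity connector is moved; '~' sorts after ASCII letters and digits.
activities.setSortKey("~activities")
after = commit_order([activities, zodb, catalog])

print(before)
print(after)
```

With the override, only the activity connector moves behind the storage, while the catalog connector keeps its default position.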
On 18 June 2010 01:24, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
> By the way, this issue is completely separate from the two-phase-commit discussion that we had recently, since all the connectors involved here are fully transactional.

As you can see here:

http://zope3.pov.lt/trac/browser/Zope/trunk/src/Shared/DC/ZRDB/TM.py

    def tpc_vote(self, *ignored):
        self._finalize = 1

    def tpc_finish(self, *ignored):

        if self._finalize:
            try: self._finish()
            finally: self._registered=0

The transaction manager is only doing one phase commit. It sorts first as it commits in the second phase. If you change the sort order, you lose the guarantee of transactional integrity.

Perhaps a better way to solve this would be to include the zope transaction id in the table, then in the background thread only reindex the queued items with a tid <= the current tid of the connection.

Laurence
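
The tid-based filtering suggested above can be sketched roughly as follows. This is only an illustration: the task fields (path, method, tid) and the helper names are hypothetical, and it assumes transaction ids are fixed-width byte strings that compare in commit order.

```python
def make_tid(n):
    """Build an 8-byte big-endian id; such ids compare bytewise in numeric order."""
    return n.to_bytes(8, "big")


def runnable_tasks(queue, visible_tid):
    """Keep only queued tasks whose recorded tid is not newer than visible_tid."""
    return [task for task in queue if task["tid"] <= visible_tid]


# Hypothetical rows from the activity table, each tagged with the tid
# of the transaction that scheduled it.
queue = [
    {"path": "/erp5/doc/1", "method": "reindexObject", "tid": make_tid(100)},
    {"path": "/erp5/doc/2", "method": "reindexObject", "tid": make_tid(105)},
]

# The background connection currently sees ZODB state up to tid 102,
# so the task scheduled at tid 105 has to wait.
ready = runnable_tasks(queue, visible_tid=make_tid(102))
print([task["path"] for task in ready])
```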
Hi Laurence,

On Fri, Jun 18, 2010 at 08:06, Laurence Rowe <l@lrowe.co.uk> wrote:
> On 18 June 2010 01:24, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
>> By the way, this issue is completely separate from the two-phase-commit discussion that we had recently, since all the connectors involved here are fully transactional.
>
> As you can see here:
>
> http://zope3.pov.lt/trac/browser/Zope/trunk/src/Shared/DC/ZRDB/TM.py
>
>     def tpc_vote(self, *ignored):
>         self._finalize = 1
>
>     def tpc_finish(self, *ignored):
>
>         if self._finalize:
>             try: self._finish()
>             finally: self._registered=0
>
> The transaction manager is only doing one phase commit. It sorts first as it commits in the second phase. If you change the sort order, you lose the guarantee of transactional integrity.
For me this means that TM subclasses need to override tpc_vote and implement a proper commit preparation [1] [2] to assure they are correctly participating in the TPC dance. And if that is not the case, but you have, for example, more than one MySQL connector, you are already in a situation where you can't guarantee transactional integrity, so this discussion is actually orthogonal to the sortOrder one.
> Perhaps a better way to solve this would be to include the zope transaction id in the table, then in the background thread only reindex the queued items with a tid <= the current tid of the connection.

Possibly, but is there a way to know the id of a transaction that hasn't been committed yet, to store it on MySQL? Besides, when working with multiple mount points, you might have to store multiple TIDs, for all storages involved, or else there should be a global transaction ID that should be recorded everywhere, and I don't see the 'transaction' package providing one.

In any case, does anyone oppose the existence of a .setSortKey() on the TM class?

Cheers,

Leo

[1] http://www.postgresql.org/docs/current/static/sql-prepare-transaction.html
[2] http://dev.mysql.com/doc/refman/5.0/en/xa.html
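
The commit preparation mentioned above (doing the real prepare step in tpc_vote and only finalizing in tpc_finish) can be sketched like this. Everything here is a simplified assumption: FakeXABackend is a hypothetical stand-in for a database with prepared transactions, PreparingTM is not a Zope class, and two_phase_commit is a toy driver, not the 'transaction' package.

```python
class FakeXABackend:
    """Hypothetical database connection that supports prepared transactions."""

    def __init__(self, fail_prepare=False):
        self.fail_prepare = fail_prepare
        self.log = []

    def prepare(self):
        if self.fail_prepare:
            raise RuntimeError("prepare failed")
        self.log.append("prepare")

    def commit_prepared(self):
        self.log.append("commit_prepared")

    def rollback(self):
        self.log.append("rollback")


class PreparingTM:
    """Sketch of a data manager that can still fail safely during the vote."""

    def __init__(self, backend):
        self.backend = backend
        self._prepared = False

    def tpc_begin(self, txn):
        self._prepared = False

    def commit(self, txn):
        pass  # work was already sent to the backend during the request

    def tpc_vote(self, txn):
        # First phase: the backend promises it can commit, or raises.
        self.backend.prepare()
        self._prepared = True

    def tpc_finish(self, txn):
        # Second phase: only finalize what was successfully prepared.
        if self._prepared:
            self.backend.commit_prepared()
            self._prepared = False

    def tpc_abort(self, txn):
        self.backend.rollback()
        self._prepared = False


def two_phase_commit(managers):
    """Toy driver: abort everyone if any manager fails before tpc_finish."""
    txn = object()
    try:
        for m in managers:
            m.tpc_begin(txn)
        for m in managers:
            m.commit(txn)
        for m in managers:
            m.tpc_vote(txn)
    except Exception:
        for m in managers:
            m.tpc_abort(txn)
        return False
    for m in managers:
        m.tpc_finish(txn)
    return True


good, bad = FakeXABackend(), FakeXABackend(fail_prepare=True)
failed_result = two_phase_commit([PreparingTM(good), PreparingTM(bad)])

first, second = FakeXABackend(), FakeXABackend()
ok_result = two_phase_commit([PreparingTM(first), PreparingTM(second)])
```

In the failing run, the backend that had already prepared is rolled back rather than committed, which is the guarantee a vote that only sets a flag cannot give.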
On Fri, Jun 18, 2010 at 3:32 PM, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
> In any case, does anyone oppose the existence of a .setSortKey() on the TM class?

Your branch looks simple enough. Feel free to merge it to Zope trunk but don't forget to add a change entry in doc/CHANGES.rst

Hanno
On Fri, Jun 18, 2010 at 10:55, Hanno Schlichting <hanno@hannosch.eu> wrote:
> On Fri, Jun 18, 2010 at 3:32 PM, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
>> In any case, does anyone oppose the existence of a .setSortKey() on the TM class?
>
> Your branch looks simple enough. Feel free to merge it to Zope trunk but don't forget to add a change entry in doc/CHANGES.rst

Thanks. Can I commit it to Zope 2.12 as well? It is backward compatible and helps us address a bug we see on production.

Cheers,
On Fri, Jun 18, 2010 at 4:29 PM, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
> Thanks. Can I commit it to Zope 2.12 as well? It is backward compatible and helps us address a bug we see on production.

Go ahead. It really is trivial :)

Hanno
On 18 June 2010 14:32, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
> Hi Laurence
>
> On Fri, Jun 18, 2010 at 08:06, Laurence Rowe <l@lrowe.co.uk> wrote:
>> On 18 June 2010 01:24, Leonardo Rochael Almeida <leorochael@gmail.com> wrote:
>>> By the way, this issue is completely separate from the two-phase-commit discussion that we had recently, since all the connectors involved here are fully transactional.
>>
>> As you can see here:
>>
>> http://zope3.pov.lt/trac/browser/Zope/trunk/src/Shared/DC/ZRDB/TM.py
>>
>>     def tpc_vote(self, *ignored):
>>         self._finalize = 1
>>
>>     def tpc_finish(self, *ignored):
>>
>>         if self._finalize:
>>             try: self._finish()
>>             finally: self._registered=0
>>
>> The transaction manager is only doing one phase commit. It sorts first as it commits in the second phase. If you change the sort order, you lose the guarantee of transactional integrity.
>
> For me this means that TM subclasses need to override tpc_vote and implement a proper commit preparation [1] [2] to assure they are correctly participating in the TPC dance.
zope.sqlalchemy does this, but that brings a whole ORM into the equation and does away with the ZRDB legacy.
> And if that is not the case, but you have, for example, more than one MySQL connector, you are already in a situation where you can't guarantee transactional integrity, so this discussion is actually orthogonal to the sortOrder one.
That's true, but don't you see this problem even with only a single ZODB and a single ZRDB connection?
>> Perhaps a better way to solve this would be to include the zope transaction id in the table, then in the background thread only reindex the queued items with a tid <= the current tid of the connection.
>
> Possibly, but is there a way to know the id of a transaction that hasn't been committed yet, to store it on MySQL? Besides, when working with multiple mount points, you might have to store multiple TIDs, for all storages involved, or else there should be a global transaction ID that should be recorded everywhere, and I don't see the 'transaction' package providing one.
The ZODB storage's transaction id is set in tpc_begin, so you should be able to get it in tpc_vote or tpc_finish of the ZRDB data manager. Though doing so probably horribly complicates the ZSQLCatalog code.
> In any case, does anyone oppose the existence of a .setSortKey() on the TM class?

I don't oppose it, but I also don't see how this will fix the problem unless you set the sort key to be greater than the ZODB's sort key. This strikes me as a very bad idea for a TM that is designed to tpc_finish before anything else.

Laurence