[ZODB-Dev] ZEO Client locked in tpc_begin
Thierry Delprat
thierry.delprat at unilog.fr
Wed Oct 6 12:06:45 EDT 2004
We are experiencing a complete ZEO client freeze in a production environment:
=> all ZServer threads are busy, but the CPU is idle.
We use :
- Zope 2.7.2 / python 2.3.3
- a CMF site
- 1 Mount Point (for the CMF Catalog)
- 1 ZEO Client, 1 ZSS
After adding a lot of logging, it seems that the freeze occurs in the tpc_begin
of the ClientStorage:
=> the thread enters the "while self._transaction is not None:" loop and
never exits.
As this thread has acquired a global lock (_tpc_cond), all other threads
trying to commit are blocked as well.
This problem seems to be related to the fact that we encounter a lot of
"Shouldn't load state" errors.
When this error occurs, there is a chance that the current transaction is not
released, and the next time we try a transaction on the ClientStorage we
enter the infinite loop, waiting for the previous transaction to complete.
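
To make the suspected mechanism concrete, here is a minimal sketch of a
condition-guarded tpc_begin. It is a toy model, not the real ClientStorage
code: the class ToyStorage, the method end_transaction and the timeout
argument are assumptions added for illustration (the timeout only exists so
that the demonstration terminates instead of hanging).

```python
import threading


class ToyStorage:
    """Simplified stand-in for the commit lock; NOT the real ClientStorage."""

    def __init__(self):
        self._tpc_cond = threading.Condition()
        self._transaction = None

    def tpc_begin(self, txn, timeout=None):
        # Returns True once txn owns the storage, False if the wait timed out.
        # With timeout=None this waits forever, which is the observed freeze.
        with self._tpc_cond:
            while self._transaction is not None:
                if self._transaction is txn:
                    return True  # same transaction calling tpc_begin twice
                if not self._tpc_cond.wait(timeout):
                    return False
            self._transaction = txn
            return True

    def end_transaction(self, txn):
        # Hypothetical release step; stands in for tpc_finish / tpc_abort.
        with self._tpc_cond:
            if self._transaction is txn:
                self._transaction = None
                self._tpc_cond.notify()


storage = ToyStorage()
leaked = object()
storage.tpc_begin(leaked)  # takes ownership and is never released

# Any other transaction now waits for as long as it is allowed to.
blocked = not storage.tpc_begin(object(), timeout=0.2)

# Once the owner is released, the next transaction gets through.
storage.end_transaction(leaked)
recovered = storage.tpc_begin(object(), timeout=0.2)
```

In this model a single transaction that is never released is enough to stall
every later commit, which matches what we see in production.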
We succeeded in reproducing this problem in a test environment:
===========================================================================
Configuration:
1 Zope server with 5 mount points
(1 is the Temporary Folder for sessions)
1 ZEO server with 4 storages
(default storage)
1 External Method which randomly creates 1 or 4 objects
(OFSFolder objects distributed randomly across the different mount points)
1 multithreaded script which calls the External Method
((x simultaneous threads) * y series)
for each session of x simultaneous threads, we wait for the results before
launching another session
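
The structure of the load script can be sketched as follows. This is only an
outline under assumptions: run_series and fake_call are hypothetical names,
and fake_call is a placeholder for the request that triggers the External
Method on the Zope server.

```python
import threading


def run_series(call, x, y):
    """Run y series of x simultaneous calls, joining each series before the next."""
    results = []
    lock = threading.Lock()

    def worker(series, index):
        outcome = call(series, index)
        with lock:
            results.append((series, index, outcome))

    for series in range(y):
        threads = [
            threading.Thread(target=worker, args=(series, index))
            for index in range(x)
        ]
        for thread in threads:
            thread.start()
        # Wait for the whole session before launching another one.
        for thread in threads:
            thread.join()
    return results


def fake_call(series, index):
    # Placeholder: the real script calls the External Method here.
    return "ok"


results = run_series(fake_call, x=4, y=3)
```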
All storages implement sortKey, and the order of jars=_get_jars (in
Transaction.py) is respected throughout the process in our traces.
We observed the following behaviour:
we get some "shouldn't load state" errors from Connection.py line 545
or NoneType has no attribute tpc_begin
or NoneType has no attribute tpc_abort
or NoneType has no attribute tpc_finish
The Connection seems to be closed, or its _storage set to None, by another thread.
Also, if a connection passes tpc_begin successfully, sets the _transaction
of the ClientStorage and then crashes during tpc_abort (_transaction is not
reset to None), the other threads wait indefinitely in the tpc_begin of the
ClientStorage (_transaction is not None and _transaction != txn).
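
The sequence described above can be reproduced in a toy model. The classes
ToyStorage and ToyConnection below are hypothetical stand-ins, not the ZODB
classes, and the timeout argument is an assumption added so that the example
ends instead of hanging.

```python
import threading


class ToyStorage:
    """Toy commit lock; NOT the real ClientStorage."""

    def __init__(self):
        self._tpc_cond = threading.Condition()
        self._transaction = None

    def tpc_begin(self, txn, timeout):
        with self._tpc_cond:
            while self._transaction is not None and self._transaction is not txn:
                if not self._tpc_cond.wait(timeout):
                    return False  # gave up waiting for the current owner
            self._transaction = txn
            return True

    def tpc_abort(self, txn):
        with self._tpc_cond:
            if self._transaction is txn:
                self._transaction = None
                self._tpc_cond.notify()


class ToyConnection:
    """Toy connection that forwards to its storage."""

    def __init__(self, storage):
        self._storage = storage

    def close(self):
        # What another thread appears to do to the connection in our traces.
        self._storage = None

    def tpc_begin(self, txn):
        return self._storage.tpc_begin(txn, timeout=0.2)

    def tpc_abort(self, txn):
        # Raises AttributeError once _storage is None.
        self._storage.tpc_abort(txn)


storage = ToyStorage()
conn = ToyConnection(storage)
txn = object()

conn.tpc_begin(txn)  # the storage now belongs to txn
conn.close()         # _storage is set to None before the abort
error = None
try:
    conn.tpc_abort(txn)
except AttributeError as exc:
    error = str(exc)

# The storage never saw the abort, so the next transaction cannot start.
stuck = not storage.tpc_begin(object(), timeout=0.2)
```

In this model the abort never reaches the storage, so _transaction keeps
pointing at a transaction that nobody will finish.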
This error does not occur very often: we can launch a lot of sessions without
triggering it. However, when we use threadframe to monitor the ZPublisher
threads, we reproduce the problem more easily.
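
For readers without threadframe, a comparable stack dump can be sketched with
the standard library alone, assuming Python 2.5 or later where
sys._current_frames() is available. This is not the monitoring code we run;
dump_threads and the "stuck-worker" thread are hypothetical names used for
illustration.

```python
import sys
import threading
import traceback


def dump_threads():
    """Return {thread name: formatted stack} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    dump = {}
    for ident, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        dump[names.get(ident, str(ident))] = stack
    return dump


entered = threading.Event()
gate = threading.Event()


def waiter():
    entered.set()  # signal that the thread is inside this function
    gate.wait()    # then block, like a thread stuck in tpc_begin


worker = threading.Thread(target=waiter, name="stuck-worker")
worker.start()
entered.wait()

stacks = dump_threads()

# Let the worker finish so the example exits cleanly.
gate.set()
worker.join()
```

The stack recorded for the blocked thread shows the function it is waiting
in, which is how a thread parked in tpc_begin can be spotted.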
========================================================================
Any help on this subject would be greatly appreciated...
Thierry