We currently face a strange problem in a high-performance Zope setup. Its features are special enough that not all of you need fear the same problem: the setup consists of a large number of Zope instances, all talking to a single ZEO server. Almost every non-trivial request (i.e. every request not delivering static content) writes to the ZODB. This means we are using the ZODB in a high-frequency write scenario -- one it has not been designed for.

The problem: ZEO reports large numbers of "client blocked" occurrences, *BUT* neither the IO nor the CPU on the ZEO host is utilized significantly. This implies that it is not ZEO which is the bottleneck, but the ZEO clients.

Up to yesterday, I was extremely puzzled by the behaviour of the ZEO clients (i.e. the Zope processes). Usually they worked quickly and reliably, but occasionally everything, and especially the IO, seemed extremely slow. I added log messages to prominent IO operations and could, for example, see occasional times for "stat" on the order of 5 to 20 s.

The system is complex and there are many possible causes for such disastrous behaviour. One may be problems with the IO subsystem. Our administrators told me they could not see any unusual load on the IO subsystem -- but for them, too, the system is complex.

I knew that the strange timings could mean one of two things: either the IO is really that slow, or the measurement is flawed. All measurements pointed in the same direction -- but all measurements used Python. And that could mean that Python's thread scheduling (controlled by the GIL) is extremely distorted -- distorted on the order of 10 to 20 s.

Yesterday (far too late), I decided to prove whether or not there are scheduling distortions. I put an additional thread into the Zope processes, scheduled periodically, which complains when there is too large a discrepancy between the expected and the observed scheduling time. The thread proved it: there are scheduling distortions of at least 5 s. Next question: why?
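For the curious: such a watchdog thread can be sketched roughly as below. This is a minimal sketch, not the code from our installation; the name "start_scheduling_watchdog", its parameters, and the reporting callback are my own inventions for illustration, written for a modern Python.

```python
import threading
import time

def start_scheduling_watchdog(interval=1.0, tolerance=0.5, report=print):
    """Start a daemon thread that sleeps `interval` seconds in a loop
    and reports whenever it wakes up more than `tolerance` seconds
    later than expected -- a sign that something (e.g. a long GIL
    hold) delayed thread scheduling."""
    def watchdog():
        expected = time.monotonic() + interval
        while True:
            time.sleep(interval)
            now = time.monotonic()
            lag = now - expected  # how late were we scheduled?
            if lag > tolerance:
                report("scheduling distortion: %.2f s late" % lag)
            expected = now + interval
    thread = threading.Thread(target=watchdog, daemon=True)
    thread.start()
    return thread
```

On a healthy system the watchdog stays quiet; during a long GIL-holding operation it reports lags in exactly the observed range.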
This morning I found a candidate: Python's garbage collector. Of course, it holds the GIL (and thereby prevents Python thread scheduling) while it runs. From the documentation, I had the impression that it should be efficient and quick, even for full (generation 2) collections -- but who knows. I disabled automatic garbage collection and replaced it with a periodic call to "gc.collect", logging the execution times. The times started at 0.6 s but have meanwhile increased to 1.9 s per run, and I expect they will easily grow further, to 5 or 6 s per run. Moreover, scheduling distortions are reported precisely when the garbage collector runs.

Usually, occasional garbage collection times of 5 to 6 s are not so problematic -- provided they are not so frequent that they severely interfere with transaction commits. In our case, they do interfere with transaction commits and can severely slow down the complete ZEO client farm, due to contention for the commit lock. The reason is a broken TextIndex implementation -- again, maybe special to our installation.

The aim of this message: should you see strangely behaving ZEO clients and lots of unexplainable "clients blocked" entries in your ZEO logfile, you may think of scheduling distortions caused by Python's garbage collector.

-- Dieter