Python on multi-processor machines (Was Re: [Zope] Re: Windows
vs. Linux)
Matthew T. Kromer
matt@zope.com
Thu, 29 Aug 2002 15:06:23 -0400
Dennis Allison wrote:
>That is precisely the configuration I run without problem.
>
>I have not (yet) looked at the Python code, but I am reasonably sure my
>intuition is correct. (Matt or Guido -- correct me if I am wrong...)
>
>First, safety is not an issue modulo thread safety in the uniprocessor
>machine and the correctness of the SMP implementation. Multiple threads
>allocated to different processors function correctly. The problem is with
>performance since the GIL serializes everything and blocks all processors,
>not just the processor on which the thread is running. This means that
>the second processor does not contribute to the execution as it could, so
>the effective CPU available is closer to 1.0 than 2.0.
>
>
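The serialization described above is easy to see with a small timing sketch
(modern-Python syntax; timings are machine-dependent and purely illustrative):

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL throughout.
    while n:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL build of CPython, `threaded` is roughly equal to (or worse than)
# `serial`, even on a multi-processor machine.
print("serial: %.2fs  two threads: %.2fs" % (serial, threaded))
```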
Well, in the worst case it can actually give you performance UNDER 1X. The
latency of switching the GIL between CPUs comes right off your ability to
do work in a quantum. If you have a 1 gigahertz machine capable of doing
12,000 pystones of work, and it takes 50 milliseconds to switch the
GIL (I don't know how long it takes; this is an example), you would lose 5%
of your peak performance for *EACH* GIL switch. Setting
sys.setcheckinterval(240) will still yield the GIL 50 times a second.
If the GIL actually migrates only 10% of the time it is released, that
would be 50 * 0.1 * 5% = 25% performance loss. The cost to switch the GIL
is going to vary, but will probably range between 0.1 and 0.9 time quanta
(scheduler time intervals); a typical time quantum is 5 to 10 ms.
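As a back-of-the-envelope check of those figures (all numbers are the
illustrative assumptions above, not measurements):

```python
# Illustrative numbers from the discussion above -- assumptions, not data.
switch_cost = 0.050        # assumed 50 ms to migrate the GIL between CPUs
yields_per_sec = 50        # GIL released about 50 times a second
migration_rate = 0.10      # assume only 10% of releases actually migrate

# Fraction of each second spent migrating the GIL instead of doing work.
lost_fraction = yields_per_sec * migration_rate * switch_cost
print("peak performance lost: %.0f%%" % (lost_fraction * 100))  # 25%
```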
The 'saving grace' of the linux scheduler is that when a thread gives up
the GIL, it almost immediately gets it back again, rather than having
another thread acquire it. This is bad for average response time, but
good for throughput -- it means the threads waiting on the GIL are woken
up, but will fail to get the GIL and go back to sleep again.
However, I have directly observed a 30% penalty under MP constraints
when the sys.setcheckinterval value was too low (and there was too much
GIL thrashing).
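A sketch of tuning this knob. Note that sys.setcheckinterval() (Python 2,
counted in bytecodes) was later replaced by sys.setswitchinterval() (Python
3.2+, counted in seconds); the snippet handles both, and the values shown
are examples, not recommendations:

```python
import sys

# Raising either value makes the interpreter offer to release the GIL less
# often, which reduces the GIL thrashing described above.
if hasattr(sys, "setswitchinterval"):
    sys.setswitchinterval(0.02)    # Python 3.2+: offer a switch at most every 20 ms
else:
    sys.setcheckinterval(240)      # Python 2: check every 240 bytecodes
```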
Very little in Zope is capable of releasing the GIL and doing work
independently; some of the database adapters can do that, but that
usually does not represent a large share of the work. Curious side remark:
when you have a LARGE number of threads, you usually do not have enough
database threads! The number of database threads is a default parameter
to an initialization method, and is set to 7. When you DO actually have
lots of concurrent work occurring without GIL thrashing, you need to bump
up the number of Zope database threads. Sites that do a lot of XML-RPC
or other high-latency I/O (network I/O needed to fulfill a request, not
just to send back the response) usually need to bump up the number of
database threads. Otherwise, requests block waiting on a database thread
in Zope, which is bad.
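The pattern is roughly this (an illustrative sketch only, not Zope's actual
code): a fixed pool of "database threads" services requests, and when all
of them are busy, incoming requests queue up and block.

```python
import queue
import threading

POOL_SIZE = 7  # Zope's historical default number of database threads

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def db_worker():
    # Each worker pulls jobs until it sees the shutdown sentinel (None).
    while True:
        job = tasks.get()
        if job is None:
            break
        value = job()              # the "database work"
        with results_lock:
            results.append(value)

pool = [threading.Thread(target=db_worker) for _ in range(POOL_SIZE)]
for t in pool:
    t.start()

# Simulate 20 concurrent requests all needing a database thread; at most
# POOL_SIZE run at once, the rest wait in the queue.
for i in range(20):
    tasks.put(lambda i=i: i * i)

# Shut the pool down: one sentinel per worker, queued after all real jobs.
for _ in pool:
    tasks.put(None)
for t in pool:
    t.join()
```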