Copying Zope @ Zope.org since this is useful information. My numbers below are approximations, not hard figures; they're derived from experimental observation.

A Python bytecode, on average, executes about 50 machine instructions. You probably want to let a whole CPU quantum expire before voluntarily switching threads. A CPU quantum is generally about 5 milliseconds. A 1 GHz Pentium will execute about 1,000,000 instructions per millisecond, or about 100,000 Python bytecodes per quantum. The typical Zope publishing path is about 1,000,000 bytecodes or more -- so letting that path be interrupted 10 times or more is overkill (for Zope). Using my numbers you could argue for a much higher ratio (i.e., if you believe me, Zope "wants" sys.setcheckinterval(100000) on a 1 GHz machine). From experimental observation I have detected a levelling off in benefit at about pystones/50; this becomes very noticeable on a multiprocessor machine. I believe the levelling-off effect comes from other normal 'blocking' operations inside Zope which cause one thread to suspend. Hence the factor-of-500 discrepancy :)

The rationale is the overhead of thread switching, plus throughput optimization. Consider the following example: two threads each wish to count from 1 to 10. After each thread counts a single digit, they switch. A system clock is incremented after each count:

  Sys  Thr1  Thr2
   1     1
   2           1
   3     2
   4           2
  ...
  19    10
  20          10

The average time for each thread to complete is (19 + 20) / 2, or 19.5. Now consider the case where thread 1 is allowed to run to completion before thread 2 starts:

  Sys  Thr1  Thr2
   1     1
   2     2
  ...
  10    10
  11           1
  ...
  20          10

Here, the average time for each thread to complete is (10 + 20) / 2, or 15. So each thread takes about 30% longer, on average, to finish when the threads run "concurrently" -- and that's without factoring in any overhead from the actual act of task switching, which in my example was zero, but in practice can never be zero. By increasing sys.setcheckinterval (the default Python value is 10!)
we allow more work to be done by each thread before it yields control to another thread.

The astute observer will also note that the total system work for CPU-BOUND processes can never exceed the speed of serial processing. Because Zope is primarily CPU bound, fewer threads tend to be better. I believe a corollary to this is the effect people observe when Zope undergoes "superlinear" degradation -- i.e., too many things get caught up in Zope (because too many threads are started). I am sure this isn't the *only* reason that happens (I don't have a good observation suite to analyze it). However, once internal work queues build up in Zope, they are very difficult to dissipate -- you need a substantial lessening in the work arrival rate.

N.B. If you use my figure of 1,000,000 bytecodes as a predictor of the Zope publishing path, you'll realize that this is about 10 CPU quanta (again using a 5 ms quantum) on a 1 GHz machine -- roughly 50 ms per request, or a Zope publishing rate of about 20 pages/sec. For some applications this is an optimistic value; for others, Zope can publish at a faster rate. This is not intended to cover ALL applications, just a 'good guess' at one. I suggest running 'ab' or similar against a representative sample of YOUR application's pages to convert pages/sec into a guesstimate of the "cost" of your application.

On Monday, June 17, 2002, at 10:05 AM, oliver.erlewein@sqs.de wrote:
> Hi,
>
> I've set my new interval from "-i 32" to "-i 200" as my Pystones is
> about 11000. I'll check what changes I see. Where did you get that
> ratio from, or why is it so?
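To make the arithmetic and the scheduling argument above concrete, here is a small Python sketch. The constants are the post's approximations (50 machine instructions per bytecode, a 5 ms quantum, a 1 GHz machine, a 1,000,000-bytecode publishing path) rather than measured values, and the toy scheduler only models the counting example -- it is not how the interpreter actually switches threads.

```python
# Back-of-envelope arithmetic behind the figures in the post.
# All constants are the post's approximations, not measurements.
INSTR_PER_BYTECODE = 50       # avg machine instructions per Python bytecode
INSTR_PER_MS = 1000000        # ~1 GHz Pentium
QUANTUM_MS = 5                # typical scheduler quantum
REQUEST_BYTECODES = 1000000   # rough cost of one Zope publishing request

# Bytecodes executed in one quantum -- the "ideal" check interval:
bytecodes_per_quantum = INSTR_PER_MS * QUANTUM_MS // INSTR_PER_BYTECODE

# One request spans this many milliseconds, giving pages/sec:
ms_per_request = REQUEST_BYTECODES // bytecodes_per_quantum * QUANTUM_MS
pages_per_sec = 1000 // ms_per_request

# The empirical pystones/50 rule, applied to the 11000 pystones quoted above:
check_interval = 11000 // 50

print(bytecodes_per_quantum, ms_per_request, pages_per_sec, check_interval)
# -> 100000 50 20 220

# Toy model of the counting example: two CPU-bound "threads" that each
# need 10 units of work, under round-robin scheduling with a given slice.
def completion_times(work, slice_size):
    """Return the clock value at which each thread finishes."""
    remaining = list(work)
    finished = [0] * len(work)
    clock = 0
    while any(remaining):
        for i, r in enumerate(remaining):
            if r == 0:
                continue
            step = min(slice_size, r)
            clock += step
            remaining[i] -= step
            if remaining[i] == 0:
                finished[i] = clock
    return finished

avg = lambda xs: sum(xs) / len(xs)
fine = completion_times([10, 10], 1)     # switch after every count
coarse = completion_times([10, 10], 10)  # run each thread to completion
print(fine, avg(fine))      # -> [19, 20] 19.5
print(coarse, avg(coarse))  # -> [10, 20] 15.0
```

The 19.5 vs. 15.0 averages reproduce the roughly 30% latency penalty of fine-grained switching described above, with zero modeled switching overhead.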
participants (1)
-
Matthew T. Kromer