I'm not familiar enough with Linux to say this for sure, but my guess is that it _would_ be using a high-resolution system timer.
Nope, 'ab' uses gettimeofday(2) to get microseconds, and then does an internal divide by 1000 to get milliseconds. This is all done with integers, however, so I suspect there is a loss of precision.
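The truncation in question is easy to sketch (in Python, for illustration; 'ab' itself does this in C):

```python
# Sketch of the precision loss from an integer microseconds-to-milliseconds
# divide, as 'ab' is described as doing above.
usec = 2999            # a measured sample of 2.999 ms
msec = usec // 1000    # integer divide truncates the fraction
print(msec)            # 2 -- up to 0.999 ms discarded per sample
```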
You have to wonder why they'd base the timing calculations of a benchmarking program on integer arithmetic. (Fyi, I don't have the Apache sources at hand myself.)
Ouch. These extremely low transfer rates bother me. The theoretical bandwidth for a local-machine transfer should be way, way higher. Otoh, your document was very small, so the per-request connect/teardown overhead would be relatively high. I'd like to see benchmarks with longer (>3KB) documents.
I redid the same benchmark between two Linux boxes on my desk (the same server as before; the client is RedHat 6.0 on a P166 with a 10Mb Ethernet card) with a 36K document, and got 656 KB/sec (~36 requests/sec, 147ms average response time), the effective limit on a moderately loaded 10Mb/s Ethernet LAN. So Zope can at least hose a 10Mb Ethernet. (Note: a 70MB FTP between the two machines yielded 800 KB/s; I think request/response and TCP connect/disconnect overhead could account for the other 144 KB/s, at least to a 10% margin of error.)
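For reference, the raw ceiling of a 10 Mb/s link works out as follows (simple arithmetic, before Ethernet/IP/TCP framing overhead is subtracted):

```python
# Raw byte ceiling of a 10 Mb/s Ethernet link.
link_bits_per_sec = 10 * 10**6
raw_bytes_per_sec = link_bits_per_sec / 8      # 1,250,000 bytes/sec
print(round(raw_bytes_per_sec / 1024))         # ~1221 KB/s before framing
# The observed 800 KB/s (FTP) and 656 KB/s (Zope) both sit below this,
# consistent with protocol framing plus per-request connect/teardown cost.
```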
Agreed.
When Rob buys me a 100Mb card for my other machine, I'll rerun the benchmark to see if Zope can beat the 10Mb limit.
Just for fun, I sat down for five minutes and wrote a small, general-purpose benchmarking script. The configuration:

* Zope server NT: NT Server, SP4, Zope 1.11.0pr1 (ZServer), PII/400, 256MB RAM, fast U2W SCSI disk, low-traffic 100mbps network.

* Zope server Linux: Slackware Linux 2.2/glibc-2.0.6, Zope 1.11.0pr1 (ZServer), P/133, 32MB RAM, IDE disks, medium-traffic 100mbps network. This server doubles as mail and Samba server.

* Apache server Linux: Same Linux box, running Apache 1.3.4.

With a small document (12215 bytes including HTTP header), the results are similar to yours. This is a DTML document, with the same variables as yours, just a lot of dummy text.

Results, Zope server NT:

  Total time:        203.39
  Total requests:    1000
  Total bytes:       12215000
  Requests/sec avg:  4.92
  Bytes/sec avg:     60056.44
  CPU load avg:      20%

Results, Zope server Linux:

  Total time:        77.71
  Total requests:    1000
  Total bytes:       12228000
  Requests/sec avg:  12.87
  Bytes/sec avg:     157350.22
  CPU load avg:      15-20% [*]

Results, Apache server Linux (served as a static HTML page):

  Total time:        30.76
  Total requests:    1000
  Total bytes:       12601000
  Requests/sec avg:  32.50
  Bytes/sec avg:     409588.83
  CPU load avg:      1-5% [*]

[*] Numbers derived from watching the output of "top". :-) I intended to use xload for that, but I don't have my X server for NT at hand. The load average for Zope climbed towards 4.0, whatever that means; the load average for Apache never went above 0.04.

The different "bytes transferred" numbers can be accounted for by differences in the HTTP headers between ZServer on NT, ZServer on Linux, and Apache.

I'm not implying that comparing Zope with statically served pages on Apache is valid; it's interesting data, but not useful except as a meter on "how fast things can really go" when publishing static pages. Note that I tested against an Apache server that is already under some stress, as it already hosts a number of public web sites. This inevitably also affects the purity of the Zope numbers.
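As a sanity check, the reported per-second averages follow directly from the raw totals, e.g. for the NT run:

```python
# Cross-checking the reported averages against the raw totals (NT run).
total_time = 203.39        # seconds
total_requests = 1000
print(round(total_requests / total_time, 2))   # 4.92 requests/sec, as reported
```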
Also, the fastest machine was running NT, not known for being an efficient web serving platform -- or very efficient at all, period. ;-) Compare the results: 12.87 requests/sec on a Linux Pentium/133 versus 4.92 requests/sec on an NT Pentium II 400MHz. No, I wouldn't mind if Microsoft lost in a certain ongoing antitrust trial. :-) We don't throw out old desktop computers anymore: we set them up to run Linux.

As an aside, I also tested a small DTML Method (2014 bytes including HTTP header) just to see how well Zope handles more complex pages. It has a top navigation bar which iterates through its parents to build a Yahoo!-ish link list; then sets some REQUEST variables; and finally does an <!--#in "objectValues(...)"--> loop to enumerate the five or so documents in the parent folder and output their titles and a short description property. In other words, not very complex, but certainly not a "static" page, either:

Results, Zope server NT:

  Total time:        198.11 secs
  Total requests:    1000
  Total bytes:       2014000
  Requests/sec avg:  5.05
  Bytes/sec avg:     10165.81
  Average CPU load:  38%

I did not test this on the Linux machine, as I'd have to copy a large part of my Zope database (including some Z Classes) in order to emulate the same functionality. Suffice to say we can expect a lot better than this.

It's interesting to note that, while the transfer rate drops drastically -- which we can easily blame on the size of the rendered document -- the request rate is more or less the same as with the very simple, larger page. What's to be expected is that the CPU load increases.

I look forward to testing more -- especially with concurrent requests -- with Zope 2.0 as it grows more mature. We're setting up a brand new Linux server to run Zope, and I'll have some time to run more refined benchmarks in a clean environment (no mail, web, or file daemons grabbing CPU slices).
Also note that the ~37 requests per second I got (the average across all benchmarks) equates to approx *3.2 million hits per day* on one moderately fast PII with no fancy hardware.
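The hits-per-day arithmetic behind that figure:

```python
# Extrapolating sustained requests/sec to hits per day.
reqs_per_sec = 37
hits_per_day = reqs_per_sec * 60 * 60 * 24
print(hits_per_day)    # 3196800, i.e. ~3.2 million hits/day
```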
Great, but this assumes 3.2 million hits spread evenly across 24 hours. In the Real World, traffic is not that even, and the real test of Zope's scalability comes when you throw 40+ concurrent clients at it. [snip]
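The structure of such a concurrent test can be sketched like this (in modern Python, for illustration; fetch() here is a placeholder for a real HTTP round trip against a test server, not part of the script discussed in this thread):

```python
# Sketch of driving N concurrent clients against one URL.
# fetch() is a stand-in for a real HTTP request; swap in urllib or
# similar to benchmark a live server.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for an actual HTTP round trip.
    time.sleep(0.01)
    return len(url)

def run(url, clients, requests_per_client):
    start = time.time()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(fetch, [url] * (clients * requests_per_client)))
    elapsed = time.time() - start
    return len(results), elapsed

done, secs = run('http://localhost:8080/', 40, 5)
print(done)    # 200 requests completed, 40 in flight at a time
```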
I'd be very interested to hear what the CPU load was during the test run. If it's anything above 5-8% (average), that's not promising at all. The fact that adding more concurrent sessions did not degrade performance seems to indicate that CPU load was not too bad.
Note that load average and % of time spent in the CPU are not the same thing.
Of course. I don't even know exactly what "load average" on Linux means. :-) For the above tests I clocked the average CPU load on NT by running Performance Monitor. [snip]
I'm planning on putting together a simple httplib-built benchmark test just to get some raw numbers, although I'd like to try putting the timing logic inside Zope.
We would be interested in seeing what you come up with (code and numbers).
As I'm not known in any circle as "The Benchmarking Guru", and probably never will be, I thank you for your friendly and accommodating position on this matter. :-)

This is the simple script I jotted down to do the tests. It seems to be inconsequential whether the timing includes getfile() and doc.read(), even though, as far as I can see, the byte stream isn't read from the socket until you actually perform the read. This may well be explained by socket or NIC-level buffering happening behind the scenes. I don't know enough about the innards of the httplib, mimetools, and rfc822 modules to check this, and I don't have the time right now.

import httplib
import time
import sys
import string

TotalDocs = 0
TotalTime = 0.0
TotalBytes = 0

def Request(Url):
    global TotalTime, TotalDocs, TotalBytes
    # Split "http://server/path" into server name and document URL
    if Url[:7] == 'http://':
        Url = Url[7:]
    i = string.find(Url, "/")
    if i >= 0:
        Server = Url[:i]
        DocUrl = Url[i:]
    else:
        # No path component: the whole string is the server name
        Server = Url
        DocUrl = ''
    if DocUrl == '':
        DocUrl = '/'
    StartTime = time.time()
    HttpRequest = httplib.HTTP(Server)
    HttpRequest.putrequest('GET', DocUrl)
    HttpRequest.putheader('Host', Server)
    HttpRequest.putheader('Accept', 'text/html')
    HttpRequest.putheader('Accept', 'text/plain')
    HttpRequest.endheaders()
    httpcode, httpmsg, headers = HttpRequest.getreply()
    if httpcode != 200:
        raise "HTTP error %d getting document" % httpcode
    doc = HttpRequest.getfile()
    TotalBytes = TotalBytes + len(doc.read())
    TotalTime = TotalTime + time.time() - StartTime
    TotalDocs = TotalDocs + 1
    doc.close()

class CommandLine:
    def __init__(self):
        self.Args = {}
        i = 1
        while i < len(sys.argv) - 1:
            if sys.argv[i][:1] in ['-', '/']:
                self.Args[sys.argv[i][1:]] = sys.argv[i + 1]
                i = i + 2
            else:
                i = i + 1

    def __getitem__(self, i):
        if self.Args.has_key(i):
            return self.Args[i]
        else:
            return None

def Main():
    Args = CommandLine()
    Document = Args['u'] or Args['url']
    if not Document:
        print "hb: No document specified"
        return
    RequestsToDo = string.atoi(Args['n'] or Args['number'] or '1')
    print "Document:         %s" % Document
    print "Requests:         %d" % RequestsToDo
    print "..."
    while RequestsToDo > 0:
        Request(Document)
        RequestsToDo = RequestsToDo - 1
    print "Total time:       %8.2f" % TotalTime
    print "Total requests:   %8d" % TotalDocs
    print "Total bytes:      %8d" % TotalBytes
    print "Requests/sec avg: %8.2f" % (TotalDocs / TotalTime)
    print "Bytes/sec avg:    %8.2f" % (TotalBytes / TotalTime)

if __name__ == '__main__':
    Main()

--
Alexander Staubo
http://www.mop.no/~alex/
"`Ford, you're turning into a penguin. Stop it.'"
        --Douglas Adams, _The Hitchhiker's Guide to the Galaxy_