[Zope] Apache/Zope: methods for serving high traffic?

sean.upton@uniontrib.com
Thu, 03 May 2001 13:31:50 -0700


Hey all,

I need some deployment advice. After 2 years of doing relatively small stuff
in Zope - mainly low volume apps, intranet stuff - I am at a point where a
very high-traffic-volume application is about to be deployed, and I am
wondering what the best means of web-server integration is for
high-traffic applications.

In a nutshell, here is my setup: traffic comes in through a layer 4
load-balancer that directs traffic to 2 Squid HTTP accelerators (Sun E250s +
1GB RAM + Debian/Sparc/Linux 2.4), which proxy (with a simple Perl script as
a redirector or poor person's load balancer) to a farm of two web servers
(VA Linux 1220 PIII/800 1CPU 512MB), which are running Apache and will be
running Zope using ClientStorage.  I have a temporary database server that
is a Sun Netra T1 105 (UltraSparc IIe 440MHz, 256MB, 10kRPM LVD SCSI disks,
Debian unstable), and it will be acting in a temporary capacity as a ZEO ZSS
using FileStorage.  Zope will be running using a VHM for virtual hosts.
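
For illustration, a redirector of the kind mentioned above could look roughly
like the sketch below. It is written in Python rather than Perl and is not the
actual script. It assumes the classic Squid redirector interface: one request
per line on stdin with the URL as the first field, and one line written back
containing either the replacement URL or nothing (meaning "leave unchanged").
The back-end addresses and the round-robin policy are made up for the example.

```python
import itertools
import sys
from urllib.parse import urlsplit, urlunsplit

# Hypothetical back-end addresses; the post does not name hosts or ports.
BACKENDS = ["10.0.0.11:8080", "10.0.0.12:8080"]


def make_rewriter(backends):
    """Return a function that maps one redirector input line to a reply."""
    pool = itertools.cycle(backends)

    def rewrite(line):
        fields = line.split()
        if not fields:
            return ""  # blank reply: nothing to rewrite
        parts = urlsplit(fields[0])
        if parts.scheme != "http":
            return ""  # blank reply: leave non-HTTP URLs unchanged
        # Swap only the host:port, keeping path, query and fragment.
        return urlunsplit(
            (parts.scheme, next(pool), parts.path, parts.query, parts.fragment)
        )

    return rewrite


def main():
    """Answer one line per request, flushing so Squid is never kept waiting."""
    rewrite = make_rewriter(BACKENDS)
    for line in sys.stdin:
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()
```

Calling `main()` at the end of the script starts the request loop. Plain
round-robin ignores back-end health and load, which is why this is only a
"poor person's" balancer.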

The initial application running on Zope for our site is going to be an
online Classified Ad system; we are a Top 20 newspaper in a sizeable market
with a lot of traffic; classifieds is a good chunk of our Monday traffic,
and our app needs to scale to anticipate traffic of up to 50,000 page views
an hour, all served from Zope, with about 3/4 of all page views being a
ZCatalog query (search or browse, everything is done with a catalog).  So,
with perhaps 37,000 catalog queries an hour (or 10-15/sec), I want to make
sure I set up my infrastructure to handle this.  I feel that the CPU usage
associated with the queries will be handled by our 2 new web servers just
fine.  I'm going to be setting up a cache manager for my entire site
hierarchy, so I also am comfortable with the idea that all my images will be
cached on the Squid boxes in front of the servers.  I still have a few
questions remaining in my head:

1 - Will my temporary ZSS box handle this (1CPU, 256MB)?  The ODB will
likely contain, at most, 20,000 small object instances, most of which are
subclassed from OFS:Folder.  My more permanent ZSS box is a Sun E450 with
1GB RAM that will be running Linux, and I'm sure that that will be an
improvement, but that box is being used for something else, and isn't
available now.

2 - How do catalog queries get cached in the cache managers?  Is it likely
that a simple query (like 'browsing' a section of ads through a catalog
query) will be cached?  If I were to guess, I would think that 1/3 of all
traffic is likely going to be esoteric searches that are not really useful
to cache.

3 - So far, I can think of 3 ways to present my Zope apps to the public:
	a. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(ZSERVER)
	b. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(PCGI)->(ZOPE)
	c. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(ZSERVER)

I am inclined to do c, but if that doesn't work out, what is anyone's
experience working with a vs. b (in terms of PCGI vs. PROXY to ZServer)?
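
As a rough check on the load figures quoted earlier, the arithmetic can be
sketched in Python. The only inputs are the numbers from this post. Reading
the 1/3 "esoteric search" guess as a share of all page views, and treating the
remaining catalog queries as cacheable browsing, are assumptions made for the
example.

```python
# Back-of-the-envelope load estimate from the figures quoted in this post.
PAGE_VIEWS_PER_HOUR = 50000   # anticipated peak page views per hour
CATALOG_SHARE = 3 / 4         # share of page views that are ZCatalog queries
ESOTERIC_SHARE = 1 / 3        # guessed share of all traffic not worth caching


def per_second(per_hour):
    """Convert an hourly rate to an average per-second rate."""
    return per_hour / 3600.0


catalog_per_hour = PAGE_VIEWS_PER_HOUR * CATALOG_SHARE
esoteric_per_hour = PAGE_VIEWS_PER_HOUR * ESOTERIC_SHARE
# Assumption: catalog queries that are not esoteric searches are browsing
# requests that a cache in front of Zope could plausibly answer.
browse_per_hour = catalog_per_hour - esoteric_per_hour

print("catalog queries/hour:  %.0f" % catalog_per_hour)
print("catalog queries/sec:   %.1f" % per_second(catalog_per_hour))
print("uncacheable/sec:       %.1f" % per_second(esoteric_per_hour))
print("cacheable browse/sec:  %.1f" % per_second(browse_per_hour))
```

These are hourly averages; short bursts within the hour will run higher,
which is presumably what the upper end of the 10-15/sec range allows for.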

Any feedback would be REALLY appreciated.  Once I get this in place, I would
be happy to reciprocate with advice for anyone else in a similar
situation...

Thanks,
Sean

=========================
Sean Upton
Senior Programmer/Analyst
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
sean.upton@uniontrib.com
=========================