Apache/Zope: methods for serving high traffic?
Hey all,

I need some deployment advice. After 2 years of doing relatively small stuff in Zope (mainly low-volume apps and intranet stuff), I am at a point where a very high-traffic application is about to be deployed, and I am left wondering what the best means of web-server integration is for high-traffic applications.

In a nutshell, here is my setup: traffic comes in through a layer 4 load balancer that directs traffic to 2 Squid HTTP accelerators (Sun E250s + 1GB RAM + Debian/Sparc/Linux 2.4), which proxy (with a simple Perl script as a redirector, or poor person's load balancer) to a farm of two web servers (VA Linux 1220 PIII/800 1CPU 512MB), which are running Apache and will be running Zope using ClientStorage. I have a temporary database server that is a Sun Netra T1 105 (UltraSparc IIe 440MHz, 256MB, 10kRPM LVD SCSI disks, Debian unstable), which will be acting in a temporary capacity as a ZEO ZSS using FileStorage. Zope will be running using a VHM for virtual hosts.

The initial application running on Zope for our site is going to be an online classified ad system. We are a Top 20 newspaper in a sizeable market with a lot of traffic, and classifieds is a good chunk of our Monday traffic. Our app needs to scale to anticipated traffic of up to 50,000 page views an hour, all served from Zope, with about 3/4 of all page views being a ZCatalog query (search or browse; everything is done with a catalog). So, with perhaps 37,000 catalog queries an hour (or 10-15/sec), I want to make sure I set up my infrastructure to handle this. I feel that the CPU usage associated with the queries will be handled by our 2 new web servers just fine. I'm going to be setting up a cache manager for my entire site hierarchy, so I am also comfortable with the idea that all my images will be cached on the Squid boxes in front of the servers. I still have a few questions remaining in my head:

1 - Will my temporary ZSS box handle this (1CPU, 256MB)? The ODB will likely contain at most 20,000 small object instances, most of which are subclassed from OFS:Folder. My more permanent ZSS box is a Sun E450 with 1GB RAM that will be running Linux, and I'm sure that will be an improvement, but that box is being used for something else and isn't available now.

2 - How do catalog queries get cached in the cache managers? Is it likely that a simple query (like 'browsing' a section of ads through a catalog query) will be cached? If I were to guess, I would think that 1/3 of all traffic is going to be esoteric searches that are not really useful to cache.

3 - So far, I can think of 3 ways to present my Zope apps to the public:

a. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(ZSERVER)
b. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(PCGI)->(ZOPE)
c. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(ZSERVER)

I am inclined to do c, but if that doesn't work out, what is anyone's experience working with a vs. b (in terms of PCGI vs. proxying to ZServer)?

Any feedback would be REALLY appreciated. Once I get this in place, I would be happy to reciprocate with advice for anyone else in a similar situation...

Thanks,
Sean

=========================
Sean Upton
Senior Programmer/Analyst
SignOnSanDiego.com
The San Diego Union-Tribune
619.718.5241
sean.upton@uniontrib.com
=========================
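As a quick check of the load figures in the post above, the arithmetic can be sketched in Python. The inputs (50,000 page views an hour, 3/4 of them catalog queries, roughly 1/3 of traffic being hard-to-cache searches) come from the post; the last figure additionally assumes, optimistically, that the caches absorb everything except that third.

```python
# Figures taken from the post above.
PAGE_VIEWS_PER_HOUR = 50_000
CATALOG_FRACTION = 3 / 4        # share of page views that are ZCatalog queries
UNCACHEABLE_FRACTION = 1 / 3    # share guessed to be esoteric, uncacheable searches

SECONDS_PER_HOUR = 3600


def per_second(per_hour):
    """Convert an hourly rate into an average per-second rate."""
    return per_hour / SECONDS_PER_HOUR


total_per_sec = per_second(PAGE_VIEWS_PER_HOUR)
catalog_per_hour = PAGE_VIEWS_PER_HOUR * CATALOG_FRACTION
catalog_per_sec = per_second(catalog_per_hour)

# Assumption (not from the post): every cacheable page is served by Squid,
# so only the uncacheable third ever reaches Zope.
uncached_per_sec = per_second(PAGE_VIEWS_PER_HOUR * UNCACHEABLE_FRACTION)

print("all page views:   %.1f/sec" % total_per_sec)
print("catalog queries:  %.1f/sec" % catalog_per_sec)
print("uncached minimum: %.1f/sec" % uncached_per_sec)
```

These are hourly averages, so short peaks within the hour would be higher than the per-second numbers shown.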
Hi Sean,

I will try to help where I can. I think option c is really the best one. Personally, I don't like PCGI because it is too slow. In the battle of a vs. b, I would choose a, because it works with threads and is faster than PCGI.

If the ZCatalog pages are requested with GET, Squid will cache the responses for you; just set the cache headers.

Good luck!

--
Mauricio Souza Lima
WebDeveloper - Catho ONLINE
mauricio@catho.com.br
www.catho.com.br
mauriciosl@yahoo.com.br
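To illustrate the "just set the headers" advice, here is a minimal Python sketch that builds response headers allowing a shared cache such as Squid to keep a GET response for a fixed time. The function name `cache_headers` and the five-minute lifetime are assumptions for illustration, not anything specified in the thread.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Return HTTP headers that let a shared cache keep a response.

    `now` is a Unix timestamp; it defaults to the current time and is
    only a parameter so the result can be checked deterministically.
    """
    if now is None:
        now = time.time()
    return {
        # Freshness lifetime for HTTP/1.1-aware caches.
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Absolute expiry time, in the HTTP date format, for older caches.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Hypothetical lifetime: let a browse page stay cached for five minutes.
headers = cache_headers(300)
for name, value in sorted(headers.items()):
    print("%s: %s" % (name, value))
```

In a Zope script the same name/value pairs could be applied with `RESPONSE.setHeader(name, value)`, or left to a cache manager associated with the objects. How long a catalog page can safely stay cached depends on how quickly new ads need to appear.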
On Thu, 03 May 2001 13:31:50 -0700, sean.upton@uniontrib.com wrote:

I spent some time planning a site with similar characteristics last year. The project was canned before it got out of a lab environment, so I can only offer untested advice.

a. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(ZSERVER)
b. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(APACHE)->(PCGI)->(ZOPE)
c. [REQUEST]->(LOADBALANCER)->(SQUID+REDIRECTOR)->(ZSERVER)

You have a Squid in this mix, which you say you are using as a redirector only. How important is caching to you? In my project caching was all-important; cache misses were very expensive.

In all of those scenarios Squid will cache based on the post-redirector URL. If you are load balancing between n back-end servers, then you will need n times as much cache space, and can expect a lower cache hit ratio (worst case, n times lower).

What type of load-balancing processing are you planning for the Squid redirector? If just a random selection, are you aware that Apache's mod_rewrite can do that? (Some of Apache's other features make it attractive for the front end of this proxy pipeline, but Squid is not a bad choice either.)

Unlike in your application, we expected that ZEO server latency would be a significant factor. To reduce latency in filling ZEO client caches, we were planning to distribute load between back-end Zopes based on dataset affinity, that is, preferring to send a request to a Zope that has recently handled a different request for the same data. We looked at doing this in a Squid redirector, but it is not so easy to share state between the multiple redirector processes. Eventually we chose to implement this in a new HTTP proxy.

Our final configuration looked like:

REQUEST->(LoadBalancer)->[(Apache)->(Squid)->(redirector)]->[(ZServer)]

Toby Dickenson
tdickenson@geminidataloggers.com
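One way to picture the redirector concerns raised above is the following Python sketch. It is not the proxy described in the reply: instead of tracking which Zope recently handled which data, it substitutes a simpler stateless technique, hashing the URL path to pick a back end. Because every redirector process computes the same answer, no state has to be shared, and a given path always gets the same post-redirector URL, so Squid would store one copy rather than one per back end. The back-end host names are hypothetical, and the input format assumed is the classic Squid redirector line (`URL client/fqdn ident method`).

```python
import zlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical back-end ZServer addresses; not taken from the thread.
BACKENDS = ["zope1.example.com:8080", "zope2.example.com:8080"]


def pick_backend(path, backends=BACKENDS):
    """Map a URL path to one back end deterministically (stateless)."""
    return backends[zlib.crc32(path.encode("utf-8")) % len(backends)]


def rewrite(line, backends=BACKENDS):
    """Rewrite one redirector input line; return "" for an empty line."""
    fields = line.split()
    if not fields:
        return ""
    parts = urlsplit(fields[0])
    backend = pick_backend(parts.path, backends)
    # Keep scheme, path and query; swap only the host:port.
    return urlunsplit((parts.scheme, backend, parts.path, parts.query, ""))


def serve(stdin, stdout):
    """Answer one rewritten URL per request line, flushing each reply."""
    for line in stdin:
        stdout.write(rewrite(line) + "\n")
        stdout.flush()
```

Run as a redirector, the loop would be started with `serve(sys.stdin, sys.stdout)`. Hashing on the path alone means all queries against the same catalog page land on the same Zope, which approximates affinity only loosely; the trade-off is that a hot path cannot be spread across back ends.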