I think there are some fundamental misunderstandings going on here, but I thought it would be interesting to try to respond anyway.

iap@y2fun.com wrote:
Hi,
This issue has been discussed again and again, but I would like to clarify my idea; your comments will be much appreciated.
Suppose we want to provide a server which is:
1) Hosting 1,000,000 members' profiles. Each member's disk quota is 5 MB,
which means we need at least 5,000 GB (5 TB) of disk space.
2) Assume concurrent access to a URL is 1,000 requests per second.
3) Assume all the requests retrieve dynamic content.
4) We want to leverage the power of Zope, which means all the pages should be
rendered by Zope.
Having 5 TB of disk space usually means some very high-powered RAID gear; my personal favorite is the EMC Symmetrix units, and I think you would probably want at least two of those to provide your coverage. Estimated cost for this is about $5,000,000 (but *very* dependent on EMC's pricing strategies). You could get by for less by distributing a disk with each CPU (the breadrack approach).

1,000 requests/second isn't terribly high; Zope installations have done 400/sec with no problem. However, those are situations where Zope is being heavily cached: less than 10% of the requests are actually rendered by Zope. So, if you wanted no caching (i.e., everything is completely dynamic content), my estimate is that you would need something like 100 1 GHz Intel Pentium III class machines to perform that amount of dynamic rendering. If each of those machines had a 50 GB disk drive, you'd theoretically have your 5 TB of disk space. At a rough commercial cost of $4,000 per unit (probably a bit high), that's only $400,000. As a practical matter, you'd then need some pretty hefty load-balancing servers; at least two, possibly more.

However, that raises the question of how you organize that much disk space. It's not an easy task. Whether or not you use an RDBMS is irrelevant until you can work out a strategy for using the disk space scattered among all of those machines.

Now, if you forget for a moment about requiring each page to be dynamically rendered each time it is viewed, and you set aside the storage questions, you could estimate that, with a 90% cache-hit rate, you could serve 1,000 requests/sec with only about 14 machines (10 renderers, 2 cache servers, and 2 load balancers). Estimated cost for that is $56,000.

What is most unrealistic about this scenario is your assumptions about the member base and its ratio to expected activity. One million users may generate only 1,000 requests/sec, but they certainly could generate a lot more.
In fact, a critical strategy for large systems like this is anticipating "peak demand" events. Let's say you send an e-mail out to all million people, telling them to log in and check out a particular URL. That timed event will generate a demand curve that is not evenly distributed over time; in fact, it is usually very front-loaded. Within about 5 minutes, more than 10% of the user base will probably respond. That is a raw rate of about 333 requests/sec, but it presumes that the single URL is the only thing they load; usually, a page contains images and other content (style sheets, etc.) which must also be fetched. Pages with high art content can have 25 or more elements on them. That pushes the request rate up to 8,333 requests/sec; way outside the 1,000 requests/sec bound.
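The peak-demand figures work out as follows (same assumptions as the paragraph above):

```python
# Peak demand: a mailing to 1,000,000 members, ~10% responding in 5 minutes.
members = 1_000_000
respondents = members * 0.10            # 100,000 users
window_sec = 5 * 60                     # 300 seconds

page_rate = respondents / window_sec    # ~333 requests/sec for the page alone

elements_per_page = 25                  # images, style sheets, etc.
total_rate = page_rate * elements_per_page  # ~8,333 requests/sec overall
```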
The principles I would like to verify are:
1) Some database (RDBMS) should be used instead of FileStorage for the ZODB.
2) ZEO should be used to construct a computing cluster.
3) Apache should be the front end instead of ZServer.
4) PCGI should be the connection between Apache and Zope.
5) I shouldn't create static instances in the ZODB but should query an
external database.
6) The "Cache" in Zope is useless since all the responses are dynamically rendered.
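On point 2, a ZEO deployment roughly means one shared storage server that many Zope client processes connect to. A minimal server configuration might look like the fragment below; note this uses the ZConfig-style zeo.conf syntax of later Zope releases, and the port and path are purely illustrative assumptions (shown with the default FileStorage; an RDBMS-backed storage would replace the filestorage section):

```
# zeo.conf -- minimal ZEO storage-server sketch (illustrative values)
<zeo>
  address 8100
</zeo>
<filestorage 1>
  path /var/zeo/Data.fs
</filestorage>
```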
By the way, how much will this kind of system cost, apart from the hardware?
Iap, Singuan
--
Matt Kromer
Zope Corporation  http://www.zope.com/