Hi there,

We're planning a Yahoo! Clubs-like system that should scale to about
30,000 users. Assuming about 3,000 groups and 20MB per group (group
functionality includes photo albums) gives a database size of 60GB.
Assuming on average 3,000 users per day and 20 page views per user gives
about 60,000 page views per day (not a lot, but what if it's all
dynamically generated?).

We'd like to use Zope for this, if possible. Other options are OK too;
does anyone have experience with other ready-made systems?

At this scale, how would ZODB hold up with respect to memory use and
speed? I've heard rumors that it loads an index into memory on start-up.
Would using Oracle as a Storage (with ZODB semantics) help? Going
full-scale RDBMS means we'd have to reimplement a lot of existing useful
tools, so we'd rather not do that if we're using Zope.

I know we'll have to play with caching as well, and as I see it there
are these options:

- SQL method caching
- Using StandardCacheManagers to cache Python and DTML methods
- Using StandardCacheManagers to cache pages (using, e.g., Squid as an
  HTTP accelerator)
- ZEO client object caching

Any other ideas?

Bye,
-- 
Bjorn Stabell <bjorn@exoweb.net>
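The sizing arithmetic above can be sanity-checked with a quick
back-of-envelope script. The figures are the assumptions from the post;
the peak factor is my own illustrative guess, not something stated here:

```python
# Back-of-envelope capacity estimate for the proposed Clubs-like site.
# All inputs are the assumptions from the post; peak_factor is a
# hypothetical allowance for traffic bunching into busy hours.
groups = 3000
mb_per_group = 20                              # photo albums dominate this
db_size_gb = groups * mb_per_group / 1000.0    # -> 60.0 GB

daily_users = 3000
views_per_user = 20
daily_views = daily_users * views_per_user     # -> 60,000 page views/day

peak_factor = 10                               # assumed, not from the post
views_per_sec = daily_views * peak_factor / 86400.0

print(db_size_gb, daily_views, round(views_per_sec, 1))
```

Even with a generous peak factor, that is only a handful of dynamic page
views per second; the 60GB of object data is the harder problem.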
[no cross-posting, please]

On Wed, 6 Jun 2001, Bjorn Stabell wrote:
> We're planning a Yahoo! Clubs-like system that should scale to about
> 30,000 users. Assuming about 3,000 groups and 20MB per group (group
> functionality includes photo albums) gives a database size of 60GB.
> Assuming on average 3,000 users per day and 20 page views per user
> gives about 60,000 page views per day (not a lot, but what if it's all
> dynamically generated?).
You're going to need some serious hardware for that. You could do a lot
with your setup, though (ZEO, an RDBMS, distributed application
programming), but I don't have much experience to share there.

In a scenario where each box (if you have several) has its own 60GB
Data.fs, I'd be worried about disk activity, for one. It seems to me
(with my petty 1GB Data.fs) that it is the disks, rather than ZODB
itself, that slow things down.
> At this scale, how would ZODB hold up with respect to memory use and
> speed? I've heard rumors that it loads an index into memory on
> start-up.
I'm running a 1GB Data.fs with CompressedStorage here, and start-up
takes probably about 3-5 minutes on a 1GHz machine with 1GB of RAM. I
keep banging my head against it, but it just won't run faster.

Let us know how that project progresses, will you? :)
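The "index loaded into memory at start-up" rumor is roughly right for
FileStorage: it keeps an in-memory map from object id to file offset,
rebuilt by scanning the file when no saved index is available. Here is a
purely conceptual toy (length-prefixed records, integer ids) that shows
the idea, not the real Data.fs record layout:

```python
# Conceptual sketch of a FileStorage-style start-up scan: read every
# record once and build an in-memory index {record id: file offset}.
# The record format here (4-byte id + 4-byte payload length) is a toy
# stand-in, NOT the real Data.fs layout.
import io
import struct

def build_index(f):
    """Scan a toy record stream and map each id to its latest offset."""
    index = {}
    while True:
        offset = f.tell()
        header = f.read(8)            # 4-byte id + 4-byte payload length
        if len(header) < 8:
            break                     # end of file
        oid, length = struct.unpack('>II', header)
        index[oid] = offset           # later records win, like newer revisions
        f.seek(length, 1)             # skip the payload
    return index

# Toy data: two revisions of id 1 and one record for id 2.
buf = io.BytesIO()
for oid, payload in [(1, b'old'), (2, b'x'), (1, b'new')]:
    buf.write(struct.pack('>II', oid, len(payload)) + payload)
buf.seek(0)

idx = build_index(buf)
print(idx)   # id 1 points at its latest record
```

The cost of such a scan is linear in file size, which is why start-up
time grows with the Data.fs and why a 60GB file is worth worrying about.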
On Wed, 6 Jun 2001, Erik Enge wrote:
> I'm running a 1GB Data.fs with CompressedStorage here and that takes
> probably about 3-5 minutes on a 1GHz machine with 1GB of RAM. I keep
> banging my head against it, but it just won't run faster.
Out of interest, is this start-up time avoided when using BerkeleyDB as
the storage? I know that it has its own indexes etc., so I am wondering
if it no longer needs to load an index into memory.

Also, how are the disks laid out? Is it possible to have, say, 5 disks
each on their own (no RAID) and then split the Data.fs over them using
PartitionedFileStorage or similar?

-Matt

-- 
Matt Hamilton                         matth@netsight.co.uk
Netsight Internet Solutions, Ltd.    Business Vision on the Internet
http://www.netsight.co.uk            +44 (0)117 9090901
Web Hosting | Web Design | Domain Names | Co-location | DB Integration
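For what it's worth, the idea behind splitting one big Data.fs over
several plain disks can be sketched as routing each object to one of N
storage files by its object id. "PartitionedFileStorage" is named
speculatively in the mail above, and this is NOT a real Zope API, just
the routing concept:

```python
# Conceptual sketch: spread objects over N storage files, one per disk,
# by taking the object id modulo the number of disks. The paths and the
# name "storage_for" are illustrative assumptions, not a Zope API.
N_DISKS = 5
paths = ['/disk%d/Data.fs' % i for i in range(N_DISKS)]

def storage_for(oid):
    """Pick the storage file responsible for an (integer) object id.

    Real ZODB oids are 8-byte strings; a plain integer stands in here.
    """
    return paths[oid % N_DISKS]

print(storage_for(12345))   # deterministic: disk 0 holds oid 12345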
On Wed, 6 Jun 2001, Matt Hamilton wrote:
> Out of interest, is this start-up time avoided when using BerkeleyDB as
> the storage? I know that it has its own indexes etc., so I am wondering
> if it no longer needs to load an index into memory.
I'm not sure, but as I said in a previous oops-correcting-myself mail,
it actually only takes 40 seconds - FileStorage, that is.
On Wed, 6 Jun 2001, Erik Enge wrote:
> I'm running a 1GB Data.fs with CompressedStorage here and that takes
> probably about 3-5 minutes on a 1GHz machine with 1GB of RAM. I keep
> banging my head against it, but it just won't run faster.
Oops, I was misleading you there. Actually, FileStorage takes about 40
seconds to "initialize" the Data.fs. Can't complain about that.
(Although 60GB might not be too much fun ;)
On Wed, 6 Jun 2001 11:57:18 +0800, "Bjorn Stabell" <bjorn@exoweb.net> wrote:
> I know we'll have to play with caching as well, and as I see it there
> are these options:
> - Using StandardCacheManagers to cache pages (using, e.g., Squid as an
>   HTTP accelerator)
StandardCacheManager's HTTP implementation is easy to use, but a little
simplistic. You can achieve more (with only a little effort) by handling
the caching headers yourself.

Toby Dickenson
tdickenson@geminidataloggers.com
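A minimal sketch of what "handling the caching headers yourself" might
look like: compute Cache-Control and Expires values in plain Python and
hand them to the response (in Zope, via something like
RESPONSE.setHeader). The helper name and the max-age value are my own
illustrative choices, not anything from this thread:

```python
# Sketch of hand-rolled HTTP caching headers, as an alternative to
# StandardCacheManager's built-in handling. cache_headers is a
# hypothetical helper, not a Zope API.
from email.utils import formatdate
import time

def cache_headers(max_age, now=None):
    """Return (name, value) pairs one could pass to a response object."""
    if now is None:
        now = time.time()
    return [
        ('Cache-Control', 'public, max-age=%d' % max_age),
        # Expires as an HTTP-date, for caches that predate HTTP/1.1:
        ('Expires', formatdate(now + max_age, usegmt=True)),
    ]

for name, value in cache_headers(300):
    print('%s: %s' % (name, value))
```

Doing it by hand lets you vary max-age per page type (static club pages
vs. fresh photo uploads), which a one-size-fits-all cache manager can't.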
participants (4)
- Bjorn Stabell
- Erik Enge
- Matt Hamilton
- Toby Dickenson