Craig wrote:
We are currently in the process of building a site with Zope that we anticipate will have a large object database. We have some concerns about scalability as the database continues to grow.
Does anybody understand the rules Outlook uses to determine wrapping? Absolutely mystifying.
What is the maximum capacity of the Zope object database?
As Chris mentioned, we have had some experience with large versions of the database. Two years ago this, well, this week I suppose, we started rolling out a classified ad engine for a consortium of newspapers. It used a predecessor of the current software, something one hop before "Principia". Basically it was a bunch of newspapers, each of which could have a bunch (2,000-18,000) of ads. The ads were full-text indexed, which created index objects that were also stored in the database. Lots of index objects, in fact. This allowed undo to return the ads _and_ the index to their original state, so we also had multiple versions of the ads/indices.

Anyway, using an older version of the technology, they're doing about a hundred papers with an average of 1.5 million ads loaded/indexed per week. A year ago they were doing a million hits a day. At one point (around 50 papers) their database sizes were getting over a gigabyte. (We encouraged more aggressive packing at that point.) Zope2 will go a long way towards making object storage more confidence-inspiring through (a) a change in format and (b) allowing storage in "safe" managers such as an RDBMS or bsddb.
At what point does performance begin to degrade?
Obviously the Zope cache is extremely important. If it's working correctly and if your activity model matches something where a cache can help, then the cache can be quite useful.

The size of the database is certainly a problem for database operations. Packing will obviously get a lot slower as the database grows (I guess in Zope2 packing will happen in the background). Doing a Find will be a nightmare if you have a billion objects. Otherwise, well, it depends on your activity model.

Fortunately I have the benefit of seeing what we are working on now and realizing how it will help. Anybody who goes to LinuxExpo will see it as well :^) Namely, the Catalog will leverage the indexing structures that are a hidden part of Zope and used in Z Tables. Effectively, all content in your Zope site can choose to be registered in the Catalog. That means the act of finding something on your site based on criteria (full-text search, date modified, keyword, category, author, etc.) will be effectively instantaneous. There's still a lot of hand-waving involved to get there, but the hardest part (indexing) has been done for a while.
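To make the Find-versus-Catalog difference concrete, here is a rough Python sketch of the general idea (an inverted index from words to object ids). It is not the Zope Catalog or its API; `ToyCatalog`, `register`, `unregister`, `search` and `brute_force_find` are names made up for illustration, and real full-text indexing does a good deal more than split on whitespace.

```python
from collections import defaultdict


class ToyCatalog:
    """Hypothetical stand-in for a catalog; NOT Zope's actual Catalog API."""

    def __init__(self):
        self._words = defaultdict(set)  # word -> ids of objects containing it
        self._objects = {}              # object id -> indexed text

    def register(self, obj_id, text):
        """Index (or re-index) an object's text under its id."""
        self.unregister(obj_id)
        self._objects[obj_id] = text
        for word in set(text.lower().split()):
            self._words[word].add(obj_id)

    def unregister(self, obj_id):
        """Remove an object and every index entry that points at it."""
        text = self._objects.pop(obj_id, None)
        if text is None:
            return
        for word in set(text.lower().split()):
            ids = self._words.get(word)
            if ids is not None:
                ids.discard(obj_id)
                if not ids:
                    del self._words[word]

    def search(self, query):
        """Return ids of objects containing every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = None
        for word in words:
            ids = self._words.get(word, set())
            result = set(ids) if result is None else result & ids
        return result


def brute_force_find(objects, query):
    """What a Find amounts to: look at every object, every time."""
    words = query.lower().split()
    if not words:
        return set()
    return {
        obj_id
        for obj_id, text in objects.items()
        if all(word in text.lower().split() for word in words)
    }


catalog = ToyCatalog()
catalog.register("ad1", "Red bicycle for sale")
catalog.register("ad2", "Blue bicycle wanted")
matches = catalog.search("red bicycle")
```

The search touches only the index entries for the query words, so its cost depends on how many objects match rather than on how many objects exist. The price is that every registered object adds index entries that must be stored and kept up to date, which is the same index growth described for the classified ad system above.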
Is it possible to have multiple Zope servers, each with a subset of objects, working together to "load balance"? How about multiple object databases on one Zope installation?
Chris mentioned some ideas he has had about taking multiple storage layers and combining them into one object system. I'll talk a little bit about Zope2, which (knock on wood) will be in beta at LinuxExpo next month. Well, rather than talk about it, go to www.zope.org/News and search for ODB. That will give you a link to the design docs for the next version of the database.

As for load balancing, here's our current party line on this. We aren't actively working on it. I personally don't feel there's a one-size-fits-all solution. Moreover, I don't particularly think people will take a free, off-the-shelf solution for load balancing and bet their business on it. Customization will always be necessary. Thus I think there will be multiple answers for multiple situations. I'd like to get a technology partner to help us fund the work on some basic building blocks. Then Digital Creations and others could offer some "above the line" software/consulting to solve specific problems.

Just my $0.02.

--Paul