[Zope] Object DB Capacity

Paul Everitt Paul@digicool.com
Fri, 9 Apr 1999 08:47:12 -0400


Craig wrote:
> We are currently in the process of building a site with Zope that we
> anticipate will have a large object database. We have some 
> concerns about
> scalability as the database continues to grow.

Does anybody understand the rules Outlook uses to determine wrapping?
Absolutely mystifying.

> What is the maximum capacity of the Zope object database?

As Chris mentioned we have had some experience with large versions of
the database.  Two years ago this, well, this week I suppose, we started
rolling out a classified ad engine for a consortium of newspapers.  It
used a predecessor of the current software, something one hop before
"Principia".

Basically it was a bunch of newspapers, each of which could have a bunch
(2,000-18,000) of ads.  The ads were full-text indexed, which created
index objects that were also stored in the database.  Lots of index
objects, in fact.  This allowed undo to return the ads _and_ the index
to their original state, so we also had multiple versions of the
ads/indices.

Anyway, using an older version of the technology, they're doing about a
hundred papers with an average of 1.5 million ads loaded/indexed per week.
A year ago they were doing a million hits a day.  At one point (around
50 papers) their database sizes were getting over a GB.  (We encouraged
more aggressive packing at that point.)

Zope2 will go a long way towards making object storage more
confidence-inspiring through (a) change in format and (b) allowing
storage in "safe" managers such as an RDBMS or bsddb.

> At what point
> does performance begin to degrade?

Obviously the Zope cache is extremely important.  If it's working
correctly, and if your activity model matches something where a cache can
help, then the cache can be quite useful.
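What "your activity model matches something where a cache can help" means can be shown with a toy least-recently-used cache.  This is a minimal sketch, assuming an LRU eviction policy; the `ObjectCache` class, the `hit_rate` helper and the access patterns are all hypothetical and are not Zope's actual cache.

```python
from collections import OrderedDict


class ObjectCache:
    """Hypothetical LRU object cache in front of a slower store.

    Illustrative only; this is not Zope's actual cache implementation.
    """

    def __init__(self, load, capacity):
        self._load = load            # callable: oid -> object (e.g. a disk read)
        self._capacity = capacity    # maximum number of objects kept in memory
        self._items = OrderedDict()  # oid -> object, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, oid):
        if oid in self._items:
            self._items.move_to_end(oid)  # mark as most recently used
            self.hits += 1
            return self._items[oid]
        self.misses += 1
        obj = self._load(oid)
        self._items[oid] = obj
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used object
        return obj


def hit_rate(access_pattern, capacity=100):
    """Replay a sequence of object ids through a fresh cache and report the hit rate."""
    cache = ObjectCache(load=lambda oid: {"oid": oid}, capacity=capacity)
    for oid in access_pattern:
        cache.get(oid)
    return cache.hits / (cache.hits + cache.misses)


hot_set = [i % 50 for i in range(10_000)]   # the same 50 popular objects, over and over
sweep = [i % 1_000 for i in range(10_000)]  # repeated sweeps over 1,000 objects
print(hit_rate(hot_set))  # 0.995
print(hit_rate(sweep))    # 0.0
```

With the same 100-object cache, a site whose traffic keeps returning to a small set of popular objects gets almost every read from memory, while a pattern that sweeps over more objects than the cache holds (a Find over the whole site, say) gets no benefit at all.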

The size of the database is certainly a problem for operations that touch
the whole database.  Packing will obviously get a lot slower the bigger the
database gets (I guess in Zope2 packing will happen in the background).
Doing a Find will be a nightmare if you have a billion objects.
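Why packing slows down as the database grows can be seen in a toy append-only store, where every commit adds new revisions (which is what makes undo possible) and packing throws superseded ones away.  This is a minimal sketch, assuming a single in-memory log; `Record`, `AppendOnlyStore`, `commit`, `current` and `pack` are hypothetical names, and real packing also garbage-collects unreachable objects, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    tid: int       # transaction id, increasing with each commit
    oid: int       # object id
    state: object  # the object's state as of this transaction


class AppendOnlyStore:
    """Hypothetical append-only store: each commit adds new revisions, so undo
    can reach old ones. Not the real Zope storage format."""

    def __init__(self):
        self.log = []
        self._next_tid = 1

    def commit(self, changes):
        """Append one revision per changed object; return the transaction id."""
        tid = self._next_tid
        self._next_tid += 1
        for oid, state in changes.items():
            self.log.append(Record(tid, oid, state))
        return tid

    def current(self, oid):
        """Return the newest state of an object (linear scan, newest first)."""
        for rec in reversed(self.log):
            if rec.oid == oid:
                return rec.state
        raise KeyError(oid)

    def pack(self, before_tid):
        """Discard revisions older than before_tid that a later revision replaced.

        Every record is visited, so packing time grows with the size of the log.
        Garbage collection of unreachable objects is omitted.
        """
        newest_old = {}  # oid -> index of its newest revision with tid < before_tid
        for i, rec in enumerate(self.log):
            if rec.tid < before_tid:
                newest_old[rec.oid] = i
        keep = set(newest_old.values())
        self.log = [rec for i, rec in enumerate(self.log)
                    if rec.tid >= before_tid or i in keep]


store = AppendOnlyStore()
for n in range(3):
    store.commit({1: f"ad v{n}", 2: f"index v{n}"})
print(len(store.log))     # 6: three revisions each of the ad and its index
store.pack(before_tid=3)  # keep enough history to undo transaction 3
print(len(store.log))     # 4
print(store.current(1))   # ad v2
```

Each edit to an ad also rewrites its index objects, so the log grows by several records per change, and `pack` has to walk all of them; that is the cost that gets worse as the database heads past a GB.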

Otherwise, well, it depends on your activity model.

Fortunately I have the benefit of seeing what we are working on now and
realizing how it will help.  Anybody that goes to LinuxExpo will see it
as well :^)

Namely, the Catalog will leverage the indexing structures that are a
hidden part of Zope and used in Z Tables.  Effectively all content in
your Zope site can choose to be registered in the Catalog.  This means
that finding something on your site based on criteria (full-text
search, date modified, keyword, category, author, etc.) will be
effectively instantaneous.

There's still a lot of hand-waving involved to get there but the hardest
part (indexing) has been done for a while.
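The idea behind the Catalog, paying the indexing cost once when content is registered so that searches never have to visit every object, can be illustrated with a small inverted index.  This is a minimal sketch under that assumption only; the `Catalog` class, its `register` and `search` methods and the sample ads are hypothetical and are not the API of the Catalog described above.

```python
import re
from collections import defaultdict


class Catalog:
    """Hypothetical catalog: each object is indexed once when registered, and
    queries consult the indexes instead of visiting every object."""

    def __init__(self):
        self._words = defaultdict(set)                        # word -> object ids
        self._fields = defaultdict(lambda: defaultdict(set))  # field -> value -> ids
        self._all = set()                                     # every registered id

    def register(self, oid, text, **fields):
        """Index an object's full text and its field values (author, category, ...)."""
        self._all.add(oid)
        for word in set(re.findall(r"\w+", text.lower())):
            self._words[word].add(oid)
        for name, value in fields.items():
            self._fields[name][value].add(oid)

    def search(self, words=(), **criteria):
        """Return ids matching all words and all field criteria (logical AND)."""
        postings = [self._words.get(w.lower(), set()) for w in words]
        postings += [self._fields.get(name, {}).get(value, set())
                     for name, value in criteria.items()]
        if not postings:
            return set(self._all)
        # The work depends on the sizes of the matching index entries,
        # not on the total number of objects in the database.
        return set.intersection(*postings)


cat = Catalog()
cat.register(1, "Used bicycle for sale", category="sports", author="alice")
cat.register(2, "Bicycle helmet, barely used", category="sports", author="bob")
cat.register(3, "Used sofa for sale", category="furniture", author="alice")
print(sorted(cat.search(words=["used"], category="sports")))  # [1, 2]
print(sorted(cat.search(words=["bicycle"], author="alice")))  # [1]
```

A real catalog would also need to unregister or reindex objects when they change, and would keep its indexes in the object database itself; both are left out of this sketch.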

> Is it possible to have multiple Zope servers, each with a subset of
> objects, working together to "load balance"? How about multiple object
> databases on one zope installation? 

Chris mentioned some ideas he has had about taking multiple storage
layers and combining them into one object system.  I'll talk a little
bit about Zope2 which (knock on wood) will be in beta at LinuxExpo next
month.  Well, rather than talk about it, go to www.zope.org/News and
search for ODB.  That will give you a link to the design docs for the
next version of the database.

As for load balancing, here's our current party line on this.  We aren't
actively working on it.  I personally don't feel there's a
one-size-fits-all solution.  Moreover, I don't particularly think people
will take a free, off-the-shelf solution for load balancing and bet
their business on it.  Customization will always be necessary.

Thus I think there will be multiple answers for multiple situations.
I'd like to get a technology partner to help us fund the work on some
basic building blocks.  Then Digital Creations and others could offer some
"above the line" software/consulting to solve specific problems.

Just my $0.02.

--Paul