[ZODB-Dev] [OT] NoSQL

Sat Nov 14 01:08:19 EST 2009

On Fri, 2009-11-13 at 13:33 -0700, Shane Hathaway wrote:
> Stephan Richter wrote:
> > On Friday 13 November 2009, Roché Compaan wrote:
> >> We had such an opportunity about 2 years ago and although the client
> >> never reached (and probably never will reach) the membership they
> >> dreamed about, they did pay us to develop a storage for members that
> >> could scale to more than 100 million members. We implemented a data
> >> partitioning strategy at application level. If I had another shot at it,
> >> I would try and develop a distributed ZODB storage, because it would be
> >> a lot simpler compared to what we had to do at application level.
> > 
> > Note that Shane developed a sharding solution a year ago with me. It provides 
> > container-level partitioning.
> > 
> > http://svn.zope.org/z3c.sharding/trunk

Great stuff! It still tackles scaling a large data set at the
application level, though. Don't you think a ZODB storage that does this
for you would solve the problem more generally?

> Thanks for the reminder. :-)
> 
> > This, in combination with the encryption work that we did for the ZODB,
> > actually makes the ZODB a lot more advanced than some of the newcomers.
> > 
> > I am very intrigued now to set up an EC2 cluster and install a
> > z3c.sharding-based solution demonstrating 100M users with some data. Mmmh...
> 
> I've been studying how to build an enormous database based on what I 
> know.  There are an incredible number of distributed databases these 
> days, but all of them concern me in one way or another.  I'm wondering 
> if ZODB might actually have a fighting chance in the distributed 
> database realm.  With z3c.sharding or something like it, I think I would 
> set things up as follows:
> 
> - In-memory ZODB caches would probably be pointlessly painful at that 
> scale, so I would set the ZODB cache size for all partitions to 0.  A 
> cache size of 0 allows ZODB to cache for the duration of a request, but 
> flushes all objects out of the cache at transaction boundaries.
> 
> - With the cache size set to 0, we can disable cache invalidation, which 
> will probably be a major win.
> 
> - I would rely heavily on memcached to provide the pickles.  I would try 
> to use the cache checkpointing algorithm I recently added to RelStorage.
> 
> - I would aim to read or write only a small number of objects per 
> request from partitions.
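
To make the memcached point above concrete, here is a minimal read-through
sketch of serving pickles from a shared cache. It is not RelStorage's
checkpointing algorithm; the PickleCache class, the backend_load callable
and the dict standing in for a memcached client are all hypothetical names
used only for illustration.

```python
import pickle


class PickleCache:
    """Read-through cache of object pickles keyed by object id (sketch)."""

    def __init__(self, backend_load, cache=None):
        # Hypothetical callable: oid -> pickle bytes from the real storage.
        self._load = backend_load
        # A plain dict stands in for a memcached client here (assumption).
        self._cache = {} if cache is None else cache
        self.hits = 0
        self.misses = 0

    def get(self, oid):
        data = self._cache.get(oid)
        if data is None:
            # Cache miss: fetch the pickle from the storage and remember it.
            self.misses += 1
            data = self._load(oid)
            self._cache[oid] = data
        else:
            self.hits += 1
        # Nothing unpickled is kept between calls, in the spirit of cache size 0.
        return pickle.loads(data)


# Hypothetical backing store holding two member records as pickles.
store = {1: pickle.dumps({"name": "alice"}), 2: pickle.dumps({"name": "bob"})}
cache = PickleCache(store.__getitem__)
first = cache.get(1)   # miss, loaded from the store
second = cache.get(1)  # hit, served from the cache
```

The process keeps only pickles in the shared cache and unpickles on every
access, so there is no per-process object state to invalidate.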

I think that the master index needs to be partitioned as well. In
benchmarks I performed early last year (http://bit.ly/pSVmd), a BTree
could only handle about 250 inserts / second when it approached 10
million objects, so I'm guessing it will be almost unusable at 100
million.
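
A rough sketch of what I mean by partitioning the index: hash each key to
one of N smaller indexes, so that no single tree has to hold all the
entries. The PartitionedIndex class and its method names are hypothetical,
and plain dicts stand in for the per-partition BTrees so the example runs
without ZODB.

```python
import hashlib


class PartitionedIndex:
    """Spread keys over several small indexes instead of one big one (sketch)."""

    def __init__(self, num_partitions=16):
        # Each dict stands in for a BTree living in its own partition (assumption).
        self._partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Use a stable hash so a key maps to the same partition in every process.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % len(self._partitions)
        return self._partitions[slot]

    def insert(self, key, value):
        self._partition_for(key)[key] = value

    def lookup(self, key):
        return self._partition_for(key).get(key)

    def sizes(self):
        return [len(p) for p in self._partitions]


index = PartitionedIndex(num_partitions=8)
for i in range(10000):
    index.insert("member-%d" % i, i)
```

With a fixed partition count each lookup touches exactly one partition;
changing the count later would mean rehashing the keys, which this sketch
does not try to handle.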

-- 
Roché Compaan
Upfront Systems                   http://www.upfrontsystems.co.za