[Zope] Building a fast, scalable yet small Zope application
Peter Bengtsson
peterbe at gmail.com
Mon Apr 27 12:41:50 EDT 2009
For huge inserts like that, have you looked at the more modern
alternatives such as Tokyo Cabinet or MongoDB?
I heard about an experiment to transfer 20 million text blobs into a
Tokyo Cabinet. The first 10 million inserts were superfast but after
that it started to take up to a second to insert each item.
I'm not familiar with how good they are, but I know they both have
indexing. And I'm confident they both have good Python APIs.
Or watch Bob Ippolito's PyCon 2009 talk on "Drop ACID".
2009/4/27 Hedley Roos <hedleyroos at gmail.com>:
> I've followed this thread with interest since I have a Zope site with
> tens of millions of entries in BTrees. It scales well, but it requires
> many tricks to make it work.
>
> Roche Compaan wrote these great pieces on ZODB, Data.fs size and
> scalability at http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes
> and http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter
>
> My own in-house product is similar to GoogleAnalytics. I have to use a
> cascading BTree structure (a btree of btrees of btrees) to handle the
> volume. This is because BTrees do slow down the more items they
> contain. This is not a ZODB limitation or flaw - it is just how they
> work.
>
> My structure allows for fast inserts, but it also allows aggregation
> of data. So if my lowest level of BTrees stores hits for a particular
> hour in time, then the containing BTree always knows exactly how many
> hits were made in a day. I update all parent BTrees as soon as an item
> is inserted. The cost of this operation is O(1) for every parent.
> These are all details but every single one influenced my design.
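A minimal sketch of that cascading structure, using plain dicts as stand-ins for persistent BTrees (in Zope you'd use BTrees.OOBTree, and BTrees.Length for conflict-free counters); the class and method names here are made up for illustration:

```python
class HitTree:
    """Day -> hour -> hit count, with an aggregate kept at each parent."""

    def __init__(self):
        # day key -> {"total": int, "hours": {hour: count}}
        self.days = {}

    def record_hit(self, day, hour):
        # Find or create the per-day bucket.
        bucket = self.days.setdefault(day, {"total": 0, "hours": {}})
        # Insert at the lowest level.
        bucket["hours"][hour] = bucket["hours"].get(hour, 0) + 1
        # Update the parent aggregate immediately -- O(1) per parent,
        # so the daily total never has to be recomputed by iterating
        # over all the hourly entries.
        bucket["total"] += 1

    def hits_for_day(self, day):
        return self.days.get(day, {"total": 0})["total"]


tree = HitTree()
tree.record_hit("2009-04-27", 12)
tree.record_hit("2009-04-27", 12)
tree.record_hit("2009-04-27", 13)
print(tree.hits_for_day("2009-04-27"))  # 3
```

The point is that the aggregate is paid for at insert time, not at query time, which is what keeps reads fast as the volume grows.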
>
> What is important is that you cannot just use the ZCatalog to index
> tens of millions of items since every index is a single BTree and will
> thus suffer the larger it gets. So you must roll your own to fit your
> problem domain.
>
> Data warehousing is probably a good idea as well.
>
> My problem domain allows me to defer inserts, so I have a queuerunner
> that commits larger transactions in batches. This is better than lots
> of small writes. This may of course not fit your model.
>
> Familiarize yourself with TreeSets and set operations in Python (union
> etc.) since those tools form the backbone of cataloguing.
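To illustrate how catalogue queries come down to set operations: here plain Python sets stand in for BTrees TreeSets of document ids (with ZODB you'd use the intersection/union functions from the BTrees modules instead of `&` and `|`). The indexes and ids are hypothetical:

```python
# Two toy indexes mapping a term to the set of matching document ids.
index_author = {"hedley": {1, 2, 5}, "peter": {3, 4}}
index_status = {"published": {2, 3, 5}, "draft": {1, 4}}

# AND query: documents by hedley that are also published.
result_and = index_author["hedley"] & index_status["published"]

# OR query: documents by hedley or by peter.
result_or = index_author["hedley"] | index_author["peter"]

print(sorted(result_and))  # [2, 5]
print(sorted(result_or))   # [1, 2, 3, 4, 5]
```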
>
> Hedley
> _______________________________________________
> Zope maillist - Zope at zope.org
> http://mail.zope.org/mailman/listinfo/zope
> ** No cross posts or HTML encoding! **
> (Related lists -
> http://mail.zope.org/mailman/listinfo/zope-announce
> http://mail.zope.org/mailman/listinfo/zope-dev )
>
--
Peter Bengtsson,
work www.fry-it.com
home www.peterbe.com
hobby www.issuetrackerproduct.com