For huge inserts like that, have you looked at more modern alternatives such as Tokyo Cabinet or MongoDB? I heard about an experiment to transfer 20 million text blobs into a Tokyo Cabinet database. The first 10 million inserts were super fast, but after that it started to take up to a second to insert each item. I'm not familiar with how good they are, but I know they both have indexing, and I'm confident they both have good Python APIs. Or watch Bob Ippolito's PyCon 2009 talk, "Drop ACID".
2009/4/27 Hedley Roos <hedleyroos@gmail.com>:
I've followed this thread with interest since I have a Zope site with tens of millions of entries in BTrees. It scales well, but it requires many tricks to make it work.
Roche Compaan wrote these great pieces on ZODB, Data.fs size and scalability at http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-... and http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-does... .
My own in-house product is similar to Google Analytics. I have to use a cascading BTree structure (a BTree of BTrees of BTrees) to handle the volume. This is because BTrees do slow down the more items they contain. That is not a ZODB limitation or flaw - it is just how they work.
My structure allows for fast inserts, but it also allows aggregation of data. So if my lowest level of BTrees stores hits for a particular hour, then the containing BTree always knows exactly how many hits were made in a day. I update all parent BTrees as soon as an item is inserted. The cost of this operation is O(1) for every parent. These are all details, but every single one influenced my design.
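To illustrate, here is a minimal sketch of that cascading-counter idea. All names are hypothetical, and plain dicts stand in for BTrees.OOBTree objects so the example runs on its own; in a real ZODB application you would use persistent BTrees (and Length objects for conflict-free counters) instead:

```python
# Hypothetical cascading hit counter: a tree of nodes where each parent
# keeps a running total that is bumped in O(1) on every insert.
# Plain dicts stand in for OOBTrees here to keep the sketch self-contained.

class Node:
    """One level of the cascade: children keyed by time unit, plus a total."""
    def __init__(self):
        self.total = 0      # aggregate count, kept current on every insert
        self.children = {}  # stand-in for an OOBTree

def record_hit(root, day, hour):
    """Insert one hit; every ancestor's total is updated in O(1)."""
    day_node = root.children.setdefault(day, Node())
    hour_node = day_node.children.setdefault(hour, Node())
    hour_node.total += 1
    day_node.total += 1
    root.total += 1

root = Node()
record_hit(root, "2009-04-27", 13)
record_hit(root, "2009-04-27", 14)
# The day node already knows the daily total without scanning its hours:
print(root.children["2009-04-27"].total)  # 2
```

The point of the design is that aggregates are never computed by walking the lowest-level trees; they are maintained incrementally at insert time.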
What is important is that you cannot just use the ZCatalog to index tens of millions of items, since every index is a single BTree and will thus slow down the larger it gets. So you must roll your own to fit your problem domain.
Data warehousing is probably a good idea as well.
My problem domain allows me to defer inserts, so I have a queuerunner that commits larger transactions in batches. This is better than lots of small writes. This may of course not fit your model.
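A queue runner of that sort can be sketched in a few lines. The class and parameter names below are hypothetical; the commit callable stands in for a real transaction.commit() in ZODB terms:

```python
# Hypothetical queue runner: buffer deferred inserts and commit them in
# batches, trading a little latency for far fewer, larger transactions.

class QueueRunner:
    def __init__(self, batch_size=1000, commit=None):
        self.batch_size = batch_size
        self.queue = []
        # In ZODB this callback would write the batch and call
        # transaction.commit(); here it is a no-op stand-in.
        self.commit = commit or (lambda batch: None)
        self.commits = 0

    def insert(self, item):
        self.queue.append(item)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is queued as one transaction."""
        if self.queue:
            self.commit(self.queue)
            self.commits += 1
            self.queue = []

runner = QueueRunner(batch_size=100)
for i in range(250):
    runner.insert(i)
runner.flush()  # drain the remainder
print(runner.commits)  # 3 commits instead of 250 small writes
```

Batching also reduces ConflictError retries, since fewer concurrent transactions touch the same BTree buckets.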
Familiarize yourself with TreeSets and set operations in Python (union etc.) since those tools form the backbone of cataloguing.
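For example, a catalogue query is essentially set algebra over per-term result sets. Built-in Python sets stand in for TreeSets in this sketch (the index names and contents are made up); in ZODB you would use e.g. BTrees.IIBTree.IITreeSet together with that module's union() and intersection() functions, but the logic is identical:

```python
# Two hypothetical indexes mapping a term to the set of matching document ids.
# Built-in sets stand in for IITreeSets to keep the sketch self-contained.
index_author = {"hedley": {1, 2, 5}, "roche": {3, 5}}
index_year = {2009: {2, 3, 5, 8}}

# Query: documents by "hedley" from 2009 -> intersect the two result sets.
result = index_author["hedley"] & index_year[2009]
print(sorted(result))  # [2, 5]

# Query: documents by either author -> union of their result sets.
either = index_author["hedley"] | index_author["roche"]
print(sorted(either))  # [1, 2, 3, 5]
```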
Hedley _______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
-- Peter Bengtsson, work www.fry-it.com home www.peterbe.com hobby www.issuetrackerproduct.com