In article <000d01bfddfb$4546f070$3e48a4d8@digicool.com>, Evan Simpson <evan@digicool.com> wrote:
----- Original Message -----
From: Jimmie Houchin <jhouchin@texoma.net>
Will an app as described above still suffer from problems with high writes?
Possibly, but only if there are hidden hotspots. For example, in your [...]
> 2. Implement the application-level conflict handling you read about, so that Folders and Catalogs can decide that two writes don't conflict after all, and merge them into a single update.
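
For readers who haven't seen the application-level conflict handling mentioned in point 2, here is a minimal sketch of the idea. ZODB calls an object's `_p_resolveConflict(oldState, savedState, newState)` hook when two transactions wrote the same object. The `HitCounter` class and its `count` attribute are hypothetical, and the class is deliberately a plain Python class (not a `Persistent` subclass) so the merge logic can be run without ZODB installed.

```python
class HitCounter:
    """Hypothetical counter whose concurrent increments can be merged."""

    def __init__(self):
        self.count = 0

    def increment(self, n=1):
        self.count += n

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # old_state:   the state both transactions started from
        # saved_state: the state the other transaction already committed
        # new_state:   the state this transaction is trying to commit
        saved_delta = saved_state["count"] - old_state["count"]
        new_delta = new_state["count"] - old_state["count"]
        merged = dict(old_state)
        # Increments commute, so applying both deltas is a valid merge.
        merged["count"] = old_state["count"] + saved_delta + new_delta
        return merged


base = {"count": 10}    # common starting point
theirs = {"count": 12}  # other transaction added 2
mine = {"count": 11}    # this transaction added 1
merged = HitCounter()._p_resolveConflict(base, theirs, mine)
print(merged)  # {'count': 13}
```

The hook only ever sees the three states of one object, which is what makes it a per-object merge.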
Unfortunately, this doesn't deal with cases where the conflicting state is contained in many objects (see note by PJE in the ZODB Wiki).

Also, there is a whole other area of difficulty for high-write-volume ZODBs, which is the amount of IO that needs to be done. First, by nature ZODB can't rewrite a single attribute of an object; it has to rewrite the entire thing.

Indexing is also a bear from an IO perspective. BTrees currently keep a count at each level, so every change to a BTree changes a node at each level of the BTree. For a ZCatalog, there are a lot of BTrees (something like 2n+4 for n indexes, I think -- don't quote me on that, it's been a while), and each one changes (last I looked, every index was updated even if the value indexed in a particular one hadn't changed. This may have been improved since). Not only is this bad from a hotspot point of view (always a conflict on the root node of the tree), but you end up doing a *lot* of IO.

During my experiments that led to BerkeleyStorage, I was watching the Data.fs grow by 47K per transaction for adding indexed objects of ~1K in size. Looking at this with tranalyzer, it turns out to be 1K of object and 46K of updated BTree pages :). Note that BerkeleyStorage only prevents the file from growing that much -- it still has to do all that IO (in fact, it has to do ~2-3 times that much IO, due to the nature of BerkeleyDB. A relational storage would have similar issues. For amount of IO done, FileStorage is about as efficient as you can possibly be -- it's just that it trades that off against space reclamation).

Also, with any kind of Berkeley or relational storage, there is a second hidden IO and storage penalty: you're storing a btree inside a btree. In other words, the lower-level DB uses btrees to store your objects, including interior nodes of the higher-level ZODB BTree. Every interior node of the ZODB BTree needs a leaf node (and supporting interior nodes) in the DB's btrees, so you get taxed twice, on both I/O and storage space used.

Not to discourage anyone from using ZODB, necessarily. There are a lot of things it's fantastic for, and without a doubt ZODB is getting better at handling higher write ratios. Over time there will be more and more applications that previously would have required an external SQL or other kind of database that can be done in ZODB instead. However, there will also IMHO always be applications that ZODB just isn't as suitable for. You have to think long and hard before committing to one or the other. And then there's the worry of what happens if you chose wrong.

We were faced with exactly these issues, and the extremes of them, to boot. We have a *large*, *very* high write ratio, lots-of-indexes type of application based on ZPublisher/DTML that we'd like to port to/replace with something Zope based. Yet we might need to make another instance of this same type of application used by only a few people with a small amount of data -- it would really suck to have to have another instance of the same expensive database system to support a minuscule amount of data, because everything was coded only with SQL in mind.

This is what led ultimately to ZPatterns -- you can write applications and not have to decide up front on ZODB or SQL. And you can change your mind later (Seen that TV commercial? Suddenly your online store is selling a zillion items per month instead of the 1000 you planned for. Oops!). You can even decide on an instance-by-instance basis: you configure with ZODB for a small department or client, and Oracle or Sybase for a huge one -- and the small guy doesn't have to pay for the DB license and DBA! Since then, we've discovered a number of other benefits to the model.

Hmmm... I didn't intend to write a ZPatterns advertisement when I started, honest! But this seems to have turned into one nonetheless :^)
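
To make the "decide per instance" idea a bit more concrete, here is a rough sketch of application code written against a small storage interface, with the backend picked at configuration time. This is *not* the ZPatterns API: `ItemStore`, `DictStore`, `SqlStore`, `make_store` and `register_order` are all made-up names, a plain dict stands in for an object database, and in-memory SQLite stands in for Oracle or Sybase.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class ItemStore(ABC):
    """Hypothetical storage interface the application codes against."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class DictStore(ItemStore):
    """Stand-in for an object-database backend (a plain dict, not ZODB)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class SqlStore(ItemStore):
    """Stand-in for a relational backend (in-memory SQLite)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items "
            "(item_key TEXT PRIMARY KEY, item_value TEXT)"
        )

    def put(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO items (item_key, item_value) VALUES (?, ?)",
            (key, value),
        )
        self._db.commit()

    def get(self, key):
        row = self._db.execute(
            "SELECT item_value FROM items WHERE item_key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def make_store(kind: str) -> ItemStore:
    """Pick a backend per instance; the application code stays the same."""
    return {"small": DictStore, "large": SqlStore}[kind]()


def register_order(store: ItemStore, order_id: str, description: str):
    """Application logic that only knows about the ItemStore interface."""
    store.put(order_id, description)
    return store.get(order_id)


for kind in ("small", "large"):
    print(kind, register_order(make_store(kind), "order-1", "widget"))
```

The only place that knows which backend is in use is `make_store`, so switching a deployment from one to the other is a configuration change rather than a rewrite.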