Alessio,

We are piloting a very similar system (plain Zope, no Plone). A few lessons we learned that you might benefit from:

1. Use DirectoryStorage for your ZODB. It is highly stable, scalable, resilient to corruption, easily backed up, and so on. The secret to its success is that it uses one file per object revision, rather than a single Data.fs. If you do so, you might also consider using reiserfs, because it copes better with large numbers of files -- and there will be a lot of them! (I believe this only applies to Linux.)

2. Do not store binaries in the ZODB. Your ZODB will grow plenty large with metadata revisions without shoving files into it, too. Ours hit 20GB at one point because I forgot to pack it for a while. If at all possible, store the files themselves outside of Zope.

3. Use BTreeFolders if you can, to speed retrieval. This may not be an issue with Plone.

4. Investigate ZCatalogs thoroughly to make sure they meet your needs. We are about to jump ship in favor of a relational database, simply because ZCatalogs are not as well suited to our purposes.

Nathaniel
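Point 2 above -- keeping file payloads out of the ZODB -- can be sketched in plain Python. This is an illustrative pattern only, not DirectoryStorage or any Zope API: the `store_file`/`retrieve_file` names, the `/tmp/filestore` location, and the content-addressed layout are all assumptions for the example.

```python
import hashlib
import os

STORE_DIR = "/tmp/filestore"  # illustrative location, not a Zope default

def store_file(data: bytes) -> str:
    """Write the payload to the filesystem and return a short key.

    Only this key (plus metadata) would live in the ZODB, so
    transactions and packs stay small even with large binaries.
    """
    key = hashlib.sha256(data).hexdigest()
    # Two-level fan-out keeps any single directory from growing huge,
    # which matters on filesystems that handle many files poorly.
    subdir = os.path.join(STORE_DIR, key[:2])
    os.makedirs(subdir, exist_ok=True)
    with open(os.path.join(subdir, key), "wb") as f:
        f.write(data)
    return key

def retrieve_file(key: str) -> bytes:
    """Read a payload back by its key."""
    with open(os.path.join(STORE_DIR, key[:2], key), "rb") as f:
        return f.read()
```

The same fan-out idea is why the reiserfs advice above matters less here: no single directory ever holds more than a fraction of the files.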
I'm developing an intranet document management system with Zope/Plone, and I'm concerned about the capabilities of the ZODB. The system is going to store many thousands of large files, so the Data.fs is very likely to grow to tens of GB. I'm wondering whether the ZODB can handle that amount of data in its standard implementation (using Data.fs). I investigated several alternatives, including ApeLib/DBTab, but they don't seem to work out of the box with Plone (while they do in the ZMI).
Is somebody aware of **fully-tested, real-world** applications with multi-GB storage or, in other words, should I stay with ZODB or search for alternatives?
Thanks
Alessio Beltrame
On Fri, Jan 23, 2004 at 03:14:05PM -0500, nwingfield@che-llp.com wrote:
4. Investigate ZCatalogs thoroughly to make sure they meet your needs. We are about to jump ship in favor of a relational database, simply because ZCatalogs are not as well suited to our purposes.
Could you elaborate on what you're doing and how you're planning to use a relational database for item cataloging?

srl
-- 
Shane Landrum, Software Engineer
srl@boston.com
boston.com / NY Times Digital
Are you familiar with iTunes? If so, you will be quite familiar with what I am trying to achieve.

Zope is by nature very hierarchical. I want to get away from the hierarchies, to the point where you can navigate documents based on their properties only. Change the properties and the document will appear elsewhere -- maybe even in several places, depending on how you define the routes by which you navigate. In iTunes a route might look like Genre > Artist > Album, or it might just be Artist > Album. Wherever you are in your navigation, there is always a list of documents at your fingertips. It is very hands-on and very quick.

In the document management world there is the potential for many more variations on this theme. I believe we will be frequently adding and indexing new document types and properties -- and this will be a large part of the functionality of the application, not a backend administrative task. Whatever database we use will have to be very nimble.

Nathaniel

Shane Landrum <srl@boston.com> wrote on 01/23/2004 05:18:24 PM:
Could you elaborate on what you're doing and how you're planning to use a relational database for item cataloging?
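The iTunes-style drill-down described above can be sketched with plain Python over a flat list of property dicts: a "route" (e.g. Genre > Artist > Album) is just an ordered list of property names, and re-filing a document is nothing more than changing a property. Every name here is illustrative; this is not ZCatalog or any real schema.

```python
# Each document is a flat bag of properties; hierarchy is derived, not stored.
documents = [
    {"title": "Doc A", "genre": "Rock", "artist": "X", "album": "First"},
    {"title": "Doc B", "genre": "Rock", "artist": "X", "album": "Second"},
    {"title": "Doc C", "genre": "Jazz", "artist": "Y", "album": "Live"},
]

def navigate(docs, route, choices):
    """Drill down along `route` (e.g. ["genre", "artist", "album"]).

    `choices` holds the values picked so far; returns the matching
    documents plus the distinct values available at the next level.
    """
    for prop, value in zip(route, choices):
        docs = [d for d in docs if d.get(prop) == value]
    next_level = route[len(choices)] if len(choices) < len(route) else None
    facets = sorted({d[next_level] for d in docs}) if next_level else []
    return docs, facets
```

Changing `documents[2]["genre"]` to `"Rock"` makes Doc C appear under the Rock branch on the next query, with no re-filing step -- the "move" is purely a property edit.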
On Mon, Jan 26, 2004 at 10:14:54AM -0500, nwingfield@che-llp.com wrote:
Are you familiar with iTunes? If so, you will be quite familiar with what I am trying to achieve. Zope is by nature very hierarchical. I want to get away from the hierarchies to the point where you can navigate documents based on their properties only.
hmm. perhaps this is an impedance mismatch then. don't know.
Whatever database we use will have to be very nimble.
If I were building it in Zope, I would probably consider something like CMF Topic ("canned" catalog queries). I would then test ZCatalog heavily to see how its performance holds up under my expected load. If it's too slow under load, I would consider whether it is OK for my application to trade "freshness" of data for speed -- there are few performance problems that can't be solved by caching :-)

-- 
Paul Winkler http://www.slinkp.com
Look! Up in the sky! It's THE MEGA ORIGINATOR! (random hero from isometric.spaceninja.com)
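The freshness-for-speed trade suggested above can be sketched as a small time-bounded cache wrapped around a "canned" query. This is an illustrative pattern only, not CMF Topic or any ZCatalog API; `CachedQuery` and its parameters are invented for the example.

```python
import time

class CachedQuery:
    """Cache a query function's results for `ttl` seconds per criteria set.

    Within the TTL window, repeated identical queries are served from
    memory; results may be up to `ttl` seconds stale, which is exactly
    the freshness-for-speed trade described above.
    """
    def __init__(self, query_fn, ttl=60.0):
        self.query_fn = query_fn
        self.ttl = ttl
        self._cache = {}  # criteria key -> (timestamp, results)

    def __call__(self, **criteria):
        key = tuple(sorted(criteria.items()))
        now = time.time()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh enough: skip the real query
        results = self.query_fn(**criteria)
        self._cache[key] = (now, results)
        return results
```

A real deployment would also want explicit invalidation on writes and a bound on cache size, but the core trade is just the timestamp check.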
On Fri, 23 Jan 2004 15:14:05 -0500 nwingfield@che-llp.com wrote: [..]
4. Investigate ZCatalogs thoroughly to make sure they meet your needs. We are about to jump ship in favor of a relational database, simply because ZCatalogs are not as well suited to our purposes.
I'd be interested to know what the specific reasons are. I have plans for improving ZCatalog in various ways, and it's always interesting to hear outside opinions and use cases that can inform future improvements.

-Casey
Casey Duncan wrote at 2004-1-23 18:12 -0500:
... I'd be interested to know what the specific reasons are. I have plans for improving ZCatalog in various ways, and it's always interesting to hear outside opinions and use cases that can inform future improvements.
We had extremely bulky BTree buckets holding the metadata information. This caused huge transaction sizes (a workflow state change resulted in a transaction of about 500 KB). Of course, this was a configuration problem: "summary" and "bobobase_modification_time" were part of the catalog's metadata, and my colleagues used "summary" extensively (each summary was several KB in size)...

Tim has already optimized the BTrees package a lot, but intersection may still gain from more optimizations. I used code like this:

    found = intersect(tree, set)

where "tree" is an "OOBTree" and "set" usually had a single element (but could have more, of course). I found out that this is often extremely slow -- much, much slower than:

    if len(set) == 1:
        key = set[0]
        if tree.has_key(key):
            found = set
        else:
            found = OOSet()
    else:
        found = intersect(tree, set)

In a fully optimized intersection, the difference should be very small.

Path index searches are slow. It helped (for us) to reverse the order in which intersections are done (lower-level path components tend to be more specific, leading to smaller intermediate intersection sets).

Colleagues suggested caching catalog results. I will implement that soon (however, not for "ZCatalog" itself but for our "HaufeQuery", which is similar to your "CatalogQuery", just using query objects instead of query strings).

"ZCatalog" should have an easy way to freely use "and", "or", and "not" to combine subqueries to indexes -- similar to your "CatalogQuery" (or our "HaufeQuery").

-- 
Dieter
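The single-element shortcut Dieter describes can be reproduced outside Zope with plain dicts and sets standing in for `OOBTree`/`OOSet` (the BTrees package itself is not assumed here); the point is simply that a membership test is far cheaper than a general tree/set merge when the query set has one key.

```python
def fast_intersect(tree, keys):
    """Intersect the keys of `tree` (a dict standing in for an OOBTree)
    with `keys` (a set standing in for an OOSet).

    Mirrors the workaround above: for a one-element set, a single
    membership test replaces the general intersection walk.
    """
    if len(keys) == 1:
        key = next(iter(keys))
        return {key} if key in tree else set()
    # General case: the moral equivalent of BTrees' intersection().
    return keys & tree.keys()
```

As the message notes, a fully optimized intersection routine would make this branch unnecessary; the sketch just shows what the caller-side workaround computes.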
On Mon, Jan 26, 2004 at 09:51:07AM +0000, Chris Withers wrote:
Dieter Maurer wrote:
"ZCatalog" should have an easy way to freely use "and", "or" "not" to combine subqueries to indexes -- similar to your "CatalogQuery" (or our "HaufeQuery").
Yes please! Me too!
That would be awesome. It should be fast too ;-)

-- 
Paul Winkler http://www.slinkp.com
Look! Up in the sky! It's NOT SO ESOTERICALLY IMPOTENT FOOD PONTIFF! (random hero from isometric.spaceninja.com)
participants (6)
- Casey Duncan
- Chris Withers
- Dieter Maurer
- nwingfield@che-llp.com
- Paul Winkler
- Shane Landrum