Joachim,

We've tried storing tens of thousands of objects in Zope+ZODB, and it breaks down pretty badly; ZServer eats a lot of memory and the ZCatalog is just butt slow. We converted these sites to using Zope+MySQL with good results, however.

The main pain in the neck is that ZServer eats a lot of memory for bigger websites (>150MB), so we can really only run a couple of websites on each server unless we give it many, many GB of RAM (which is expensive for dedicated hosts).

I recommend a SQL database if you have more than ten thousand objects and you want to search them. A filesystem is fine if you never need to index them.

I'm not sure if this is a gripe against OODBMSs in general or just Zope/ZODB, but it seems RDBMSs like MySQL have been much more optimized towards handling bigger data sets, in all aspects.

I actually think there should be a published set of guidelines saying when not to use ZODB, to keep people from designing themselves into a corner with a solution that doesn't scale.

Regards,
-- Bjorn

-----Original Message-----
From: Casey Duncan [mailto:casey_duncan@yahoo.com]
Posted At: Monday, January 28, 2002 14:21
Posted To: Zope List
Conversation: [Zope] Folder with one million Documents?
Subject: Re: [Zope] Folder with one million Documents?

--- Joachim Werner <joe@iuveno-net.de> wrote:
Hi!
Just my 2 eurocents:
I am developing a simple DMS. So far I have been using a Python product with a BTreeFolder which contains all the documents. Every document gets an ID from DateTime().millis(). There will be up to 50 users working at the same time, and in the end I will have up to 3 million documents.
Is there a better class than BTreeFolder for such mass storage?
If it is mainly large documents (like MS Office or PDF files) you are trying to manage, the fastest way of handling this is using the filesystem for storage and serving. You could do the cataloging in Zope and hold link objects to the actual files in a Zope tree (and yes, if it is MANY objects, BTrees will be a good idea). These links could also manage the metadata.
I thoroughly agree. Having developed a DMS myself, my cut-off point (which is really just an engineering intuition more than anything) was about 5000 documents; beyond that, it would be best to store them directly in the file system. Now, since the DMS I developed (DocumentLibrary) was for a target of < 5000 documents, I went for the simpler route of storing them in a BTreeFolder.

What you will have to do to make an effective FS storage system is create code that processes uploads and places them in an arbitrary hierarchy. Obviously putting 3 million documents in one FS directory will just plain fail in most FSes and at best will perform dismally. You'll need to devise a way for the system to subdivide them amongst a shallow hierarchy of dirs, something like Squid does with its cache directories.

For serving the files you could use Apache, but I might be tempted to try something simpler, like micro httpd or tux or something light-weight. I agree that serving static binaries is not ZServer's strong suit. I guess that choice will depend on the frequency and size of downloads.

Another thought might be to store the files in the FS and proxy them through Zope, like ExtFile does. Then put Squid in front of Zope to cache them, so that they are only served from Zope the first time. Then you don't have to worry about what stuff is getting served from where.

BTW: If you do set up any nifty FS storage solution, I would be interested in seeing it for a future version of DocumentLibrary.

Good Luck!

-Casey
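
For what it's worth, here is a rough sketch of the "shallow hierarchy of dirs" idea in plain Python. It is not taken from DocumentLibrary, ExtFile, or Squid; the names shard_path and store are made up for illustration, and hashing the document ID to pick the subdirectories is just one assumed way to spread files evenly (Squid itself uses numbered first- and second-level directories). With two levels of two hex characters there are 65,536 leaf directories, so 3 million documents come out to roughly 46 files per directory.

```python
import hashlib
from pathlib import Path


def shard_path(root, doc_id, levels=2, width=2):
    """Map a document ID to a path in a shallow, hashed directory tree.

    Hypothetical helper: 'levels' directory levels, each named by
    'width' hex characters taken from the SHA-1 of the ID.
    """
    digest = hashlib.sha1(str(doc_id).encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, str(doc_id))


def store(root, doc_id, data):
    """Write the uploaded bytes to their shard, creating dirs on demand."""
    path = shard_path(root, doc_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Because the path is computed from the ID alone, the Zope side only has to keep the ID (plus whatever metadata it catalogs); the file location can always be recomputed with shard_path(root, doc_id).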