storing in ZODB vs filesystem
Hi, I need to build a document management system. That would involve a lot of document uploads. Can someone advise me on whether it is wise to store all the documents in the database, as Zope does by default, or to store the files somewhere on the filesystem? Can someone tell me how to do this easily, and whether it is worthwhile? I have a feeling that storing all the documents in the database might slow it down! thanks, Ciji Isen
It should not be a problem to store at least tens of thousands of documents inside the ZODB. At least some popular websites running on Zope are using a ZODB-based solution for all their content. -aj
Hi,
It should not be a problem to store at least tens of thousands of documents inside the ZODB. At least some popular websites running on Zope are using a ZODB-based solution for all their content.
But doesn't that cause the size of the ZODB to grow more quickly than if you were to store uploaded files on the filesystem and then just wrap metadata around them with LocalFS? My understanding is that your process space needs enough RAM to hold the ZODB index, which is typically about 20% of the size of Data.fs. If the index has to be paged out, performance slows to a crawl. If so, storing files in the ZODB would not run into problems if you've got adequate RAM and/or can afford to move the ZODB onto its own ZEO server. But if you've got hardware limits, it seems like it would be safer to store the files in the filesystem. I've always assumed that that's the whole point of LocalFS. Is all this plausible, or am I barking up the wrong tree? Please correct me if I'm wrong, because this is an issue for an app I'm in the process of spec'ing out. Regards, Phil
Phil Glaser writes:
... My understanding is that your process space needs to have enough RAM to contain the ZODB index, which is typically about 20% of the size of Data.fs.

I do not believe this number (20%).
The index size depends only on the number of objects in the ZODB and not on their sizes. I expect (at most) a few hundred bytes per object. Dieter
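Dieter's point can be put as a quick back-of-envelope sketch. The 300-bytes-per-entry figure below is an assumed illustrative number ("a few hundred bytes per object"), not a measured one:

```python
def index_ram_bytes(n_objects, bytes_per_entry=300):
    """Estimate FileStorage index RAM from the object count alone.

    bytes_per_entry is an assumed figure; the size of the objects
    themselves does not enter into the estimate at all.
    """
    return n_objects * bytes_per_entry

# 100,000 documents need roughly 30 MB of index RAM under this
# assumption, whether each document is 1 KB or 1 MB.
print(index_ram_bytes(100_000))  # 30000000
```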
On Monday 26 Aug 2002 5:45 pm, Dieter Maurer wrote:
Phil Glaser writes:
... My understanding is that your process space needs to have enough RAM to contain the ZODB index, which is typically about 20% of the size of Data.fs.
I do not believe this number (20 %).
That's higher than I typically see, but not by much. All my FileStorages have a RAM/disk ratio between 2% and 12%; I usually work with 10% when planning a new server. That is for normal use. During packing, FileStorage needs more than twice as much, which may explain Phil's 20%.
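Toby's planning rule of thumb reduces to simple arithmetic. The 10% ratio and the 2x packing factor are his stated figures; the 5 GB Data.fs is a hypothetical example:

```python
def plan_index_ram(data_fs_bytes, ratio=0.10, pack_factor=2.0):
    """RAM to budget for the index: normal use, and headroom for packing."""
    normal = int(data_fs_bytes * ratio)
    return normal, int(normal * pack_factor)

# A hypothetical 5 GB Data.fs:
normal, packing = plan_index_ram(5 * 1024**3)
print(normal // 1024**2, "MB normal,", packing // 1024**2, "MB while packing")
```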
The index size depends only on the number of objects in the ZODB and not on the objects size. I expect (at most) a few hundred bytes per object.
If you are limited by RAM for your FileStorage index, then content objects are probably not your problem. If a typical content object is 10k, it has a RAM/disk ratio of 1%. Moving content out into the filesystem won't significantly reduce RAM requirements. Your problem is probably the metadata: BTrees, for example, create many objects that are less than 100 bytes.
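The ratio argument can be made concrete. The 100-byte index entry below is an assumed figure for illustration, in line with Dieter's "a few hundred bytes per object":

```python
INDEX_ENTRY_BYTES = 100  # assumed index RAM cost per object

def ram_disk_ratio(object_size_bytes):
    """Index RAM needed per byte of object data on disk."""
    return INDEX_ENTRY_BYTES / object_size_bytes

print(f"10k content object: {ram_disk_ratio(10_000):.0%}")   # 1%
print(f"100-byte BTree record: {ram_disk_ratio(100):.0%}")   # 100%
```

Evicting the big content objects barely moves the total, while each tiny BTree record costs as much index RAM as it occupies on disk.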
Is all this plausible, or am I barking up the wrong tree? Please correct me if I'm wrong because this is an issue for an app I'm in the process of spec'ing out.
There are tools which tell you the size and number of objects in a FileStorage: I think translyzer can do this, and so can something else released more recently whose name I don't remember. You could use one of these to get a better estimate for your data set. If it looks like a problem, you could consider a more scalable storage: possibly one of the various flavours of bsddbStorage (in a perpetual beta release), or DirectoryStorage (still an alpha release, and under active development). (Hmmm. If you use DirectoryStorage then you can see the mix of object sizes using 'ls'. That's handy.)
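Following the 'ls' idea, a minimal stdlib sketch that buckets files under a directory tree by power-of-two size. Nothing here is specific to DirectoryStorage's actual on-disk format; it just summarizes whatever files live under the given root:

```python
import os
from collections import Counter

def size_histogram(root):
    """Count files per power-of-two size bucket under a directory tree."""
    buckets = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            size = os.path.getsize(os.path.join(dirpath, name))
            # bucket 7 = 64-127 bytes, bucket 14 = 8-16 KB, etc.
            buckets[max(size, 1).bit_length()] += 1
    return buckets
```

Run over a storage directory, a spike in the low buckets is the small-metadata problem showing itself.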
participants (5)
- Andreas Jung
- cijiisen
- Dieter Maurer
- Phil Glaser
- Toby Dickenson