I am implementing a document Library using Zope. It has an exhaustive index with several thousand topics in an outline residing in a PostgreSQL database. This works well and I like it.

My question is: where is the best place to store the documents themselves? They will be static HTML documents, roughly 1-50Kb in size. There will probably be at least 10,000-15,000 of these documents in the library once all is said and done.

In my mind I have three options:

1. Store them on the filesystem.
2. Store them in a PgSQL table as blobs.
3. Store them as DTML Documents in the ZODB.

I would like eventually to have full-text searching capabilities, so that makes #1 less attractive (I would likely need to write my own Python method to do it). #2 is somewhat of a pain to implement due to limitations on the PgSQL row size, and text searching would be slow. With #3 I could in theory use a ZCatalog to implement the searching, so that is done for me.

Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a Data.fs file of this size (~300-400MB) become too easily corrupted? Should I use a separate Data.fs file just to store the documents (i.e. using a mounted FileStorage)? Or is it better to use method #1 or #2?

Information from anyone with experience in this regard is greatly appreciated.

-Casey Duncan
caseman@mad.scientist.com
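[Editorial note: the "own Python method" alluded to for option #1 could be as simple as the following sketch — a brute-force, case-insensitive scan over a directory of static HTML files. The function name, directory layout, and `.html` filter are assumptions for illustration, not anything from the thread; for 10-15K documents a real solution would want a prebuilt index rather than reading every file per query.]

```python
import os

def search_docs(root, term):
    """Walk a directory tree of static HTML files and return the
    sorted paths of those whose text contains `term` (case-insensitive).
    Brute force: reads every file on every query."""
    term = term.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue  # only index the static HTML documents
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                if term in f.read().lower():
                    hits.append(path)
    return sorted(hits)
```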
----- Original Message ----- From: "Casey Duncan" <casey.duncan@state.co.us> To: <zope@zope.org> Sent: Friday, June 16, 2000 6:35 PM Subject: [Zope] ZODB or not ZODB?
My question is where is the best place to store the documents themselves? They will be static HTML documents ranging from 1-50Kb in size roughly. There will probably be at least 10,000-15,000 of these documents in the library once all is said and done.
In my mind I have three options:
1. Store them on the filesystem.
2. Store them in a PgSQL table as blobs.
3. Store them as DTML Documents in the ZODB.
Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a Data.fs file of this size (~300-400MB) become too easily corrupted? Should I use a separate Data.fs file just to store the documents (i.e. using a mounted FileStorage)? Or is it better to use method #1 or #2? Information from anyone with experience in this regard is greatly appreciated.
There are people who have experience with giant ZODBs... some have run into the 2GB ext2fs file size limit. My Data.fs has been around ~100MB. FileStorage is really quite stable and is not likely to get corrupted no matter what the size. If you need to store the docs on multiple drives, you can use the mountable storages to set up another file on the other disk.

One thing to be aware of: 10-15K documents is too many for a single Folder. You either want to break the docs up into multiple folders, or hang on for the BTreeFolder product.

One other nice thing about storing in the ZODB: it's pretty easy to make your documents automatically add themselves to the ZCatalog, so there's no need to manually update the indexes. (This would also be true of PgSQL, but not the filesystem.)

Kevin
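[Editorial note: the "multiple folders" workaround Kevin describes can be done by bucketing documents under a short hash prefix, so no single folder ever holds more than a few dozen objects. A storage-neutral sketch — the two-character bucket width and function names are arbitrary choices for illustration:]

```python
import hashlib

def bucket_for(doc_id, width=2):
    """Return a short hash-prefix bucket name for a document id.
    width=2 hex chars gives 256 buckets, so 10-15K docs land at
    roughly 40-60 per folder."""
    return hashlib.md5(doc_id.encode()).hexdigest()[:width]

def bucketed_path(doc_id):
    """Folder path for a document, e.g. '3f/doc-0001'."""
    return "%s/%s" % (bucket_for(doc_id), doc_id)
```

The hash keeps the distribution even regardless of how document ids are named, which a scheme like "first letter of the title" would not.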
Casey Duncan wrote:
Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a Data.fs file of this size (~300-400MB) become too easily corrupted? Should I use a separate Data.fs file just to store the documents (i.e. using a mounted FileStorage)? Or is it better to use method #1 or #2? Information from anyone with experience in this regard is greatly appreciated.
Casey,

Zope.org is 375 MB packed, and it grows by 100 MB a *day*. There are 8500 member folders. When you get this many objects in a folder, accessing the folder (though not the objects themselves) gets *slow*.
More info here: http://www.zope.org/Wikis/zope-dev/ReallyBigFolders

ethan mindlace fremen
Zopatista Community Liaison
Casey Duncan wrote:
I am implementing a document Library using Zope. It has an exhaustive index with several thousand topics in an outline residing on a PostgreSQL database. This works well and I like it.
My question is where is the best place to store the documents themselves? They will be static HTML documents ranging from 1-50Kb in size roughly. There will probably be at least 10,000-15,000 of these documents in the library once all is said and done.
In my mind I have three options:
1. Store them on the filesystem.
2. Store them in a PgSQL table as blobs.
3. Store them as DTML Documents in the ZODB.
I would like to eventually have full text searching capabilities, so that makes #1 less attractive (I would likely need my own Python method to do it). #2 is somewhat of a pain to implement due to limitations in the PgSQL row size and text searching would be slow. With #3 I could in theory use a ZCatalog to implement the searching, so that is done for me.
In theory, you could use ZCatalog to catalog objects in the file system or in an RDBMS, provided that you can supply paths for them. I don't think anyone's done this yet. There are bound to be bumps for whoever does it first. :)
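[Editorial note: cataloging by path, as Jim describes, amounts to maintaining a word → paths mapping for content that lives outside the ZODB. A ZCatalog-free sketch of the idea — the tokenizer and data shapes here are illustrative assumptions, not ZCatalog's actual internals:]

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document paths that
    contain it. `docs` is a {path: text} mapping; the paths can point
    anywhere (filesystem, RDBMS rows) since only the strings are stored."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index

def lookup(index, word):
    """Sorted paths of documents containing `word` (case-insensitive)."""
    return sorted(index.get(word.lower(), ()))
```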
Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a Data.fs file of this size (~300-400MB) become too easily corrupted?
No. Zope.Org varies from 300MB to close to 2GB.

Jim

--
Jim Fulton          mailto:jim@digicool.com        Python Powered!
Technical Director  (888) 344-4332                 http://www.python.org
Digital Creations   http://www.digicool.com        http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) this email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
In theory, you could use ZCatalog to catalog objects in the file system or in an RDBMS, provided that you can supply paths for them. I don't think anyone's done this yet. There are bound to be bumps for whoever does it first. :)
There's a patch to the Local File System product to allow indexing files in the file system. This will be incorporated into the next version.

--jfarr
On Wed, Jun 28, 2000 at 10:07:25AM -0400, Jim Fulton wrote:
Casey Duncan wrote:
Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a Data.fs file of this size (~300-400MB) become too easily corrupted?
No. Zope.Org varies from 300MB to close to 2GB.
What about adding a box somewhere on zope.org telling us the current size of the ZODB and perhaps some other stats (dunno, RAM, number of processes)?

[]s,
|alo
+----
--
Hack and Roll ( http://www.hackandroll.org )
News for, uh, whatever it is that we are.
http://zope.gf.com.br/lalo      mailto:lalo@hackandroll.org
pgp key: http://zope.gf.com.br/lalo/pessoal/pgp
Brazil of Darkness (RPG) --- http://zope.gf.com.br/BroDar
participants (6):
- Casey Duncan
- ethan mindlace fremen
- Jim Fulton
- Jonothan Farr
- Kevin Dangoor
- Lalo Martins