[Zope] ZODB or not ZODB?

Sun, 18 Jun 2000 09:57:44 -0400 (EDT)

> I am implementing a document Library using Zope. It has an exhaustive index
> with several thousand topics in an outline residing on a PostgreSQL
> database. This works well and I like it.
> 
> My question is where is the best place to store the documents themselves?
> They will be static HTML documents ranging from 1-50Kb in size roughly.
> There will probably be at least 10,000-15,000 of these documents in the
> library once all is said and done.
> 
> In my mind I have three options:
> 
> 1. Store them on the filesystem.
> 2. Store them in a PgSQL table as blobs.
> 3. Store them as DTML Docs in the ZODB.
> 

The filesystem, imho.  This lets you spread things out over
multiple disks and even (perhaps) multiple systems.  Worst case
you've got 50KB x 15,000 documents = 750MB.  Big for a ZODB (?), but
no sweat for a filesystem.  PgSQL blobs are not yet ready for prime time.
For one thing, I think they are all created in the same directory.
And I'm a big PgSQL fan, so this pains me to say, but it is true.
They are working on it; see the TOAST project on the PostgreSQL
mailing lists.

You want to spread the documents out over a tree of directories.
I've set up systems where everything had an ID and we split things
up by digits in the ID, e.g. document 252a8b7c is file 25/2a/8b/252a8b7c.
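A minimal Python sketch of that layout, assuming hex-style IDs of at least six characters and two-character directory levels as in the example above. `shard_path` and `store_document` are hypothetical helper names, not part of Zope or any product:

```python
import os


def shard_path(root: str, doc_id: str, depth: int = 3, width: int = 2) -> str:
    """Map a document ID to a nested path.

    With the defaults, '252a8b7c' -> root/25/2a/8b/252a8b7c: the first
    `depth` chunks of `width` characters become directory names, and the
    full ID stays as the file name so it is still unique on its own.
    """
    if len(doc_id) < depth * width:
        raise ValueError("document ID is too short for the requested depth")
    dirs = [doc_id[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *dirs, doc_id)


def store_document(root: str, doc_id: str, data: bytes) -> str:
    """Write one document into the sharded tree, creating directories as needed."""
    path = shard_path(root, doc_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

The depth is a tuning knob: with `depth=1` there are at most 256 top-level directories, so 15,000 uniformly distributed IDs come to roughly 60 files per directory.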

You could even compress the files if you wanted to.
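A minimal sketch of per-file compression using Python's standard `gzip` module, assuming the documents are stored as UTF-8 (pass a different `encoding` if they are, say, ISO-8859-1). The function names are hypothetical; the files written this way are ordinary `.gz` files:

```python
import gzip


def compress_html(html: str, encoding: str = "utf-8") -> bytes:
    """Gzip one HTML document for storage on disk."""
    return gzip.compress(html.encode(encoding))


def decompress_html(data: bytes, encoding: str = "utf-8") -> str:
    """Reverse compress_html, e.g. when the document is served."""
    return gzip.decompress(data).decode(encoding)


def write_compressed(path: str, html: str) -> None:
    with open(path, "wb") as f:
        f.write(compress_html(html))


def read_compressed(path: str) -> str:
    with open(path, "rb") as f:
        return decompress_html(f.read())
```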

And you could use the "LocalFileSystem" (is that it?) product to
serve up the files through Zope.  You could tweak it to decompress
too.

> I would like to eventually have full text searching capabilities, so that
> makes #1 less attractive (I would likely need my own Python method to do
> it). #2 is somewhat of a pain to implement due to limitations in the PgSQL
> row size and text searching would be slow. With #3 I could in theory use a
> ZCatalog to implement the searching, so that is done for me.
> 

I'd put the full text search into PostgreSQL.  When a document comes in,
extract its keywords and index them there.
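A rough sketch of that "extract keywords, then index" step. It uses Python's `sqlite3` as a stand-in for PostgreSQL so it runs without a server; with PostgreSQL the same table and queries apply, though the placeholder style and the `INSERT OR IGNORE` syntax differ. The `doc_words` table, the stop list, and the function names are all hypothetical:

```python
import re
import sqlite3
from html.parser import HTMLParser

# Hypothetical stop list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, dropping the tags.
    (Text inside <script>/<style> would be collected too.)"""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_keywords(html: str) -> set:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    # Skip very short words and stop words; they make poor search keys.
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def make_index() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_words (word TEXT, doc_id TEXT, PRIMARY KEY (word, doc_id))"
    )
    return conn


def index_document(conn: sqlite3.Connection, doc_id: str, html: str) -> None:
    rows = [(w, doc_id) for w in sorted(extract_keywords(html))]
    conn.executemany(
        "INSERT OR IGNORE INTO doc_words (word, doc_id) VALUES (?, ?)", rows
    )
    conn.commit()


def search(conn: sqlite3.Connection, word: str) -> list:
    cur = conn.execute(
        "SELECT doc_id FROM doc_words WHERE word = ? ORDER BY doc_id",
        (word.lower(),),
    )
    return [row[0] for row in cur]
```

This handles single-word lookups only; a multi-word AND query would intersect the `doc_id` sets for each word.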

> Is ZODB up to the task of storing this quantity of objects? What problems
> might I run into? Is it a wise idea, could a data.fs file of this size
> (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
> just to store the documents (ie using mountedFileStorage)? Or is it better
> to use method #1 or #2? Information from anyone with experience in this
> regard is greatly appreciated.
> 

We implemented a system using #1.  Actually, we had lots of little documents,
so we concatenated and gzipped them in batches of 200, keeping the filename,
offset, and length of each.  It turned out to be quick enough to unzip the
batch file and pick out the document of interest, and batching them kept
the compression ratio up.
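A minimal sketch of that batching scheme in Python, assuming offsets and lengths refer to the uncompressed concatenation. The batch size of 200 is from the description above; the function names and the dict used as the index are hypothetical, since where the original system kept its index isn't stated:

```python
import gzip


def build_batch(batch_name: str, docs):
    """Concatenate documents and gzip them as one batch.

    `docs` is an iterable of (doc_id, bytes). Returns the compressed batch
    (to be written to a file called batch_name) and an index mapping
    doc_id -> (batch_name, offset, length) into the uncompressed data.
    """
    index = {}
    chunks = []
    offset = 0
    for doc_id, data in docs:
        index[doc_id] = (batch_name, offset, len(data))
        chunks.append(data)
        offset += len(data)
    return gzip.compress(b"".join(chunks)), index


def extract_document(batch: bytes, offset: int, length: int) -> bytes:
    """Unzip the whole batch and slice out one document."""
    return gzip.decompress(batch)[offset:offset + length]


def batches(items, size=200):
    """Split a list of (doc_id, bytes) pairs into groups of `size`."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Decompressing a whole batch per lookup stays cheap at this scale: 200 documents of at most 50KB each is about 10MB uncompressed in the worst case.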

The system worked great, but was cancelled about a week before it was due
to go online.  Ouch.

I'll let others speak to #3.  I've never had a problem with ZODB, but I've
never put 750MB in it.

-- cary

> -Casey Duncan
> caseman@mad.scientist.com