"James W. Howe" wrote:
At what point does it make sense to store data outside of the ZODB? For example, I have a site which wants to have access to a news article archive. If the articles were stored in Zope, the ZCatalog searching would make this task almost trivial. I'm concerned about the size of the database, however.
The archives may go back several years and represent information from a weekly news publication. I have a sample archive of one "issue" which constitutes about 130k of raw text (no graphics at this point). I expect that a typical "issue" may be around 130k - 200k. If I store that information in Zope, probably as instances of a ZClass with properties to hold some information and perhaps DTML Methods to hold the main article content, how many weeks of archive information could I reasonably store in Zope before the thing would become too big?
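A quick back-of-envelope estimate, using only the figures in the question (a weekly issue of roughly 130k - 200k of text) and the common 2GB file size limit on some file systems, suggests the archive grows quite slowly:

```python
# Back-of-envelope estimate of archive growth, using the worst-case
# figure from the question: one weekly issue of about 200k of text.

ISSUE_SIZE = 200 * 1024        # bytes per issue (upper end of 130k-200k)
ISSUES_PER_YEAR = 52           # weekly publication

yearly_growth = ISSUE_SIZE * ISSUES_PER_YEAR     # ~10 MB per year
two_gb = 2 * 1024 ** 3                           # a common file size limit

years_until_limit = two_gb / yearly_growth

print(f"Growth per year: {yearly_growth / 1024 ** 2:.1f} MB")
print(f"Years until a 2GB file limit: {years_until_limit:.0f}")
```

So even before compression, text alone stays far under typical file size limits for any realistic archive lifetime.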
This archive solution is likely to be replaced in the future so I'm interested in finding a solution which is simple and quick to implement but will provide adequate performance for my users.
Any info on this subject would be appreciated.
There isn't really any size limit in ZODB; however, the underlying storage may impose some limitation. ZODB uses an open storage interface, and we provide a default storage implementation, FileStorage. It's unclear how scalable FileStorage is; the answer probably depends on your application. Here are some things to consider with FileStorage:

- The database size is limited by the maximum size of a file in whatever underlying file system you use. Some systems limit this to 2GB, but others don't.

- There is a small fixed per-object memory cost when using FileStorage, due to the in-memory index that FileStorage uses to keep track of object record locations. Essentially, each object in the database consumes a Python dictionary entry with an 8-byte string and a short or long integer. This is probably on the order of 100 bytes per object. (Someone could invent a more optimized data structure that could cut this down to about 20 bytes per object.)

Note that Rob is quite right in saying that the ZODB cache keeps only a fraction of your data in memory. In your application, you have large objects, but you don't have that many objects, so the second bullet doesn't really apply to you. Of course, file size limits are likely to limit many alternative approaches too.

Ty Sarna's CompressedStorage, http://www.zope.org/Members/tsarna/CompressedStorage, can be used with FileStorage to reduce database size. This would probably be a big win for you.

The per-object overhead in FileStorage could be addressed with more efficient indexing data structures, or with some mechanism, like dbm or Berkeley DB files, to store the index. On systems that support very large files, a derivative of FileStorage that used a dbm index should be able to support extremely large databases.

Of course, you can implement (or you can pay someone to implement) alternative storages that use whatever underlying storage mechanism you want and then plug them into ZODB.
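The win from something like CompressedStorage comes from compressing object records before they reach the disk. The figures below are illustrative only (using the standard zlib module on a deliberately repetitive sample, which compresses far better than real prose; typical English text compresses roughly 2-4x), not measurements of CompressedStorage itself:

```python
import zlib

# Illustrative only: compress a chunk of repetitive English-like text to
# show the kind of savings compression gives on plain-text articles.
# Real news articles will compress less well than this repeated sample.
article = (
    "The council met on Tuesday to discuss the budget for the coming "
    "fiscal year. Members debated the proposal at length. "
) * 500  # roughly 60k of text

compressed = zlib.compress(article.encode("ascii"), level=9)

print(f"raw:        {len(article)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

For a text-heavy archive like a news publication, even a conservative 2-3x reduction would stretch the effective capacity of a FileStorage file considerably.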
For example, it should be straightforward to implement a ZODB storage that uses some underlying RDBMS as its storage manager.

Jim

--
Jim Fulton mailto:jim@digicool.com
Technical Director (888) 344-4332
Digital Creations http://www.digicool.com
Python Powered! http://www.python.org
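The RDBMS-backed storage idea can be sketched in miniature. This is a toy illustration, NOT the real ZODB storage interface (which also handles transactions, serials, versions, and conflict detection): the core observation is just that a storage ultimately maps object ids to pickled object records, which an RDBMS can hold as rows. The class and table names here are invented for the example; sqlite3 stands in for whatever RDBMS you might use:

```python
import sqlite3

# Toy sketch of an RDBMS-backed object store: a table of (oid, record)
# rows, with store/load as the only operations. A real ZODB storage
# would layer transaction and versioning machinery on top of this.

class ToyRecordStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records (oid TEXT PRIMARY KEY, data BLOB)"
        )

    def store(self, oid, data):
        # Upsert the current record for this object id.
        self.db.execute(
            "INSERT OR REPLACE INTO records (oid, data) VALUES (?, ?)",
            (oid, data),
        )
        self.db.commit()

    def load(self, oid):
        row = self.db.execute(
            "SELECT data FROM records WHERE oid = ?", (oid,)
        ).fetchone()
        if row is None:
            raise KeyError(oid)
        return row[0]

store = ToyRecordStore()
store.store("0001", b"pickled article record")
print(store.load("0001"))
```

With this shape, database size limits become the RDBMS's problem rather than the file system's, which is exactly the appeal of a pluggable storage interface.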