"James W. Howe" wrote:
At what point does it make sense to store data outside of the ZODB? For example, I have a site which wants to have access to a news article archive. If the articles were stored in Zope, the ZCatalog searching would make this task almost trivial. I'm concerned about the size of the database, however.
The archives may go back several years and represent information from a weekly news publication. I have a sample archive of one "issue" which constitutes about 130k of raw text (no graphics at this point). I expect that a typical "issue" may be around 130k - 200k. If I store that information in Zope, probably as instances of a ZClass with properties to hold some information and perhaps DTML Methods to hold the main article content, how many weeks of archive information could I reasonably store in Zope before the thing would become too big?
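A quick back-of-envelope estimate, using only the figures in the question (a weekly issue of roughly 130k - 200k of text) and the common 2GB file size limit on some file systems, suggests the archive grows quite slowly:

```python
# Back-of-envelope estimate of archive growth, using the worst-case
# figure from the question: one weekly issue of about 200k of text.

ISSUE_SIZE = 200 * 1024        # bytes per issue (upper end of 130k-200k)
ISSUES_PER_YEAR = 52           # weekly publication

yearly_growth = ISSUE_SIZE * ISSUES_PER_YEAR     # ~10 MB per year
two_gb = 2 * 1024 ** 3                           # a common file size limit

years_until_limit = two_gb / yearly_growth

print(f"Growth per year: {yearly_growth / 1024 ** 2:.1f} MB")
print(f"Years until a 2GB file limit: {years_until_limit:.0f}")
```

So even before compression, text alone stays far under typical file size limits for any realistic archive lifetime.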
This archive solution is likely to be replaced in the future so I'm interested in finding a solution which is simple and quick to implement but will provide adequate performance for my users.
Any info on this subject would be appreciated.
There isn't really any size limit in ZODB; however, the underlying storage may impose some limitation. ZODB uses an open storage interface, and we provide a default storage implementation, FileStorage. It's unclear how scalable FileStorage is; the answer probably depends on your application. Here are some things to consider with FileStorage:

- The database size is limited by the maximum size of a file in whatever underlying file system you use. Some systems limit this to 2GB, but others don't.

- There is a small fixed per-object memory cost when using FileStorage, due to the in-memory index that FileStorage uses to keep track of object record locations. Essentially, each object in the database consumes a Python dictionary entry with an 8-byte string and a short or long integer. This is probably on the order of 100 bytes per object. (Someone could invent a more optimized data structure that could cut this down to about 20 bytes per object.)

Note that Rob is quite right in saying that the ZODB cache keeps only a fraction of your data in memory. In your application, you have large objects, but you don't have that many objects, so the second bullet doesn't really apply to you. Of course, file size limits are likely to limit many alternative approaches too.

Ty Sarna's CompressedStorage, http://www.zope.org/Members/tsarna/CompressedStorage, can be used with FileStorage to reduce database size. This would probably be a big win for you.

The per-object overhead in FileStorage could be addressed with more efficient indexing data structures, or with some mechanism, like dbm or Berkeley DB files, to store the index. On systems that support very large files, a derivative of FileStorage that used a dbm index should be able to support extremely large databases.

Of course, you can implement (or you can pay someone to implement) alternative storages that use whatever underlying storage mechanism you want and then plug them into ZODB.
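The win from something like CompressedStorage comes from compressing object records before they reach the disk. The figures below are illustrative only (using the standard zlib module on a deliberately repetitive sample, which compresses far better than real prose; typical English text compresses roughly 2-4x), not measurements of CompressedStorage itself:

```python
import zlib

# Illustrative only: compress a chunk of repetitive English-like text to
# show the kind of savings compression gives on plain-text articles.
# Real news articles will compress less well than this repeated sample.
article = (
    "The council met on Tuesday to discuss the budget for the coming "
    "fiscal year. Members debated the proposal at length. "
) * 500  # roughly 60k of text

compressed = zlib.compress(article.encode("ascii"), level=9)

print(f"raw:        {len(article)} bytes")
print(f"compressed: {len(compressed)} bytes")
```

For a text-heavy archive like a news publication, even a conservative 2-3x reduction would stretch the effective capacity of a FileStorage file considerably.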
For example, it should be straightforward to implement a ZODB storage that uses some underlying RDBMS as its storage manager.

Jim

--
Jim Fulton mailto:jim@digicool.com
Technical Director (888) 344-4332
Digital Creations http://www.digicool.com
Python Powered! http://www.python.org
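The RDBMS-backed storage idea can be sketched in miniature. This is a toy illustration, NOT the real ZODB storage interface (which also handles transactions, serials, versions, and conflict detection): the core observation is just that a storage ultimately maps object ids to pickled object records, which an RDBMS can hold as rows. The class and table names here are invented for the example; sqlite3 stands in for whatever RDBMS you might use:

```python
import sqlite3

# Toy sketch of an RDBMS-backed object store: a table of (oid, record)
# rows, with store/load as the only operations. A real ZODB storage
# would layer transaction and versioning machinery on top of this.

class ToyRecordStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records (oid TEXT PRIMARY KEY, data BLOB)"
        )

    def store(self, oid, data):
        # Upsert the current record for this object id.
        self.db.execute(
            "INSERT OR REPLACE INTO records (oid, data) VALUES (?, ?)",
            (oid, data),
        )
        self.db.commit()

    def load(self, oid):
        row = self.db.execute(
            "SELECT data FROM records WHERE oid = ?", (oid,)
        ).fetchone()
        if row is None:
            raise KeyError(oid)
        return row[0]

store = ToyRecordStore()
store.store("0001", b"pickled article record")
print(store.load("0001"))
```

With this shape, database size limits become the RDBMS's problem rather than the file system's, which is exactly the appeal of a pluggable storage interface.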