Hi! Just my 2 eurocents:
I am developing a simple DMS. So far I have been using a Python product with a BTreeFolder which contains all the documents. Every document gets an ID from DateTime().millis(). There will be up to 50 users working at the same time, and in the end I will have up to 3 million documents.
Is there a better class than BTreeFolder for such mass storage?
If it is mainly large documents (like MS Office or PDF files) you are trying to manage, the fastest way of handling this is to use the filesystem for storage and serving. You could do the cataloging in Zope and hold link objects to the actual files in a Zope tree (and yes, if it is MANY objects, BTrees will be a good idea). These links could also manage the metadata. For the actual file serving, you'd use Apache (or, if you can, SMB via Samba in the intranet).

I did some benchmarks of Zope's input/output performance a couple of months ago. On a rather old Solaris machine (which has great I/O throughput, but rather poor CPU performance), Apache could serve files almost at "wire speed", so the Ethernet card was the bottleneck. Zope took much longer and consumed a lot of system resources.

However, there is one important caveat with using Apache + filesystem or Samba: you'll have to make sure that the files are secured by Apache, as Zope cannot protect them at the filesystem level with the Zope security engine.

I can't really see how an RDBMS would help you with performance. You'd need something professional, like Oracle (though PostgreSQL might do the job, too), and those servers eat RAM for breakfast and like fast CPUs. Of course, if you can spend a lot of money, Oracle will be the only solution that scales onto multiple servers. Then you could have almost any performance level you need, but at a price.

It's a good question whether ZEO could help. As long as you have only one main DB, certainly not: that DB will always be the bottleneck for write access. But of course you could put up separate DBs on separate servers, e.g. have a server for each department. Those servers could do their own indexing, and a centralized index server could retrieve the index information from them. ZEO would also help with search requests, as the index objects will be cached in the ZEO clients. But I'd do some benchmarks first.
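To make the link-object idea from above a bit more concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: DocumentLink, shard_path and store_document are hypothetical names, not Zope API, and the plain dict only stands in for whatever BTree-based container you would use inside Zope. It also assumes the document ID is safe to use as a file name (e.g. a string of digits).

```python
import hashlib
import os
from dataclasses import dataclass, field


@dataclass
class DocumentLink:
    """Hypothetical link object: metadata lives in the object DB, bytes on disk."""
    doc_id: str
    path: str      # location of the real file, relative to the storage root
    size: int
    metadata: dict = field(default_factory=dict)


def shard_path(doc_id):
    """Spread files over two levels of subdirectories so no directory gets huge."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return os.path.join(digest[:2], digest[2:4], doc_id)


def store_document(root, catalog, doc_id, data, **metadata):
    """Write the bytes below `root` and record a link object in `catalog`.

    `catalog` is any mapping; a dict here, a BTree in a real setup (assumption).
    """
    rel = shard_path(doc_id)
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as fh:
        fh.write(data)
    link = DocumentLink(doc_id, rel, len(data), metadata)
    catalog[doc_id] = link
    return link
```

With 3 million documents, two hex-digit levels give 65536 leaf directories, i.e. on the order of 50 files per directory, which Apache or Samba can serve directly from the storage root. The access control caveat above still applies: this sketch does nothing to protect the files.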
Even if you have 50 concurrent users, you'll probably not have 50 users posting docs at the same moment. Uploading a document will certainly not be the problem. The most time-consuming task will be the online indexing, so probably you'll have to forget about it and do a delayed batch indexing at night.

Joachim
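P.S. A minimal sketch of the delayed batch indexing idea, in plain Python. It is only an illustration under assumptions: DeferredIndexer and its methods are hypothetical names, the "index" is a naive word-to-IDs dict rather than a Zope catalog, and the nightly run would be triggered by something external such as cron.

```python
from collections import deque


class DeferredIndexer:
    """Hypothetical helper: uploads only enqueue, indexing happens later in bulk."""

    def __init__(self):
        self.pending = deque()   # (doc_id, text) pairs waiting to be indexed
        self.index = {}          # word -> set of doc ids

    def enqueue(self, doc_id, text):
        # Cheap operation done at upload time; no index is touched here.
        self.pending.append((doc_id, text))

    def run_batch(self, limit=None):
        # Expensive operation, meant to be run off-hours. Returns how many
        # documents were indexed in this run.
        done = 0
        while self.pending and (limit is None or done < limit):
            doc_id, text = self.pending.popleft()
            for word in set(text.lower().split()):
                self.index.setdefault(word, set()).add(doc_id)
            done += 1
        return done

    def search(self, word):
        return sorted(self.index.get(word.lower(), ()))
```

The trade-off is that a document uploaded during the day cannot be found by search until the next batch run; whether that is acceptable depends on your users.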