[Zope] Folder with one million Documents?

Joachim Werner joe@iuveno-net.de
Mon, 28 Jan 2002 18:30:47 +0100


Hi!

> Unfortunately my documents are not static files. They are Python
> classes. They contain fields for metadata and sometimes binary files.
> Since my documents are not files, I think it is not useful to use a
> filesystem.

But I guess the large parts are always in the binaries, and binaries can be
stored as files. Another issue is serving the objects. Will you have to
return a combination of the binary and the properties to the client? If yes,
you could still go with the caching approach that was suggested or save
pre-rendered objects to the filesystem.
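Keeping the binaries as files while the object only holds its metadata could look roughly like this. It is a minimal sketch under stated assumptions: `DocumentRecord`, `blob_path_for`, `store_binary` and `load_binary` are hypothetical names, and the hashed two-level directory layout is an assumed design choice, not something Zope provides.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentRecord:
    """Hypothetical document object: small metadata stays in the object
    database, while the large binary lives in a plain file on disk."""
    doc_id: str
    metadata: dict = field(default_factory=dict)
    blob_path: Optional[str] = None


def blob_path_for(root: str, doc_id: str) -> str:
    # Spread files over two levels of hashed subdirectories (an assumed
    # layout) so no single directory has to hold a million entries.
    # Assumes doc_id is safe to use as a file name.
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], doc_id)


def store_binary(root: str, record: DocumentRecord, data: bytes) -> None:
    # Write the binary to disk and keep only its path on the record.
    path = blob_path_for(root, record.doc_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    record.blob_path = path


def load_binary(record: DocumentRecord) -> bytes:
    # Read the binary back; a web server could also serve this path directly.
    with open(record.blob_path, "rb") as fh:
        return fh.read()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    doc = DocumentRecord("report-2002-01", {"title": "Quarterly report"})
    store_binary(root, doc, b"%PDF-1.4 ...")
    print(doc.blob_path)
    print(load_binary(doc))
```

The record itself stays small, so loading it to read the title never drags the binary into memory.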

> The next idea would be to have a database (maybe MySQL, because it is
> fast) with one table containing two columns: an ID and the pickled object.

Even if there are other opinions on this list: I can't believe that MySQL
can be THAT efficient with large files. As a matter of fact, it stores large
binaries as files. So it can't be more efficient than the file system
approach. MySQL will also be a problem if you have a lot of concurrent
reads and writes. It is really fast with a few clients, but with many
clients PostgreSQL is supposed to scale better.
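The two-column layout from the quoted proposal can be sketched in a few lines. This is only an illustration: it uses Python's standard `sqlite3` module as a stand-in for MySQL so the example is self-contained, and the table and function names (`objects`, `open_store`, `put_object`, `get_object`) are hypothetical.

```python
import pickle
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # One table, two columns: the object's ID and its pickled state.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " id TEXT PRIMARY KEY,"
        " data BLOB NOT NULL)"
    )
    return conn


def put_object(conn: sqlite3.Connection, obj_id: str, obj) -> None:
    # Serialize the whole object; the database only sees opaque bytes.
    conn.execute(
        "INSERT OR REPLACE INTO objects (id, data) VALUES (?, ?)",
        (obj_id, pickle.dumps(obj)),
    )
    conn.commit()


def get_object(conn: sqlite3.Connection, obj_id: str):
    # Look up by primary key and unpickle; None if the ID is unknown.
    row = conn.execute(
        "SELECT data FROM objects WHERE id = ?", (obj_id,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])
```

Because each document's properties are hidden inside the pickle, the database can only fetch objects by ID; it cannot index or query the metadata itself.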

> Or you could use Berkeley DB for this.

The BerkeleyDB storage for the ZODB is not really any faster, AFAIK. It is
just more flexible WRT building databases that are "packless", non-undoing
and the like.

Another comment:

Why can many large docs slow down Zope? Because it will try to do things
like "objectValues", which wakes up (loads from the ZODB) all the children
of a folderish object. So if you can avoid such calls (i.e. avoid or
customize the ObjectManager API) and use BTreeFolders, many of the problems
will probably go away.
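The cost difference between listing a folder and fetching one child by ID can be shown with a toy model. `Ghost` and `ToyFolder` are hypothetical classes that only simulate lazy loading with a counter; they are not the real ZODB persistence machinery, and the sketch does not model how a BTree-based folder also keeps its index in separately loadable buckets.

```python
class Ghost:
    """Toy stand-in for a persistent object whose state is loaded lazily.
    It only imitates how the ZODB wakes up an object on first access."""

    loads = 0  # how many objects have been woken up so far

    def __init__(self, oid: str):
        self.oid = oid
        self._state = None  # not loaded yet

    def activate(self) -> dict:
        # Simulate the expensive database read, on first access only.
        if self._state is None:
            Ghost.loads += 1
            self._state = {"id": self.oid, "title": "Document " + self.oid}
        return self._state


class ToyFolder:
    """Hypothetical folder holding many lazily loaded children."""

    def __init__(self, ids):
        self._children = {oid: Ghost(oid) for oid in ids}

    def objectValues(self):
        # Like ObjectManager.objectValues(): every child gets woken up.
        return [child.activate() for child in self._children.values()]

    def getChild(self, oid: str) -> dict:
        # Keyed lookup: only the requested child is woken up.
        return self._children[oid].activate()


folder = ToyFolder("doc%d" % i for i in range(10_000))
folder.getChild("doc42")
print(Ghost.loads)  # 1
folder.objectValues()
print(Ghost.loads)  # 10000
```

With a million children, the keyed lookup still wakes up one object, while a single `objectValues` call wakes up all of them.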

I don't know whether the ZCatalog is more or less efficient than doing the
indexing in an RDBMS. Probably the whole thing really has to be benchmarked
properly. If the RDBMS solution turns out to be the most efficient for
indexing, we might end up with a combination of all three:

- Zope and ZODB for the "glueing" together
- RDBMS for indexing and probably storing the properties, too
- File System + Apache for serving large files
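Assuming the RDBMS ends up doing the indexing, the split in the list above might look roughly like this. Again `sqlite3` stands in for MySQL or PostgreSQL, and the schema and function names are made up for illustration; Zope would run a query like `find_by_author` and then load only the matching objects, while large files are served from `blob_path` by the web server.

```python
import sqlite3

# Hypothetical schema: the RDBMS holds the indexed properties, while the
# large binary stays on the file system and is referenced only by path.
SCHEMA = """
CREATE TABLE documents (
    id        TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    author    TEXT NOT NULL,
    blob_path TEXT            -- served by a web server such as Apache
);
CREATE INDEX documents_by_author ON documents (author);
"""


def make_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, title, author, blob_path=None):
    # Store only the searchable properties plus a pointer to the file.
    conn.execute(
        "INSERT INTO documents (id, title, author, blob_path) "
        "VALUES (?, ?, ?, ?)",
        (doc_id, title, author, blob_path),
    )


def find_by_author(conn, author):
    # The database can answer this from the index on 'author' without
    # Zope having to wake up every document object.
    rows = conn.execute(
        "SELECT id FROM documents WHERE author = ? ORDER BY id", (author,)
    )
    return [row[0] for row in rows]
```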

Joachim