[Zope] Scalability questions

Toby Dickenson tdickenson@geminidataloggers.com
Tue, 2 Jul 2002 07:40:59 +0100


On Monday 01 Jul 2002 10:56 pm, Phil Glaser wrote:
> Hi,
>
> I am beginning the planning process for a Zope content management system
> that will support approximately 8,000 users. It is hard to predict the
> day-to-day use of the system, so I am trying to think ahead about
> scalability issues, and I have a number of questions.
>
> STATIC FILES
> One of the suggestions in the literature for improving performance is to
> allow Apache to serve static files. It would seem, however, that doing so
> completely takes away the meta data and permission management features of
> Zope.

The heart of this suggestion is using the right tool for the job. Apache is
great if your web content is just a bunch of files in a directory.

A related approach to performance is to put a caching proxy in front of
Zope, either Apache/mod_proxy or Squid. The front-end proxy can take some of
the load if your pages are "loosely dynamic" - that is, they don't change on
*every* request.
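
As a rough illustration (the script and template names here are made up), a
page that only changes every few minutes can advertise that with a
Cache-Control header, so the proxy can answer repeat requests itself:

    ## Script (Python) "cached_view" -- hypothetical example
    request = container.REQUEST
    response = request.RESPONSE

    # Let a front-end proxy (mod_proxy or Squid) reuse this response
    # for up to five minutes before asking Zope again.
    response.setHeader('Cache-Control', 'max-age=300')

    # Render the (hypothetical) template that produces the page.
    return context.news_template()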

> The LocalFS product, on the other hand, enables you to serve content
> from the file system and maintain meta data and apply user permissions from
> Zope. Is there any performance advantage with LocalFS, or is it basically
> the same as storing the content in Data.fs?

I'll leave LocalFS considerations to someone else.

> DATA.FS LIMITATIONS
> If all the site's content is stored in Data.fs, I'm concerned that it would
> quickly grow to a size that would result in performance drag.

FileStorage (the component which manages the Data.fs file) is damn fast as
long as its index fits in memory. If it doesn't, it sucks. One easy approach
is to use FileStorage for as long as you can. Migrate to a different storage
when you need to, not before.
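
When that day comes, one route is to copy every transaction from the old
storage into the new one with Zope shut down. A rough sketch, assuming
ZODB-3-style imports and using a second FileStorage as a stand-in for
whatever storage you migrate to:

    # convert.py -- run offline, with Zope stopped.
    from ZODB.FileStorage import FileStorage

    src = FileStorage('var/Data.fs', read_only=1)
    dst = FileStorage('var/Data.fs.new')   # stand-in destination storage

    # Replay the complete transaction history into the new storage.
    dst.copyTransactionsFrom(src)

    src.close()
    dst.close()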

> Since I'm
> used to the RDBMS world, it seems odd to store all that data in one file.
> Is there a rule of thumb with respect to the amount of data you can put
> into Data.fs before performance becomes an issue?

as always "it depends". Generally, the amount of RAM needed by FileStorag=
e for=20
normal use is 1/10th of its disk space; so thats 200M RAM if you have a 2=
G=20
Data.fs. It needs at least the same again when packing.
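
For reference, packing can be done from the Control Panel or offline. A
minimal offline sketch, assuming the classic ZODB imports, that keeps one
week of undo history:

    # pack.py -- pack a FileStorage with Zope stopped.
    from ZODB import FileStorage, DB

    storage = FileStorage.FileStorage('var/Data.fs')
    db = DB(storage)

    # Discard object revisions older than seven days.
    db.pack(days=7)

    db.close()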

> ALTERNATIVES TO DATA.FS
> It seems like the following alternatives to DATA.FS in its default
> configuration are available:
>
> ***Distribution***
> This option would involve separating the server that stores the .FS file
> from the one(s) running Zope. You would do this with a ZeoStorageServer.

Yes. You won't regret using ZEO.
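
With the ZEO versions of that era, each Zope client can be pointed at the
storage server by a small custom_zodb.py in its instance directory; a
minimal sketch (host name and port are placeholders):

    # custom_zodb.py -- make this Zope instance a ZEO client instead of
    # opening var/Data.fs directly.
    import ZEO.ClientStorage

    Storage = ZEO.ClientStorage.ClientStorage(('zeo.example.com', 8100))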

> A
> variation on this theme would be to use NAS/NFS to put the data on a
> separate server.

Using FileStorage over NFS is dangerous. Also, it doesn't solve the main
problem; you still need all that RAM in your server machine.

> ***ExternalMount***
> Here you would use the ExternalMount product to store the data for selected
> portions of ZODB (e.g., for a specific Product) in a separate .FS file,
> either on the same or a separate server. Presumably this option would
> mitigate performance issues resulting purely from the size of Data.FS.

That might get you under a filesystem/OS 2G file-size limit, but that's about it.

> ***BerkeleyDB or Oracle***
> Oracle or Berkeley DB can be used as the storage mechanism instead of .FS.
> But in doing so do you lose Zope functionality?

No, there is no loss of Zope functionality. The main cost is extra
administration overhead. If you are already an Oracle or BerkeleyDB guru, I
suggest going with this.

All of these Storage options have one common scalability limitation: they
need RAM proportional to database size during packing. I am currently working
on a new storage which doesn't, but it's not yet at production quality:
http://dirstorage.sourceforge.net/