Scalability questions
Hi,

I am beginning the planning process for a Zope content management system that will support approximately 8,000 users. It is hard to predict the day-to-day use of the system, so I am trying to think ahead about scalability issues, and I have a number of questions.

STATIC FILES

One of the suggestions in the literature for improving performance is to allow Apache to serve static files. It would seem, however, that doing so completely takes away the metadata and permission management features of Zope. The LocalFS product, on the other hand, enables you to serve content from the file system while maintaining metadata and applying user permissions from Zope. Is there any performance advantage with LocalFS, or is it basically the same as storing the content in Data.fs?

DATA.FS LIMITATIONS

If all the site's content is stored in Data.fs, I'm concerned that it would quickly grow to a size that would result in a performance drag. Since I'm used to the RDBMS world, it seems odd to store all that data in one file. Is there a rule of thumb for how much data you can put into Data.fs before performance becomes an issue?

ALTERNATIVES TO DATA.FS

It seems like the following alternatives to Data.fs in its default configuration are available:

***Distribution*** This option would involve separating the server that stores the .fs file from the one(s) running Zope. You would do this with a ZeoStorageServer. A variation on this theme would be to use NAS/NFS to put the data on a separate server.

***ExternalMount*** Here you would use the ExternalMount product to store the data for selected portions of the ZODB (e.g., for a specific Product) in a separate .fs file, either on the same or a separate server. Presumably this option would mitigate performance issues resulting purely from the size of Data.fs.

***BerkeleyDB or Oracle*** Oracle or Berkeley DB can be used as the storage mechanism instead of a .fs file. But in doing so, do you lose Zope functionality?
***LocalFS*** LocalFS could be used to store large objects (spreadsheets, PDF files, WAV files, etc.) on the file system while benefiting from Zope's metadata and permissions system. In this case, it seems the data Zope would store would be limited to the metadata and permissions data -- the entire object would not be duplicated in Data.fs. Like the ExternalMount solution, LocalFS would, it seems, alleviate performance issues related to the size of Data.fs.

Have I exhausted the possibilities, or are there others? Are there other issues I should be thinking about?

Thanks for your help!

Regards,

Philip Glaser
Principal and Software Architect
Sustainable Software Solutions, LLC
StillSmallVoice@DirectvInternet.com
www.sustainsoft.com
973-951-9522
On Monday 01 Jul 2002 10:56 pm, Phil Glaser wrote:
Hi,
I am beginning the planning process for a Zope content management system that will support approximately 8,000 users. It is hard to predict the day-to-day use of the system, so I am trying to think ahead about scalability issues, and I have a number of questions.
STATIC FILES One of the suggestions in the literature for improving performance is to allow Apache to serve static files. It would seem, however, that doing so completely takes away the meta data and permission management features of Zope.
The heart of this suggestion is using the right tool for the job. Apache is great if your web content is just a bunch of files in a directory. One similar approach to performance is to put a caching proxy in front of Zope, either Apache/mod_proxy or Squid. The front-end proxy can take some of the load if your pages are "loosely dynamic" - that is, they don't change on *every* request.
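For a proxy to serve a "loosely dynamic" page from its cache, the page generally has to emit explicit freshness headers. A minimal sketch in plain Python (the helper name and the 60-second lifetime are illustrative assumptions, not Zope or proxy API):

```python
from email.utils import formatdate

def cache_headers(max_age=60):
    """Build HTTP headers that let a front-end proxy (Squid or
    Apache/mod_proxy) serve a loosely dynamic page from its cache
    for max_age seconds instead of hitting Zope on every request."""
    return {
        # Shared caches may reuse the response for max_age seconds.
        "Cache-Control": "public, max-age=%d" % max_age,
        # The Date header lets the proxy compute the response's age.
        "Date": formatdate(usegmt=True),
    }

headers = cache_headers(60)
print(headers["Cache-Control"])
```

Pages that truly change on every request (personalised views, form results) should instead send `Cache-Control: no-cache` so the proxy always passes them through.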
The LocalFS product, on the other hand, enables you to serve content from the file system and maintain meta data and apply user permissions from Zope. Is there any performance advantage with LocalFS, or is it basically the same as storing the content in Data.fs?
I'll leave LocalFS considerations to someone else.
DATA.FS LIMITATIONS If all the site's content is stored in Data.fs, I'm concerned that it would quickly grow to a size that would result in performance drag.
FileStorage (the component which manages the Data.fs file) is damn fast as long as its index fits in memory. If it doesn't, it sucks. One easy approach is to use FileStorage for as long as you can. Migrate to a different storage when you need to, not before.
Since I'm used to the RDBMS world, it seems odd to store all that data in one file. Is there a rule of thumb with respect to the amount of data you can put into Data.fs before performance becomes an issue?
As always, "it depends". Generally, the amount of RAM needed by FileStorage for normal use is 1/10th of its disk space; so that's 200M RAM if you have a 2G Data.fs. It needs at least the same again when packing.
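That rule of thumb is easy to turn into a quick capacity check. A rough calculator (the 1/10 ratio and the packing allowance are taken straight from the figures above):

```python
def filestorage_ram_estimate(data_fs_bytes):
    """Rough RAM needed for FileStorage, per the rule of thumb:
    the in-memory index is about 1/10th of the Data.fs size, and
    packing needs at least that much again on top of it."""
    index = data_fs_bytes // 10   # steady-state index RAM
    packing = 2 * index           # index plus packing overhead
    return index, packing

GB = 1024 ** 3
index, packing = filestorage_ram_estimate(2 * GB)
print(index // (1024 ** 2), "MB steady state")  # roughly 200M for a 2G Data.fs
```

The practical reading: size the box for the packing figure, not the steady-state one, or packing will push the server into swap.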
ALTERNATIVES TO DATA.FS It seems like the following alternatives to DATA.FS in its default configuration are available:
***Distribution*** This option would involve separating the server that stores the .FS file from the one(s) running Zope. You would do this with a ZeoStorageServer.
Yes. You won't regret using ZEO.
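For reference, pointing a Zope client at a ZEO storage server is typically just a couple of lines in `custom_zodb.py`. This is a configuration sketch, not runnable on its own; the hostname and port are placeholders, and the exact mechanism may vary by Zope/ZEO version:

```python
# custom_zodb.py -- sketch of a ZEO client configuration.
# 'zeo.example.com' and port 8100 are placeholder values.
from ZEO.ClientStorage import ClientStorage

# Zope picks up the module-level name 'Storage' and uses it
# in place of the default local FileStorage.
Storage = ClientStorage(('zeo.example.com', 8100))
```

The same storage server can then be shared by several Zope clients, which is the usual first step when one box is no longer enough.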
A variation on this theme would be to use NAS/NFS to put the data on a separate server.
Using FileStorage over NFS is dangerous. Also, it doesn't solve the main problem; you still need all that RAM in your server machine.
***ExternalMount*** Here you would use the ExternalMount product to store the data for selected portions of ZODB (e.g., for a specific Product) in a separate .FS file, either on the same or a separate server. Presumably this option would mitigate performance issues resulting purely from the size of Data.FS.
That might get you under a filesystem/OS 2G file-size limit, but that's about it.
***BerkeleyDB or Oracle*** Oracle or Berkeley DB can be used as the storage mechanism instead of .FS. But in doing so do you loose Zope functionality?
No, there is no loss of Zope functionality. The main cost is extra administration overhead. If you are already an Oracle or BerkeleyDB guru, I suggest going with this. All of these storage options have one common scalability limitation: they need RAM proportional to database size during packing. I am currently working on a new storage which doesn't, but it's not yet at production quality: http://dirstorage.sourceforge.net/
Phil Glaser wrote:
[snip] ***LocalFS*** LocalFS could be used to store large objects (like spreadsheets, PDF files, WAV files, etc.) on the file system and benefit from Zope's meta data and permissions system. In this case, it seems like the data Zope would store would be limited to the meta data and permissions data -- the entire object would not be duplicated in Data.FS. Like the ExternalMount solution, LocalFS would, it seems, alleviate performance issues related to the size of Data.fs.
Have I exhausted the possibilities, or are there others?
One thing comes to mind. I don't know if it's implemented, but it should be quite easy:

Run Zope behind a proxy (Squid or Apache + mod_rewrite + mod_proxy). Use a product like LocalFS or ExternalFile and modify it to _not_ serve the file upon __call__, but instead to deliver a 302 redirect to the browser, pointing to a URL which gets the file from the local filesystem via Apache. That way you get Apache's performance and scalability for your big files and don't lose any metadata.

If you need somewhat more complicated permissions on the files, take a look at Apache's security API. I know that it's possible to make Zope do the security stuff for Apache; see for instance mod_auth_any, http://www.itlab.musc.edu/~nafees/mod_auth_any.html

HTH,
oliver
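The redirect trick above can be sketched in a few lines of plain Python. The path-mapping scheme and the static host/prefix are illustrative assumptions, not part of any existing product:

```python
def redirect_for(object_path, static_root="http://files.example.com/static"):
    """Instead of streaming the file through Zope, answer with a
    302 Found pointing the browser at the same file as served
    directly by Apache.  Zope's security machinery still runs before
    this is reached, so metadata and permission checks are preserved."""
    location = "%s/%s" % (static_root.rstrip("/"), object_path.lstrip("/"))
    status_line = "302 Found"
    headers = {"Location": location}
    return status_line, headers

status, headers = redirect_for("reports/q2.pdf")
print(status, headers["Location"])
```

One design caveat: once the browser has the Apache URL, anyone with that URL can fetch the file unless Apache itself also enforces access, which is exactly where something like mod_auth_any comes in.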
participants (3)
- Oliver Bleutgen
- Phil Glaser
- Toby Dickenson