Zope's scalability across multiple web servers
I am in the process of designing a web site that will handle the load of a nationally advertised web site. I am considering using Zope over Pervasive's Tango2000 or Vignette's StoryServer as an application server.

What I would like to know is if there are any easy ways to replicate the ZODB information across web servers if I am running a web farm. In other words, I plan to have multiple identical web servers sitting behind a load-balancing appliance so that incoming requests can be sent to any of the web servers. Obviously, the web servers must contain the same data for this to be of any use. If I plan to run a web server and the Zope engine on each "web server", how can I replicate changes to all my servers to keep them synced?

Would something as simple as a recursive copy work for this?

This is not something that will have to be done in real time. Something simple like a batch job would suffice. Other than the database content, the HTML stuff will remain relatively static.

Please excuse my ignorance on Zope. I haven't had the chance to set it up yet and am still in a preliminary design phase.

Thanks for any helpful hints regarding this matter!

Aaron Bostick
Exodus Communications
Aaron,

See http://www.zope.org/Products/ZEO. ZEO was previously a commercial product sold by Digital Creations; it is due to be released under an open source license after a trial period with specific sites.
-- Chris McDonough Digital Creations Publishers of Zope - http://www.zope.org
Depending upon whether you are creating Products in Python, replication can be as simple as copying a single file from one installation to the others, since the entire Zope object store exists in a single file. If you create objects as ZClasses, these migrate as part of the same process. If you have Python Products, you have to make sure that the product source code gets moved over as well.

However, if you are dynamically creating objects in the ZODB (something that is strongly discouraged in a high-volume write situation) through interaction with web browsers, obviously the replication needs to be real time. In that case ZEO is your best bet, which shares a single object store amongst multiple web servers/interpreters.

Alternatively, before the announcement of the imminent arrival of open source ZEO, many of us built highly scalable sites where dynamically created objects are stored in an RDBMS (take your pick, the more scalable the better!), and only content-generation objects are stored in the ZODB. This means that the ZODB only changes whenever site content changes.

All of the above solutions work very well, depending upon your environment. Enjoy.

--sam
"Bostick, Aaron" wrote:
I am in the process of designing a web site that will handle the load of a nationally advertised web site. I am considering using Zope over Pervavsive's Tango2000 or Vignette's StoryServer as an application server.
What I would like to know is if there are any easy ways to replicate the ZODB information across web servers if I am running a web farm? In other words, I plan to have multiple identical web servers sitting behind a load balancing appliance so that incoming requests can be sent to any of the web servers. Obviously, the web servers must contain the same data for this to be of any use. If I plan to run a web server and the zope engine on each "web server", how can i replicate changes to all my servers to keep them synced.
Would something as simple as a recursive copy work for this?
This is not something that will have to be done in real time. Something simple like a batch job would suffice. Other than the database content, the HTML stuff will remain relatively static.
Please excuse my ignorance on Zope. I haven't had the chance to set it up yet and am still in a preliminary design phase.
Thanks for any helpful hints regarding this matter!
Aaron Bostick Exodus Communications
Sam Gendler writes:
However, if you are dynamically creating objects in the ZODB (something that is strongly discouraged in a high volume write situation)
Why is that? I thought the connection overhead and maintenance of an RDBMS was a big deal? This gets brought up a lot, and I like to hear opinions. I am aware that whatever justification is given is purely about what works for you, but I'd love to hear the reasoning.

All my best,

Jason Spisak
CIO
HireTechs.com
6151 West Century Boulevard Suite 900
Los Angeles, CA 90045
P. 310.665.3444
F. 310.665.3544

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) this email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
Writing to a database is one of the most resource-intensive things you can do. My understanding is that reading costs about half of what writing does. Some databases have been optimized for one or the other; a good example is LDAP, which is very fast at reading but slow at writing (thus good for usernames and passwords). For ZODB there comes a point where the complexity and overhead of the RDBMS are less than writing to the ZODB, which has greater limitations in size and overall flexibility.

There are also huge issues in terms of keeping multiple SQL databases synced up, so there is a case to be made to spread ZEO across many web servers with one beefy DB server in the background serving them. That, combined with Zope's ability to cache, creates a highly scalable solution (IMHO).

I, personally, am just salivating at the chance to get my hands on ZEO.

J
From: "Jason Spisak" <444@hiretechs.com> Date: Thu, 27 Apr 2000 15:25:14 GMT To: Sam Gendler <sgendler@silcom.com> Cc: zope@zope.org Subject: Re: [Zope] Zope's scalability across multiple web servers
Sam Gendler writes:
However, if you are dynamically creating objects in the ZODB (something that is strobly discouraged in a high volume write situation)
Why is that? I thought the connection overhead and maintainence of a RDBMS was a big deal? This gets brought up a lot. And I like to hear opinions. I am aware that whatever justification is given is purely about what works for you, but I'd love to hear the reasoning.
All my best,
Jason Spisak CIO HireTechs.com 6151 West Century Boulevard Suite 900 Los Angeles, CA 90045 P. 310.665.3444 F. 310.665.3544
Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
J. Atwood: Thank you for taking the time to respond.
There are also huge issues in terms of keeping multiple SQL databases synced up, so there is a case to be made to spread ZEO across many web servers with one beefy DB server in the background serving them. That, combined with Zope's ability to cache, creates a highly scalable solution (IMHO).
I, personally, am just salivating at the chance to get my hands on ZEO.
I too am waiting with my dinner napkin, knife and fork. Right now I have a 300MB ZODB, and not having to deal with the overhead of an RDBMS (and the headaches everyone is having on the list) is very comforting. What DB are you using?

Thanks again,

Jason Spisak
CIO
HireTechs.com
I am using PostgreSQL 6.5.3, which I am really enjoying. It is very stable, pretty fast, has tons of features, and above all is open source (7.0, due out in a month, has a slew more). Most of the data for the site is coming from the database (ZODB = 1.1 MB), and by playing with Zope's SQL Method caching I am really able to customize each and every SQL statement as to how much to cache and for how long. This really improves performance, as cached material is said to be about 10x faster (not my tests).

I have done some testing, and on my server (RH 6.1, 300MHz, 256MB RAM) I think I can handle roughly 1,000,000 hits a day, so I should be OK for a bit. :)

I have read everything I can get my hands on about ZEO but am still unclear about a few details like setup, management and the nuts and bolts of having one ZODB over many servers. All in good time.

J
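[Editor's sketch] The per-statement caching J. Atwood describes is configured through each SQL Method's management screens in Zope, not in code. Purely to illustrate the idea (each query cached for its own duration), here is a minimal time-bounded cache; the class and parameter names are hypothetical:

```python
import time

class TimedCache:
    """Cache query results for a fixed number of seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._store = {}  # query -> (timestamp, rows)

    def get(self, query, fetch):
        """Return cached rows for `query`, refetching once they expire."""
        now = time.time()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self.max_age:
            return hit[1]          # still fresh: no round-trip to the DB
        rows = fetch(query)        # e.g. run the SQL against PostgreSQL
        self._store[query] = (now, rows)
        return rows
```

Every hit served from `_store` is one fewer query hitting the database, which is where the claimed ~10x speedup for cached material would come from.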
From: "Jason Spisak" <444@hiretechs.com>
There are also huge issues in terms of keeping multiple SQL databases synced up so there is a case to be made to spread ZEO across many webservers with one beefy DB server in the background serving them. That, combined with Zope's ability to cache creates a highly scalable solution (IMHO).
I, personally, am just salivating at the chance to get my hands on ZEO.
I too am waiting with my dinner napkin, knife and fork. Right now I have a 300MB ZODB and not having to deal with the overhead of a RDBMS (and the headaches everyone is having on the list) is very conforting. What DB are you using?
J. Atwood:
I am using PostgreSQL 6.5.3, which I am really enjoying.

That's what I was going to use, until I decided to put everything in the ZODB.
It is very stable, pretty fast, has tons of features, and above all is open source (7.0, due out in a month, has a slew more). Most of the data for the site is coming from the database (ZODB = 1.1 MB), and by playing with Zope's SQL Method caching I am really able to customize each and every SQL statement as to how much to cache and for how long.
I may plug you for tips when the time comes. ;)
This really improves performance as cached material is said to be about 10x faster (not my tests).
I have done some testing, and on my server (RH 6.1, 300MHz, 256MB RAM) I think I can handle roughly 1,000,000 hits a day, so I should be OK for a bit. :)
Run those TV commercials and bring 'em on!
I have read everything I can get my hands on about ZEO but am still unclear about a few details like setup, management and the nuts and bolts of having one ZODB over many servers.
I don't know either. I've just read their PDF. Have you read that?

Jason Spisak
CIO
HireTechs.com
----- Original Message ----- From: "Jason Spisak" <444@hiretechs.com>

Why is that? I thought the connection overhead and maintenance of an RDBMS was a big deal?
Actually, I think the ZODB is great. You wouldn't want to use it for a high overwrite situation (like a hit counter), but if you're *adding* things ZODB seems just fine. ZODB is really easy to work with and model things in. Philip & Ty's new SheetProvider stuff will help blur the lines between the ZODB and an RDBMS, too. If you have a few properties of an object that change frequently, just stick them on a property sheet that gets pulled from an RDBMS. Kevin
On Thu, 27 Apr 2000, Kevin Dangoor wrote:
Actually, I think the ZODB is great. You wouldn't want to use it for a high overwrite situation (like a hit counter), but if you're *adding* things ZODB seems just fine. ZODB is really easy to work with and model things in. Philip & Ty's new SheetProvider stuff will help blur the lines between the ZODB and an RDBMS, too. If you have a few properties of an object that change frequently, just stick them on a property sheet that gets pulled from an RDBMS.
And never forget the filesystem! Allows highly efficient concurrent writes, optimized caching at the kernel level, has a nice hierarchical structure and there are many tools to work with it! In situations where you have high write rates but the write operations are decoupled to a degree that no file locking is required (for instance session info) then an FS solution can be *very* fast. Pavlos
Pavlos:
And never forget the filesystem! Allows highly efficient concurrent writes, optimized caching at the kernel level, has a nice hierarchical structure and there are many tools to work with it! In situations where you have high write rates but the write operations are decoupled to a degree that no file locking is required (for instance session info) then an FS solution can be *very* fast.

Besides session info, what types of info would be good and bad in FS storage?
Eagerly,

Jason Spisak
CIO
HireTechs.com
On Thu, 27 Apr 2000, Jason Spisak wrote:
Besides session info, what types of info would be good and bad in a FS storage?
In realistic practical terms, anything that involves customers' money is IMO BAD on a home-grown FS solution (I'd rather use Oracle or something). OTOH, hit counters (and other types of counters) are IMO better at the FS level. Storing them in an RDBMS implies an extra no_hits_per_day_to_the_web_site going to the RDBMS. Also, in many cases it is not that important if some hits are 'lost', in which case you can write very efficient code with minimal locking.

With file systems like ReiserFS or XFS it is possible to have huge amounts of small objects in each directory (and still be efficient) and also get journaling support (seems to me they are essentially DBMSs). It is possible (but probably difficult to implement) to have a ReiserFS Storage backend to ZODB. A better fit than an RDBMS backend, because then you can take advantage of all the powerful FS tools like quota control, selective backups, etc.

Pavlos
Pavlos:
In realistic practical terms, anything that involves customers' money is IMO BAD on a home-grown FS solution (I'd rather use Oracle or something).
Is it because they are not transaction safe? Or just not robust enough?
OTOH, hit counters (and other types of counters) are IMO better at the FS level. Storing them in an RDBMS implies an extra no_hits_per_day_to_the_web_site going to the RDBMS.
Truly an advantage.
Also in many cases it is not that important if some hits are 'lost' ...
Why would you lose hits? What exactly would drop?
With file systems like ReiserFS or XFS

I've been reading about these (Linux Technology Journal's most recent issue, which covers GFS too), and I will be firing up ReiserFS when my SuSE 6.4 CD gets here. (Support the community with $ :) I don't know squat about filesystems on a low level, but the blend of an RDBMS and a filesystem really seems ideal.
on these it is possible to have huge amounts of small objects in each directory (and still be efficient) and also get journaling support (seems to me they are essentially DBMSs). It is possible (but probably difficult to implement) to have a ReiserFS Storage backend to ZODB. A better fit than an RDBMS backend, because then you can take advantage of all the powerful FS tools like quota control, selective backups, etc.
Yup.

All my best,

Jason Spisak
CIO
HireTechs.com
On Thu, Apr 27, 2000 at 01:43:05PM -0400, Pavlos Christoforou wrote:
With file systems like ReiserFS or XFS it is possible to have huge amounts of small objects in each directory (and still be efficient) and also get journaling support (seems to me they are essentially DBMSs).

Warning: ReiserFS, like most journaling filesystems, only implements METADATA journaling. So after a crash your directory and file structure will be OK, but you might have inconsistent file content.
Andreas -- Andreas Kostyrka | andreas@mtg.co.at phone: +43/1/7070750 | phone: +43/676/4091256 MTG Handelsges.m.b.H. | fax: +43/1/7065299 Raiffeisenstr. 16/9 | 2320 Zwoelfaxing AUSTRIA http://www.euro.cauce.org/ | http://www.cauce.org/
Jason Spisak wrote:
Pavlos:
And never forget the filesystem! Allows highly efficient concurrent writes, optimized caching at the kernel level, has a nice hierarchical structure and there are many tools to work with it! In situations where you have high write rates but the write operations are decoupled...
Besides session info, what types of info would be good and bad in a FS storage?
Here's a hypothetical situation: suppose you were implementing a Slashdot-style moderation system in Zope, and that furthermore, you were allowing only a few people to post replies on your site, but you allowed all visitors to moderate promiscuously (and anonymously). Beyond the fact that this may not be a good idea, you might want to store the postings in Zope, but the moderation score on the FS, as it is likely to get overwritten A LOT (especially by moderation bots). You run the slight risk of a given moderation being overwritten, but statistically it should even out, with as many 'up' moderations being discarded accidentally as 'down' ones. And you prevent Zope from thrashing with many tiny committed transactions mucking up the 'Undo' tab.

HTH,
Michael Bernstein.
Kevin:
seems just fine. ZODB is really easy to work with and model things in. Philip & Ty's new SheetProvider stuff will help blur the lines between the ZODB and an RDBMS, too.
I read the RIPP document. Very smart choice to modularize the data storage, like a filesystem/kernel kind of relationship. Is this a replacement for PropertySheets in the current Zope? I am designing an app now that of course uses PropertySheets. Can I switch to their rack/sheet provider? Maybe I should just ask them, duh. Guys?

All my best,

Jason Spisak
CIO
HireTechs.com
Jason Spisak wrote:
I read the RIPP document. Very smart choice to modularize the data storage, like a filesystem/kernel kind of relationship. Is this a replacement for PropertySheets in the current Zope? I am designing an app now that of course uses PropertySheets. Can I switch to their rack/sheet provider? Maybe I should just ask them, duh. Guys?
The ZPatterns product is the code you want. It's undocumented though, so I personally couldn't figure out how to use it. -- Itamar S.T. itamars@ibm.net
participants (10)
- Andreas Kostyrka
- Bostick, Aaron
- Chris McDonough
- Itamar Shtull-Trauring
- J. Atwood
- Jason Spisak
- Kevin Dangoor
- Michael Bernstein
- Pavlos Christoforou
- Sam Gendler