[ZODB-Dev] ZEO2 - Zope Reliability
sean.upton@uniontrib.com
Fri, 22 Nov 2002 18:28:11 -0800
My random thoughts on scalability/reliability issues:
Performance:
- Because Python runs best when confined to a single CPU, if you are running
one server with 2 CPUs you may want to investigate what CPU-affinity options
the Red Hat kernel offers (you likely want to download the Advanced Server
errata kernel, which should have a scheduler geared for this). I'm not sure,
however, what userspace utilities there are to take advantage of it; at
least with RH 7.3, it was just a system call in the kernel, usable only by
software written specifically to take advantage of it, like Oracle. It may
now be settable via /proc writes in the newest Red Hat kernel, but I
haven't checked this out yet.
- Even without CPU affinity, you would likely get pretty good performance
running a ZEO storage server process and 1-2 Zope instances (bound to
different ports/interfaces) all on this one box, provided you have the RAM
for it (2GB is likely okay). This, of course, assumes you have an
application/site that is cacheable at the page level with a decent HIT
ratio, and that you set up a Squid (or Apache; Squid preferred) proxy on
_another_ box in front of this.
- Consider running Postgres on another box.
Reliability:
- I've learned this one the hard way (and have, generally speaking, decided
on this fix): don't run intensive automated housekeeping operations (cron'd
content building, reindexing) in your Zope process. Do this out-of-process
so it doesn't chew through memory. Ideally, do it on your second box with a
ZEO client attached to the first: create a Python script on the filesystem,
called by cron, that instantiates Zope.app() to get access to the
ClientStorage that is your object database. Example:
import Zope  # you want ZEO for this to work
from AccessControl.SecurityManagement import newSecurityManager

app = Zope.app()
my_cmf_site = app['MySite']
uf = app['acl_users']
# user object wrapped in the acquisition context of the user folder
zopeuser = uf.getUser('someuser').__of__(uf)
# new security manager for this thread so invokeFactory is allowed to run
newSecurityManager(None, zopeuser)
my_cmf_site.invokeFactory(type_name='Document', id='Foo')
get_transaction().commit()
- Consider running another Zope process on another box (your backup) for
manual heavy lifting.
- Settle on a strategy for load-balancing ZEO clients. Mine: Squid with
ICP, which removes dead 'cache peers' (which are really just web/app
servers, not caches) from the rotation.
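For reference, the peer half of that setup in squid.conf might look like
the following sketch (hostnames and ports are assumptions, not my actual
config):

```
# Each Zope instance is declared as a parent cache_peer; the ICP port
# (3130) lets Squid detect dead peers and pull them out of the rotation.
cache_peer zope1.example.com parent 8080 3130 round-robin
cache_peer zope2.example.com parent 8080 3130 round-robin
```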
- I've recently seen some weirdness with ZEO2 ClientCache corruption with
persistent caching (UnpicklingError). I'm sure I triggered a weird bug, but
I can't duplicate it (yet). If you encounter weirdness with persistent
caching, use a transient cache instead; I don't think there is a big
penalty for doing so.
Scaling:
- FileStorage is fast, but you might get better scalability from
DirectoryStorage for a big object database. Packing a DirectoryStorage is a
lot slower, but doesn't increase the memory footprint of Zope and/or the
ZEO storage server (ZSS). Berkeley DB storage may also scale better, though
I am not familiar with it.
- If you use FileStorage and build Zope from source, make sure your Python
has large-file support (LFS) compiled in.
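A quick way to sanity-check LFS is to seek past the 2 GB mark in a scratch
file and see whether the interpreter copes (sketched here in modern Python
syntax, which postdates this thread):

```python
import os
import tempfile

def has_lfs():
    """Return True if this Python build can address offsets beyond 2 GB."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(2**31 + 1)   # just past the 2 GB boundary
            f.write(b"\0")      # creates a sparse file on most filesystems
        return os.path.getsize(path) > 2**31
    except (OverflowError, OSError):
        return False
    finally:
        os.remove(path)
```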
- Use the BTreeFolder2 product and its CMF type (included) for any folder
in Plone with lots of items. Tweak the Plone skin methods to do batching if
you plan to use this.
- Increase your ZODB cache size above the default (8000 objects has been
suggested on the Plone/CMF lists, I think). This will help Plone, which
does a lot of UI magic on the server and benefits from better object
caching.
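If you open the database yourself (as in the cron-script example above),
the bigger cache can be requested when constructing the DB object; inside
Zope the equivalent knob is the cache size setting on the Control_Panel
database screen. A sketch, in which the host, port and the 8000 figure are
assumptions:

```
from ZEO.ClientStorage import ClientStorage
from ZODB.DB import DB

storage = ClientStorage(('zss.example.com', 8100))
# cache_size is the number of objects kept in memory per connection
db = DB(storage, cache_size=8000)
conn = db.open()
```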
Availability:
- If you can replicate your ZSS, you can configure your ZEO clients to use
multiple ZSS boxes. There are low-tech options like rsync, network RAID &
network/distributed block devices, and more reliable options like
incremental backup using DirectoryStorage's snapshot mode (which requires
some scripting and testing, but should be both fast and free), or Zope
Corp's commercial replication product, Zope Replication Server.
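Pointing a ZEO client at a replicated pair is then just a matter of handing
ClientStorage more than one address (a sketch; hostnames/ports are
assumptions, and the servers' contents still have to be kept in sync by one
of the options above):

```
from ZEO.ClientStorage import ClientStorage

# The client tries these addresses in turn and reconnects to a
# survivor if the server it is talking to goes away.
storage = ClientStorage([
    ('zss-primary.example.com', 8100),
    ('zss-backup.example.com', 8100),
])
```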
- Consider using IP takeover software as a means of killing a half-dead ZSS
by taking over its IP with gratuitous ARP, and potentially killing its power
for data-fencing. Heartbeat from the Linux-HA project is a good option
(this is what I use), and its STONITH module will work with a serial
power-switch to kill the other node's power.
- Keep two copies of what you replicate, and ensure your secondary node
starts its ZSS process (which happens at takeover time) from the
last-known-verified-good snapshot of your storage. Replicating a Data.fs
doesn't help if it needs to be cleaned up before your backup ZSS can start.
You may need to write a script that takes the most recent incremental
backup and, if it checks out as good, clones it to a new location on the
filesystem; run this on a cron job and only ever use the verified-good
copy.
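That verify-and-clone step might be sketched like this (all paths and the
checksum-file convention are hypothetical, not part of any Zope tool):

```python
import hashlib
import shutil

def md5sum(path, bufsize=1 << 20):
    """Compute the md5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def promote_if_good(backup, checksum_file, good_copy):
    """Copy `backup` to `good_copy` only if its md5 matches the digest
    recorded in `checksum_file`; return True on success, False if the
    backup fails verification (leaving the old good copy untouched)."""
    with open(checksum_file) as f:
        expected = f.read().split()[0]
    if md5sum(backup) != expected:
        return False  # corrupt/partial backup: keep the old good copy
    shutil.copy2(backup, good_copy)
    return True
```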
Backup:
- Incremental backups are easy with DirectoryStorage - just use the backup
script and cpio ;)
- Consider going the extra mile to serialize your content in your CMF/plone
site into system-neutral text or binary formats, for archival purposes.
-----Original Message-----
From: Nick Pavlica [mailto:nick.pavlica@echostar.com]
Sent: Friday, November 22, 2002 3:54 PM
To: zodb-dev@zope.org
Subject: [ZODB-Dev] ZEO2 - Zope Reliability
All,
"I apologize if this is the wrong list to post this question."
I'm developing a new infrastructure for our corporate Intranet and a number
of other critical web applications. I have been working with ZOPE and love
all that it has to offer. However, we are in a 24x7 production facility and
require a high level of uptime. Because ZOPE is a new technology for our
organization, it has taken a lot of work getting my management to agree to
use it. Because of this, I'm trying to do things right the first time, and
build a good name for ZOPE.
The application server will only see around 1 million hits per month on one
of two servers (primary, backup) with dual 1.8GHz XEONs, 2GB RAM, 10k SCSI
RAID, dual power supplies, etc. The base software will be Zope 2.6, Plone
1, the pgsql db product, RedHat 8.0 (EXT3), Apache 2.0.x, and PostgreSQL
7.2.x. My primary concern is reliability/uptime. I have read on the
zope.org site about a number of projects that attempt to address ZODB
replication/synchronization, but none of them seem to be at a production
level. Does ZEO2 allow ZODB replication? What are the known limitations of
Data.fs? I have read about successful installations running 30+ GB with no
problem. I have been thinking of some solutions to help ensure that the
Data.fs is available for a high level of uptime. Please keep in mind that I
have never used ZOPE in this type of installation, so I apologize, again,
if these are ridiculous solutions.
1) Configure the primary and backup servers with ZEO2, and use the opposing
server as a backup ZODB store, by configuring the clients on each server to
use both servers' databases.
2) Use rsync to sync the data.fs file between the two servers.
3) Use a script to tar and copy the production data.fs to the backup server
on a fairly tight schedule.
I would greatly appreciate your advice.
Thanks!
--
Nick Pavlica
EchoStar Communications
CAS-Engineering
(307)633-5237
_______________________________________________
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://lists.zope.org/mailman/listinfo/zodb-dev