Hello. I've thumbed through the list for Zope/Zeo performance issues similar to mine, but have found a lot of conflicting information, so I thought I'd ask (and keep reading and experimenting).

Zope - 2.7.x (Zeo)
Apache 2.x
Zope is using Plone (2.0.x)

Basically I have a high traffic (high performance) Apache -> Zeo -> Zope(ZODB) setup: 10 Apache/Zeo boxes and one big old Zope/ZODB box. The equipment is high end DELLs (well, as of 3 years ago), all running FreeBSD (we tried Red Hat but it blows up, but that's another post).

Anywho, if you're still reading, thanks. We know Plone and our code is a major issue, and we are working on a migration plan to 3. But in the meantime ...

The traffic is heavy write traffic (I read some of Dieter's posts and am testing that out as well). Once overall load hits about 100 people or so the Zeo's start dying - heavy load, slow response, python takes all CPU/Memory. Then when traffic is removed from the ZEO instance ... the system remains CPU bound by the python process ... and you have to bounce Zope(Zeo instance) and Apache to free it. The ZODB reports heavy Clients waiting ... but doesn't budge on load.

So ... anyone have any suggestions. I can throw 10 more Apache/Zeo instances at it - but not sure if that's the right approach.

So I guess here's my questions.

1. Is there a Zeo Client limit you can have when connecting to a Zope(Zeo Server) instance?
2. Are there any special settings to allow for 'many' Zeo clients connecting to Zeo server?
3. Are there any 'tweaks' on the Zeo Client side or the Zeo Server side that I should consider? I've reviewed quite a bit on configuration ... but nothing seems to really make a difference.
4. Anyone have any special 'high performance' tricks / tips they can point me to?

I'll continue to read and research ... but any information would be helpful.

Thanks
-FuBuJo
FuBuJo wrote at 2008-3-14 13:31 +0000:

You need to be a bit more careful in your description. For example the diagram "Apache -> Zeo -> Zope(ZODB)" is very confusing. It is very rare that Apache speaks to Zeo. The confusion between Zope and Zeo may go straight through your description such that it is often unclear whether you really mean Zope when you write Zope and Zeo when you write Zeo. More below.
... The traffic is heavy write traffic (I read some of Dieter's posts and am testing that out as well). Once overall load hits about 100 people or so the Zeo's start dying
Here again, you use the wrong word: "dying" would mean that your ZEO process terminates, but below you say that it gets slower.
- heavy load, slow response, python takes all CPU/Memory.
Which "python"? The "python" executing Zeo? Or the one executing Zope?
Then when traffic is removed from the ZEO instance ... the system remains CPU bound by the python process ... and you have to bounce Zope(Zeo instance) and Apache to free it.
Which system? The one running ZEO (the ZEO server) or the one running Zope?
The ZODB reports heavy Clients waiting ... but doesn't budge on load.
You see this in the ZEO logfile? Then, it is ZEO which reports the waiting -- not the ZODB.
So ... anyone have any suggestions.
We are having similar problems -- I call them commit congestions. As far as we understand it by now, it is a multiple cause problem.

Commit congestions can be caused on the client (=Zope) side and on the server (=ZEO) side.

A client drastically increases the probability of commit congestions when it does expensive things while it holds the commit lock, i.e. during the second phase of the two phase commit protocol. We have identified three causes:

* garbage collections

  During a garbage collection the garbage collector holds the GIL and blocks all Python activity. We found that a single generation 2 (i.e. full) garbage collection can take between 10 and 20 s. We had a bad text index implementation that caused excessive object creation and thereby lots of garbage collections. Our measure has been to drop the bad index implementation and reconfigure the garbage collector to reduce the garbage collection frequency by a factor of 1000.

* "stat"s in the second commit phase

  In our system, "stat"s for NFS served files could take up to 27 s. It is a complete mystery why. Local IO, too, occasionally seemed to need excessive time. This, too, is still mysterious. We may have some hints: some ranking bugs in a search engine could cause millions of IO operations within a short timeframe and may have significantly affected the Linux IO behaviour.

* invalidation message reception and corresponding client cache updates during the second commit phase

Other causes for commit contention come from the (ZEO) server:

* "FileStorage.pack" unnecessarily holds the commit lock during large periods of the copying phase, drastically increasing the probability of commit contentions

* during some pack phase (reachability analysis), access to the storage file is high volume and erratic. This drastically reduces the performance of the storage and makes commit contentions likely.

* other heavy use of the file system can affect the IO performance available for storage access and can increase the likelihood of commit contentions.
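To check whether full garbage collections are a plausible culprit in a given process, one can time a single generation 2 collection. The following is a minimal sketch in current Python, not the measurement used above; the helper name `time_full_collection` is made up for illustration, and older interpreters such as the Python 2.3 mentioned in this thread may lack `time.perf_counter` or the generation argument to `gc.collect`.

```python
import gc
import time


def time_full_collection():
    """Run one full (generation 2) collection.

    Returns a tuple (elapsed_seconds, unreachable_objects_found).
    Hypothetical helper, for illustration only.
    """
    start = time.perf_counter()
    unreachable = gc.collect(2)  # 2 = oldest generation, i.e. a full collection
    elapsed = time.perf_counter() - start
    return elapsed, unreachable


if __name__ == "__main__":
    # Create some cyclic garbage so the collector has work to do.
    for _ in range(100_000):
        a, b = [], []
        a.append(b)
        b.append(a)
    seconds, found = time_full_collection()
    print(f"full collection: {seconds:.3f}s, {found} unreachable objects")
```

A pause measured this way only matters for commit congestion if it can happen while the process holds the commit lock, so it is best read together with the storage server's log.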
I can throw 10 more Apache/Zeo instances at it - but not sure if that's the right approach.
It is not. Commit contention is a synchronization problem. It does not go away but is likely to increase when you scale your frontends up.
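A rough way to see why more frontends do not help: if commits are serialized by one lock, the lock's hold time caps the commit rate no matter how many clients exist. The sketch below is a back-of-the-envelope model under that single-lock assumption; the function names and the numbers in the example are illustrative, not measurements from any system in this thread.

```python
def max_commit_rate(lock_hold_seconds):
    """Upper bound on commits per second when one lock serializes all commits."""
    if lock_hold_seconds <= 0:
        raise ValueError("lock_hold_seconds must be positive")
    return 1.0 / lock_hold_seconds


def commit_lock_utilization(commits_per_second, lock_hold_seconds):
    """Fraction of time the commit lock is busy.

    Values at or above 1.0 mean commits arrive faster than they can be
    served, so the queue of waiting clients keeps growing.
    """
    return commits_per_second * lock_hold_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 5 commits/s in total across all clients.
    for hold in (0.05, 0.5):
        util = commit_lock_utilization(5, hold)
        print(f"hold={hold}s  max rate={max_commit_rate(hold):.1f}/s  "
              f"utilization={util:.2f}")
```

In this model, adding clients raises `commits_per_second` but leaves `lock_hold_seconds` untouched, which is why the suggestions in this thread aim at shortening the time spent holding the lock.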
So I guess here's my questions.
1. Is there a Zeo Client limit you can have when connecting to a Zope(Zeo Server) instance?
There is no limit in principle -- but as you can see, lots of clients can affect performance. Invalidation message processing poses a load on the server which grows linearly with the number of clients (each client must get all invalidations). Most other Zeo load contributions depend more on the actual number of requested operations (reads, writes, commits) and less on the number of clients that request these operations (of course, more clients can generate more requests).
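The linear growth can be made concrete with a small count. This sketch assumes one invalidation message per commit to every connected client except the one that committed; that is a simplification for illustration, and `invalidation_messages` is a hypothetical helper, not part of ZEO.

```python
def invalidation_messages(commits, clients):
    """Messages the server sends if each commit notifies every other client.

    Assumption for illustration: one message per commit per non-committing
    client. Batching and disconnected clients are ignored.
    """
    if clients < 1:
        raise ValueError("need at least one client")
    return commits * (clients - 1)


if __name__ == "__main__":
    for clients in (10, 20):
        print(clients, "clients:", invalidation_messages(100, clients), "messages")
```

Under this assumption, doubling the clients from 10 to 20 roughly doubles the invalidation traffic for the same number of commits.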
2. Are there any special settings to allow for 'many' Zeo clients connecting to Zeo server?
Reconfigure the Python garbage collector such that it runs far less often. Get rid of components that (unnecessarily) create lots of Python objects. Check whether you do unnecessary operations during the second commit phase. Place your ZODB storage files intelligently in the file system such that other high volume IO operations do not badly affect IO on the storage.

-- Dieter
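One possible way to make the collector run far less often is the standard `gc.set_threshold` call. The sketch below raises only the generation 0 threshold by a factor; the factor of 1000 echoes the figure mentioned earlier in the thread, but which thresholds were actually changed there is not stated, so treat the details as an assumption. `reduce_gc_frequency` is a hypothetical helper name.

```python
import gc


def reduce_gc_frequency(factor=1000):
    """Raise the generation 0 threshold so automatic collections start less often.

    Assumption: scaling only threshold0 is enough; the other two
    thresholds are passed through unchanged.
    Returns the thresholds in effect after the change.
    """
    threshold0, threshold1, threshold2 = gc.get_threshold()
    gc.set_threshold(threshold0 * factor, threshold1, threshold2)
    return gc.get_threshold()


if __name__ == "__main__":
    print("before:", gc.get_threshold())
    print("after: ", reduce_gc_frequency())
```

The call has to run inside each process whose collector should be slowed down (here, the Zope processes), early in startup. Where to hook it in depends on the deployment and is not covered by this sketch. Collecting less often trades pause frequency for memory held by uncollected cycles.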
I apologize that my description was so confusing. I appreciate the feedback and so would like to clarify.

The diagram is how the traffic flows. So we have:

Step 1 - a Load Balancer that passes traffic to Apache
Step 2 - Apache which uses mod_proxy to obfuscate the URL and proxies traffic to the Zeo Client
Step 3 - the Zeo Client (residing on the same physical box as Apache) that then forwards the traffic to the Zeo Server
Step 4 - the Zeo Server that runs on its own box and writes to the ZODB.

I thought putting Apache in front of Zope was very common (using VirtualHost Monster) - guess not.

As for versions:

FreeBSD 5.3
Python 2.3.5
Zope 2.7

The load on the Zeo Server is minuscule. The load on the Zeo Client(s) gets very large. It's the python process on the Zeo Client that grows large and seems to become unresponsive. Possibly due to the "Transaction blocking" being reported in the Zeo Server's log. Strangely, disconnecting the Zeo Client from the Server while the Zeo Client is "locked" rarely gives the ... "disconnected during transaction" error message that I expected to see more of.

Hope that clears up things for anyone else who'd like to chime in.

Dieter - As for your suggestions. Thanks so much! I will certainly begin to investigate and pursue each. They seem like very good things to do - just in general. I appreciate the input. I'll let the list know how everything works out.

I do notice that there was no mention of any general (config file) configuration techniques (increasing threads, cache objects, etc. etc.) --- is configuration not really that big of a performance boost?

Thanks again for the direction.
FuBuJo wrote:
I apologize that my description was so confusing. I appreciate the feedback and so would like to clarify.
The diagram is how the traffic flows. So we have: Step 1 - a Load Balancer that passes traffic to Apache
Step 2 - Apache which uses mod_proxy to obfuscate the URL and proxies traffic to the Zeo Client
Is that so? Does Apache not talk to a Zope server?
robert
FuBuJo wrote at 2008-3-14 22:06 +0000:
... I thought putting Apache in front of Zope was very common (using VirtualHost Monster) - guess not.
This is common. But, usually, a ZEO client is not abbreviated as "ZEO". "ZEO" usually means the ZEO server.
... It's the python process on the Zeo Client that grows large and seems to become unresponsive. Possibly due to the "Transaction blocking" being reported in the Zeo Server's log.
Occasional "transaction blocking" messages are nothing to worry about. Only when you see them very often, or when the number of waiting clients is quite high, are bad things happening.

-- Dieter
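To tell "occasional" from "very often", one could count the blocking messages in the ZEO server log and track the largest number of waiting clients. The sketch below assumes the log lines contain the text "Clients waiting: N"; that pattern, the sample lines, the helper name `summarize_waiting`, and the alarm threshold of 5 are all assumptions for illustration and should be adjusted to the actual log.

```python
import re

# Assumed pattern; check it against real lines from your ZEO server log.
WAITING_RE = re.compile(r"Clients waiting:\s*(\d+)")


def summarize_waiting(lines, alarm_threshold=5):
    """Count blocking messages and report how many clients were waiting.

    alarm_threshold is a hypothetical cut-off for "quite high"; the
    thread gives no concrete number.
    """
    counts = []
    for line in lines:
        match = WAITING_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return {
        "messages": len(counts),
        "max_waiting": max(counts, default=0),
        "over_threshold": sum(1 for c in counts if c >= alarm_threshold),
    }


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    sample = [
        "Transaction blocked waiting for storage. Clients waiting: 1.",
        "Transaction blocked waiting for storage. Clients waiting: 7.",
        "some unrelated log line",
    ]
    print(summarize_waiting(sample))
```

Running this over fixed time windows (for example per hour of log) gives a rough picture of whether blocking is a rare event or a steady state.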
participants (3): Dieter Maurer, FuBuJo, robert rottermann