Hello,

I am working on a solution that has a very high number of users and a significant amount of traffic. Using a ZEO configuration, we have been running into a few bottlenecks while trying to improve our load-testing results. The problem is that when we put our ZEO configuration under load, Zope does not close the connections, and the application server ends up running out of connections for Apache to connect to. Once opened, the connections sit idle indefinitely. (Apache runs on a dedicated server, separate from our ZEO instances.) We use round-robin (RR) load balancing between 2 VM instances, with 10 ZEO clients per VM, to distribute load. The bottleneck occurs on the Apache server because it keeps TCP connections in the TIME_WAIT state.
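To make the topology concrete, here is a minimal Python sketch of plain round-robin over 2 VMs with 10 ZEO clients each. The host names `vm1`/`vm2` and the port range 8081-8090 are hypothetical placeholders, not our real layout, and this only models the rotation, not Apache's actual balancer.

```python
from collections import Counter
from itertools import cycle, islice


def build_backends(vms, first_port, clients_per_vm):
    """List (host, port) pairs, interleaving VMs so consecutive picks alternate."""
    return [(vm, first_port + i) for i in range(clients_per_vm) for vm in vms]


def round_robin(backends):
    """Yield backends in strict rotation, as a simple RR balancer would."""
    return cycle(backends)


# Hypothetical layout: VMs "vm1" and "vm2", ZEO clients on ports 8081-8090.
backends = build_backends(["vm1", "vm2"], 8081, 10)
picks = list(islice(round_robin(backends), 40))  # simulate 40 requests
per_backend = Counter(picks)  # how many requests each ZEO client received
print(len(backends), picks[:3])
```

With plain RR every ZEO client gets the same share of requests regardless of how busy it is, and if connections are not reused (HTTP keep-alive), each request may open a fresh TCP connection, which is what feeds the TIME_WAIT build-up.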

We have run load tests on both Windows Server 2003 and 2008. On 2003 we were able to adjust the registry so the OS would terminate connections after 15 seconds of idle time; on 2008, however, the minimum is 30 seconds. At the upper levels of testing, the OS runs out of TCP connections because it can’t close them fast enough, and it begins to fail requests. Ideally we don’t want to force connections closed at the TCP stack level just because Zope keeps them open; we’d hope that Zope would handle this cleanup gracefully.
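A rough way to see why 15 vs. 30 seconds matters is Little's law: the average number of sockets held in TIME_WAIT is roughly the new-connection rate times the TIME_WAIT duration. The sketch below assumes each new outbound connection from Apache ties up one ephemeral port for that whole period, treats our 15 s / 30 s registry values as that duration, and assumes the 49152-65535 dynamic range that we believe is the Windows Server 2008 default (`netsh int ipv4 show dynamicport tcp` shows the real value on a given host).

```python
def time_wait_sockets(new_conns_per_sec, time_wait_secs):
    """Little's law: average number of sockets sitting in TIME_WAIT."""
    return new_conns_per_sec * time_wait_secs


def max_sustainable_rate(ephemeral_ports, time_wait_secs):
    """Highest new-connection rate before TIME_WAIT sockets use up every port."""
    return ephemeral_ports / time_wait_secs


# Assumed dynamic port range; adjust to what the Apache host actually uses.
ports = 65535 - 49152 + 1

for delay in (15, 30):
    rate = max_sustainable_rate(ports, delay)
    print(f"TIME_WAIT {delay}s -> about {rate:.0f} new connections/s before exhaustion")
```

Under these assumptions, doubling the TIME_WAIT period halves the sustainable connection rate, which matches the kind of failure we see at the top of the load tests.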

One option we are considering is using IIS 7.5 with Application Request Routing (ARR) as a replacement load-balancing/rewrite layer. This could let us check the health of destinations before forwarding a request, and it may also give us more control over closing connections at the OS level.
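ARR has its own health-test settings, so the following is not how ARR is configured; it is only a minimal Python sketch of the idea of checking a destination before forwarding to it. `HealthCheckedRoundRobin` and `tcp_alive` are hypothetical names, and the default check is a bare TCP connect.

```python
import socket
from itertools import cycle


def tcp_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthCheckedRoundRobin:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy=tcp_alive):
        self._backends = list(backends)
        self._ring = cycle(self._backends)
        self._is_healthy = is_healthy

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            host, port = next(self._ring)
            if self._is_healthy(host, port):
                return host, port
        raise RuntimeError("no healthy backend available")
```

One caveat of a connect-based check: every probe opens and closes a TCP connection of its own, and the side that closes first (the checker) then holds that socket in TIME_WAIT, so frequent probes add to the very pressure we are trying to relieve.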

One other detail we think might be the cause: Zope does not seem to initiate the connection close.
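For reference, in TCP the side that performs the active close is the one that ends up holding the socket in TIME_WAIT, so if Apache always closes first, the TIME_WAIT sockets accumulate on the Apache host. The sketch below shows, in plain Python sockets rather than Zope code, what a server-initiated idle close looks like: the server echoes data and closes the connection itself once `idle_timeout` seconds pass with no traffic. `serve_until_idle` is a hypothetical helper, and we have not confirmed what idle-timeout options Zope 2.12's HTTP server exposes.

```python
import socket


def serve_until_idle(listener, idle_timeout):
    """Accept one connection, echo data, and close it after idle_timeout
    seconds without traffic, so the server side performs the active close."""
    conn, _ = listener.accept()
    conn.settimeout(idle_timeout)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # idle too long: the server initiates the close
            if not data:
                break  # the peer closed first
            conn.sendall(data)
    finally:
        conn.close()
```

If the backend closed idle connections this way, the TIME_WAIT entries would land on the ZEO client hosts instead of on the Apache server.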

Does anyone have any experience or knowledge they can lend to help out?

Configuration:

Windows Server 2003/2008

Apache 2.1

Zope 2.12

MS SQL 2005

Python 2.6.6

SQLAlchemy 0.6.5

z3c.sqlalchemy 1.4.0

zope.sqlalchemy 0.6

Jimmy Small (mallaice)