urgent stability problem on production site
Hi,

the ZPatterns list seems to be misconfigured - I cannot reach the list or someone at eby-sarna.com. So I post my problem here:

Today we got online with a customer site and unfortunately instantly got a problem with ZPatterns or TransactionAgents, I don't really know.

The site is quite heavily used - 40000 users that got a registration this night try to change their username and password now. All works very well and fast.

The problem is that the Zope process dies once in a while - a while varies from 20 seconds to 20 minutes. I saw a relationship between login tries (login form) and the Zope process dying. Unfortunately this happens randomly - always when Zope dies shortly before that somebody tried a login. At the other side many logins work without Zope dying.

The error log shows:

2002-01-30 11:02:25.627043500 2002-01-30T10:02:25 INFO(0) Z2 CONFLICT Competing writes at, /galileo/ssl/login/
2002-01-30 11:02:25.627844500 Traceback (innermost last):
2002-01-30 11:02:25.627849500 File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 175, in publish
2002-01-30 11:02:25.627852500 File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 240, in commit
2002-01-30 11:02:25.627855500 File /web/Zope-2.4.3/lib/python/Products/TransactionAgents/__init__.py, line 54, in new_commit
2002-01-30 11:02:25.627858500 File /web/Zope-2.4.3/lib/python/ZODB/Transaction.py, line 302, in commit
2002-01-30 11:02:25.627861500 File /web/Zope-2.4.3/lib/python/ZODB/Connection.py, line 324, in commit
2002-01-30 11:02:25.627889500 ConflictError: '\x00\x00\x00\x00\x00\x00\x00\x02'
2002-01-30 11:02:25.627891500
2002-01-30 11:02:25.627892500
2002-01-30 11:02:26.203482500 ------
2002-01-30 11:02:26.203488500 2002-01-30T10:02:26 INFO(0) Z2 CONFLICT Competing writes at, /galileo/galileo_press/
2002-01-30 11:02:26.204217500 Traceback (innermost last):
2002-01-30 11:02:26.204220500 File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 175, in publish
2002-01-30 11:02:26.204223500 File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 240, in commit
2002-01-30 11:02:26.204226500 File /web/Zope-2.4.3/lib/python/Products/TransactionAgents/__init__.py, line 54, in new_commit
2002-01-30 11:02:26.204230500 File /web/Zope-2.4.3/lib/python/ZODB/Transaction.py, line 302, in commit
2002-01-30 11:02:26.204233500 File /web/Zope-2.4.3/lib/python/ZODB/Connection.py, line 324, in commit
2002-01-30 11:02:26.204346500 ConflictError: '\x00\x00\x00\x00\x00\x00\x00\x02'
2002-01-30 11:02:26.204349500
2002-01-30 11:02:26.204350500
2002-01-30 11:02:37.190244500 ------
2002-01-30 11:02:37.190248500 2002-01-30T10:02:37 INFO(0) Z2 CONFLICT Competing writes at, /galileo/ssl/suche
2002-01-30 11:02:37.190822500 Traceback (innermost last):
2002-01-30 11:02:37.190824500 File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 175, in publish
2002-01-30 11:02:37.190827500 File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 240, in commit
2002-01-30 11:02:37.190830500 File /web/Zope-2.4.3/lib/python/Products/TransactionAgents/__init__.py, line 54, in new_commit
2002-01-30 11:02:37.190833500 File /web/Zope-2.4.3/lib/python/ZODB/Transaction.py, line 302, in commit
2002-01-30 11:02:37.190836500 File /web/Zope-2.4.3/lib/python/ZODB/Connection.py, line 324, in commit
2002-01-30 11:02:37.190851500 ConflictError: '\x00\x00\x00\x00\x00\x00\x00\x00'
2002-01-30 11:02:37.190853500
2002-01-30 11:02:37.190854500
2002-01-30 11:03:39.067913500 ------
2002-01-30 11:03:39.067920500 2002-01-30T10:03:39 INFO(0) Z2 CONFLICT Competing writes at, /galileo/ssl/shop/warenkorb/
2002-01-30 11:03:39.070648500 Traceback (innermost last):
2002-01-30 11:03:39.070650500 File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 175, in publish
2002-01-30 11:03:39.070653500 File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 240, in commit
2002-01-30 11:03:39.070656500 File /web/Zope-2.4.3/lib/python/Products/TransactionAgents/__init__.py, line 54, in new_commit
2002-01-30 11:03:39.070660500 File /web/Zope-2.4.3/lib/python/ZODB/Transaction.py, line 302, in commit
2002-01-30 11:03:39.070663500 File /web/Zope-2.4.3/lib/python/ZODB/Connection.py, line 324, in commit
2002-01-30 11:03:39.070681500 ConflictError: '\x00\x00\x00\x00\x00\x00\x00\n'
2002-01-30 11:03:39.070683500
2002-01-30 11:03:39.070684500
2002-01-30 11:04:11.248404500 ------
2002-01-30 11:04:11.248411500 2002-01-30T10:04:11 INFO(0) Z2 CONFLICT Competing writes at, /galileo/galileo_computing/
2002-01-30 11:04:11.323482500 Traceback (innermost last):
2002-01-30 11:04:11.323488500 File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 175, in publish
2002-01-30 11:04:11.323492500 File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 240, in commit
2002-01-30 11:04:11.323495500 File /web/Zope-2.4.3/lib/python/Products/TransactionAgents/__init__.py, line 54, in new_commit
2002-01-30 11:04:11.323498500 File /web/Zope-2.4.3/lib/python/ZODB/Transaction.py, line 302, in commit
2002-01-30 11:04:11.323501500 File /web/Zope-2.4.3/lib/python/ZODB/Connection.py, line 420, in commit
2002-01-30 11:04:11.323554500 (Info: (('BTrees.OOBTree', 'OOBTree'), '\x00\x00\x00\x00\x00\x00\x00\x03', ''))
2002-01-30 11:04:11.323557500 File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/SessionStorage.py, line 186, in store
2002-01-30 11:04:11.323561500 (Object: SessionStorage)
2002-01-30 11:04:11.323562500 ConflictError: ('\x03B`|"\xd2\x85\xbb', '\x03B`|+"\xdf*')

We use LoginManager with some modifications made by Joachim Schmitz. The LoginManager is the only thing that uses ZPatterns anymore - so I think there lies the source of the problem. I don't know why TransactionAgents try to do something with the ZODB - all authentications are done against a MySQL database.

Maybe Joachim may assist to explain the changes made to LoginManager.

Here are our software versions:

Zope 2.4.3
TransactionAgents-0.0.4
LoginManager-0-8-8b1 (patched)

for database connectivity:
MySQL-python-0.9.1
ZMySQLDA-2.0.8

Python 2.1.1 (#1, Aug 29 2001, 15:06:31) [GCC 2.95.4 20010810 (Debian prerelease)] on linux2
+ CoreSessionTracking 0.9

Here is a traceback from userland:

Traceback (innermost last):
  File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 223, in publish_module
  File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 187, in publish
  File /web/Zope-2.4.3/lib/python/Zope/__init__.py, line 226, in zpublisher_exception_hook
    (Object: login)
  File /web/Zope-2.4.3/lib/python/ZPublisher/Publish.py, line 162, in publish
  File /web/Zope-2.4.3/lib/python/ZPublisher/BaseRequest.py, line 450, in traverse
  File /web/Zope-2.4.3/lib/python/Products/LoginManager/LoginManager.py, line 236, in validate
    (Object: acl_users)
  File /web/Zope-2.4.3/lib/python/Products/LoginManager/LoginMethods.py, line 129, in findLogin
    (Object: Session_USER Login)
  File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/SessionDataManager.py, line 258, in getSessionData
    (Object: sdm)
  File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/SessionDataManager.py, line 390, in _getSessionDataObject
    (Object: sdm)
  File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/LowConflictConnection.py, line 107, in setstate
  File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/SessionStorage.py, line 156, in load
    (Object: SessionStorage)
KeyError:

Does anybody have a hint how to solve this problem?

Regards, Frank
--
CTO
fte@Lightwerk.com
http://www.Lightwerk.com/
Fax: +49-2434-80 07 94
Phone: +49-2434-80 07 81
Lightwerk GmbH * An der Kull 11 * 41844 Wegberg * Germany
Visit us at CeBIT: Hall 6, Stand F68 / 595
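To see which URLs are hit by conflicts most often, the "Z2 CONFLICT Competing writes at, ..." entries from a log like the one in this post could be tallied per path. The following is only a minimal sketch: the function name `count_conflicts` and the sample list are made up for illustration, and the regular expression assumes the path follows "Competing writes at, " on the same line, as it does in the excerpt.

```python
import re
from collections import Counter

# Matches the path that follows "Competing writes at, " in a Z2 CONFLICT line.
CONFLICT_RE = re.compile(r"Z2 CONFLICT Competing writes at, (\S+)")


def count_conflicts(lines):
    """Return a Counter mapping each conflicting path to its number of log entries."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken from the log excerpt in this post (paths rejoined).
SAMPLE_LINES = [
    "2002-01-30 11:02:25.627043500 2002-01-30T10:02:25 INFO(0) Z2 CONFLICT Competing writes at, /galileo/ssl/login/",
    "2002-01-30 11:02:25.627844500 Traceback (innermost last):",
    "2002-01-30 11:02:26.203488500 2002-01-30T10:02:26 INFO(0) Z2 CONFLICT Competing writes at, /galileo/galileo_press/",
    "2002-01-30 11:02:37.190248500 2002-01-30T10:02:37 INFO(0) Z2 CONFLICT Competing writes at, /galileo/ssl/suche",
]

if __name__ == "__main__":
    for path, number in count_conflicts(SAMPLE_LINES).most_common():
        print(number, path)
```

In practice the lines would come from the real log file rather than a hard-coded list; a path that dominates the tally would be a candidate for a closer look.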
Frank Tegtmeyer wrote:
Hi,
the ZPatterns list seems to be misconfigured - I cannot reach the list or someone at eby-sarna.com. So I post my problem here:
Today we got online with a customer site and unfortunately instantly got a problem with ZPatterns or TransactionAgents, I don't really know.
The site is quite heavily used - 40000 users that got a registration this night try to change their username and password now. All works very well and fast.
The problem is that the Zope process dies once in a while - a while varies from 20 seconds to 20 minutes. I saw a relationship between login tries (login form) and the Zope process dying. Unfortunately this happens randomly - always when Zope dies shortly before that somebody tried a login. At the other side many logins work without Zope dying.
Frank, try starting Zope up single-threaded ( -t 1 ) and see if that helps; I'm nervous about a potential problem with the MySQLDA C libraries having thread-local data and switching threads leading to instability. We've had one customer who 'made the problems go away' when they ran single-threaded Zope.
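The idea behind running single-threaded can be illustrated in isolation: if a library keeps thread-local state, one way to avoid trouble is to make sure every call into it happens on the same thread. The sketch below is an assumption-laden illustration in modern Python, not something Zope or ZMySQLDA does; `SingleThreadProxy` and its methods are hypothetical names.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadProxy:
    """Run every submitted call on one dedicated worker thread (hypothetical helper)."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Only this thread ever executes the submitted functions.
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut the worker down
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, func, *args):
        """Queue func(*args) for the worker and block until its result is ready."""
        future = Future()
        self._tasks.put((func, args, future))
        return future.result()

    def close(self):
        self._tasks.put(None)
        self._worker.join()
```

Any number of request threads can use `proxy.call(some_function, arg)`, but the function itself always runs on the proxy's worker thread. This trades concurrency for safety in the same way `-t 1` does for the whole process, only limited to one component.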
Hi! My first guess when I look at the traceback is that the problem actually seems to be Core Session Tracking:
2002-01-30 11:04:11.323557500 File /web/Zope-2.4.3/lib/python/Products/CoreSessionTracking/SessionStorage.py, line 186, in store
2002-01-30 11:04:11.323561500 (Object: SessionStorage)
2002-01-30 11:04:11.323562500 ConflictError: ('\x03B`|"\xd2\x85\xbb', '\x03B`|+"\xdf*')
A conflict error on its own just shows that you have a lot of load to handle. So this should not kill the Zope process. But in my experience, CST never worked without problems. Load errors and conflicts happened rather frequently. I'd suggest upgrading to 2.5 and the new session stuff soon, though this might also cause new problems on other parts of the site, and certainly can not be done over night ...
Joachim
A conflict error on its own just shows that you have a lot of load to handle. So this should not kill the Zope process. But in my experience, CST never worked without problems. Load errors and conflicts happened rather frequently. I'd suggest upgrading to 2.5 and the new session stuff soon, though this might also cause new problems on other parts of the site, and certainly can not be done over night ...
Note that as far as CST (or ZODB in general) goes, load errors are usually bugs but conflict errors are normal. Since Frank is able to run this thing in single-threaded mode and it works ok (it doesn't segfault), the problem is probably with a C extension somewhere that is not threadsafe. CST is all Python code, so I doubt it's that (unless it's calling a library which is not threadsafe, but it uses only the stock Python libraries which we know are). I suspect that it's either some piece of C code in Zope proper or Frank's database adapter.
- C
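Why conflict errors count as normal can be shown with a small retry loop: when two transactions write the same object, one of them fails with a ConflictError and the work is simply attempted again. As far as I understand, Zope's publisher handles such errors by retrying the request. The sketch below is not Zope's code; the `ConflictError` class is a local stand-in for the ZODB exception, and `run_with_retries` and the default of 3 retries are assumptions made for illustration.

```python
class ConflictError(Exception):
    """Local stand-in for ZODB's ConflictError (illustration only)."""


def run_with_retries(handler, max_retries=3):
    """Call handler(); on ConflictError try again, up to max_retries extra attempts.

    Returns a (result, attempts) tuple, or re-raises the last ConflictError
    once the retry budget is used up.
    """
    for attempt in range(1, max_retries + 2):
        try:
            return handler(), attempt
        except ConflictError:
            if attempt > max_retries:
                raise
```

A handler that hits a conflict twice and then succeeds returns its result on the third attempt, and nothing is lost apart from the repeated work. Only a request that keeps conflicting after every retry surfaces as an error, which is why frequent conflicts point to load rather than to a crash.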
Frank Tegtmeyer writes:
... Here are our software versions:
Zope 2.4.3 TransactionAgents-0.0.4 LoginManager-0-8-8b1 (patched)
for database connectivity: MySQL-python-0.9.1 ZMySQLDA-2.0.8
Python 2.1.1 (#1, Aug 29 2001, 15:06:31) [GCC 2.95.4 20010810 (Debian prerelease)] on linux2
You do not read the mailing lists? You should!
There are serious stability problems with Zope 2.4.x and Python 2.1.1. You should consider one of:
* upgrade to Python 2.1.2 and Zope 2.4.4 beta or Zope 2.5
  It might also be necessary to force Zope to use the Python-implemented access control instead of the more optimized C-implemented access control
* configure Python 2.1.1 with "--without-pymalloc --without-cyclic-gc"
  This does not fix the problem, it only works on the symptoms. Your Zope may run a bit longer.
Details in dozens of posts to the mailing lists...
Dieter
-> * upgrade to Python 2.1.2 and Zope 2.4.4 beta or Zope 2.5
RPMs of Zope 2.4.4beta or Zope 2.5 would be very nice.
Believe it or not, some of us actually use the administrative features of RPMs. It would be nice if the RPMs were generated in an automatic way, instead of just as some guy's pet project that gets updated when he has time.
Furthermore, it would be nice if the RPMs were tested on systems other than Red Hat (for me, that would be Mandrake, in particular). I have to use --nodeps (and --force? I don't remember) to install because it looks for a package that is named differently under Mandrake... I can provide details if anyone is interested.
--Derek
Derek Simkowiak wrote:
-> * upgrade to Python 2.1.2 and Zope 2.4.4 beta or Zope 2.5
RPMs of Zope 2.4.4beta or Zope 2.5 would be very nice.
Believe it or not, some of us actually use the administrative features of RPMs. It would be nice if the RPMs were generated in an automatic way, instead of just as some guy's pet project that gets updated when he has time.
Furthermore, it would be nice if the RPMs were tested on systems other than Red Hat (for me, that would be Mandrake, in particular). I have to use --nodeps (and --force? I don't remember) to install because it looks for a package that is named differently under Mandrake... I can provide details if anyone is interested.
--Derek
We would be delighted if you spent the time and developed automatic packaging for the distributions which concern you. Our build process does not generate RPMs or any other package format.
--
Matt Kromer
Zope Corporation
http://www.zope.com/
Matthew T. Kromer wrote:
Derek Simkowiak wrote:
-> * upgrade to Python 2.1.2 and Zope 2.4.4 beta or Zope 2.5
RPMs of Zope 2.4.4beta or Zope 2.5 would be very nice.
Believe it or not, some of us actually use the administrative features of RPMs. It would be nice if the RPMs were generated in an automatic way, instead of just as some guy's pet project that gets updated when he has time.
Furthermore, it would be nice if the RPMs were tested on systems other than Red Hat (for me, that would be Mandrake, in particular). I have to use --nodeps (and --force? I don't remember) to install because it looks for a package that is named differently under Mandrake... I can provide details if anyone is interested.
We would be delighted if you spent the time and developed automatic packaging for the distributions which concern you. Our build process does not generate RPMs or any other package format.
FYI the "automatic packaging" part is already done. "rpm -bb" takes a .spec file, unpacks the sources in a temporary folder, performs a complete build, and generates RPMs. The only thing someone needs to maintain is the .spec file, which might be slightly different per distribution.
Shane
participants (7)
- Chris McDonough
- Derek Simkowiak
- Dieter Maurer
- Frank Tegtmeyer
- Joachim Werner
- Matthew T. Kromer
- Shane Hathaway