[ZODB-Dev] [ zodb-Bugs-547020 ] Weird ZEO error: Aiieee! error code 25
noreply@sourceforge.net
Mon, 22 Apr 2002 12:19:24 -0700
Bugs item #547020, was opened at 2002-04-22 10:47
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=115628&aid=547020&group_id=15628
Category: None
Group: None
Status: Open
Resolution: None
Priority: 9
Submitted By: Chris Withers (fresh)
Assigned to: Jeremy Hylton (jhylton)
Summary: Weird ZEO error: Aiieee! error code 25
Initial Comment:
On a Python ZEO client, we get:
> File c:\zope\2-4-2_base\lib\python\ZEO\ClientStorage.py, line 426, in tpc_vote
> (Object: ('x.x.com', xxxx))
> File c:\zope\2-4-2_base\lib\python\ZEO\zrpc.py, line 228, in __call__
> TypeError: exceptions must be strings, classes, or instances
The matching entry on the Storage Server is:
> 2002-04-22T10:33:43 ERROR(200) zdaemon zdaemon: Mon Apr 22 11:33:43 2002: Aiieee! 2107 exited with error code: 25
After that, the storage server tries to fork, and we got the following entry pattern in the logs:
> ------
> 2002-04-22T10:33:43 INFO(0) zdaemon zdaemon: Mon Apr 22 11:33:43 2002: Houston, we have forked
> ------
> 2002-04-22T10:33:43 INFO(0) zdaemon zdaemon: Mon Apr 22 11:33:43 2002: Hi, I just forked off a kid: 2155
> ------
> 2002-04-22T10:33:43 INFO(0) zdaemon zdaemon: Mon Apr 22 11:33:43 2002: Houston, we have forked
...however, the storage server never accepts connections again until it is manually stopped and
restarted.
During this restart process, we often get the following entries in the storage server logs:
> ------
> 2002-04-21T09:22:22 PROBLEM(100) ZODB FS FS21 warn: /x/Data.fs truncated, possibly due to damaged records at 2147482867
>
> ------
> 2002-04-21T09:22:22 PROBLEM(100) ZODB FS FS21 warn: Writing truncated data from /x/Data.fs to /x/Data.fs.tr14
>
> ------
...but regardless of whether we get those messages or not, the storage server takes an age to
start and uses 100% CPU the whole time.
This storage server had been completely stable for about a month, but this problem has started
happening repeatedly in the last few days.
What's the best way to go about finding out what's going on?
----------------------------------------------------------------------
>Comment By: Jeremy Hylton (jhylton)
Date: 2002-04-22 19:19
Message:
Logged In: YES
user_id=31392
I suspect it is something funky with your environment. On
my machine (Mandrake 7.2?) RLIMIT_FSIZE > 10**18.
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2002-04-22 19:01
Message:
Logged In: YES
user_id=31392
The zdaemon error message is probably a little misleading.
I believe that 25 is the signal sent to the child causing
it to exit. If that's right, then you're getting SIGXFSZ,
which means that it tried to extend a file past the rlimit
(RLIMIT_FSIZE).
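A minimal sketch of how the "25" could be decoded, assuming (as guessed above, not confirmed) that zdaemon logs the raw status returned by os.wait() rather than a plain exit code. The helper name describe_wait_status is made up for illustration and is not part of zdaemon. Unix only.

```python
import os
import signal


def describe_wait_status(status):
    """Interpret a raw status as returned by os.wait() / os.waitpid().

    Hypothetical helper, not part of zdaemon. Returns a tuple
    (kind, number, signal_name_or_None).
    """
    if os.WIFSIGNALED(status):
        # The child was killed by a signal; extract the signal number.
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            name = "unknown"
        return ("signal", signum, name)
    if os.WIFEXITED(status):
        # The child called exit(); extract its real exit code.
        return ("exit", os.WEXITSTATUS(status), None)
    return ("other", status, None)


if __name__ == "__main__":
    print(describe_wait_status(25))
```

Read this way, a raw status of 25 means "killed by signal 25", not "exited with code 25". On Linux on x86, signal 25 is SIGXFSZ; the number differs on some other platforms, so the name lookup is done at run time.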
Does that sound plausible? How big is the file? Do you
have a custom RLIMIT_FSIZE or know what the default is for
your OS?
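One way to answer these questions from Python is sketched below, assuming a Unix system where the standard resource module is available. The function name fsize_headroom is illustrative only, and the limit it reports is that of the process running the check, which is not necessarily the limit the storage server inherited.

```python
import os
import resource


def fsize_headroom(path):
    """Return (file_size, bytes_left_before_soft_RLIMIT_FSIZE).

    The second element is None when the soft limit is unlimited.
    Hypothetical helper for illustration only.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    size = os.path.getsize(path)
    if soft == resource.RLIM_INFINITY:
        return size, None
    return size, soft - size


if __name__ == "__main__":
    import sys

    # Pass the path to the storage file, e.g. the Data.fs from the report.
    size, headroom = fsize_headroom(sys.argv[1])
    print("size:", size, "headroom:", headroom)
```

It may also be worth comparing the numbers against 2**31 = 2147483648: the truncation warning in the initial comment reports damaged records at 2147482867, which is 781 bytes short of that value. That is only an observation about the arithmetic, not a confirmed diagnosis.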
It makes sense that this happens during tpc_vote(), because
that's when all the data is copied from the tempfile to the
Data.fs. If a failure occurs at this point, I'll bet that
on restart FileStorage ignores the index and recomputes it,
which would explain why it is slow.
Not sure how we can detect this problem more gracefully.
Until today, I had never even heard of SIGXFSZ.
----------------------------------------------------------------------