[ZODB-Dev] Real ZODB troubles

Eugene el-spam at yandex.ru
Fri Jul 2 05:58:06 EDT 2004


Hello Zodb-dev,

Tim Peters asked about ZODB troubles. I have a real one, and not just one but many.
We are using FreeBSD 5.2, Zope 2.7.0 and Plone 2.0.2 on our production server.
There are also some "classical" Apache sites.
Some time ago I ran into serious trouble with ZODB.
The scenario is as follows:
First I lost the ability to change my documents and add new ones: in
this case I got an error, while reading still worked well.
After some time (about an hour) each read attempt ended with an error.
I only got a traceback on my screen.
The Plone logs filled up with CorruptedDataError messages.
I tried to pack the DB, but it didn't help.
An attempt to stop and start the Zope server failed. I ran it from the
console: loading stopped with a traceback like:

ZODB.POSException.ConflictError: database conflict error
(oid 0000000000000001, serial was 035588c40d2b9299, now
0000000000000000).

I restored the DB from backup. Because the project was not in
production at that moment, backups were not regular, so I took a
7-day-old backup and had to restore the rest of the data manually.
After a week the problem came back in the same way.
So I have to point out the bad part:
- all this time the trouble lived in the DB
- testing utilities didn't find it.

The next time, the data was exported into a .zexp file.
The DB was recreated from scratch and the data was imported back.
I hope this export-import method makes Zope look through all the data and
store it correctly.

We are using a certified server based on a P4 platform.
The same server also hosts Apache sites, mail and databases. All of
them work fine; none of these programs has lost its data.
The disk was checked for errors: none found, all OK.

Really, it's a VERY BIG problem.
Because I don't know the cause, I can't predict trouble, and I have
to keep monitoring the server state manually.

-- 
Best regards,
 Eugene                          mailto:el-spam at yandex.ru


===========================

[follow-ups to zodb-dev at zope.org, please]

We (Zope Corp) occasionally get reports of FileStorage (Data.fs) corruption
of kinds that aren't understood, and that are never seen in Zope Corp's own
Zope deployments, nor in extreme artificial stress tests.  We take
corruption very seriously, and there are no known bugs in the current
releases of ZODB that can cause corruption.  Therefore we're very keen to
investigate cases that may still exist.

By "corruption", I mean what corruption conventionally means for any file:
the Data.fs and/or Data.fs.index file is damaged at the byte level, as if
someone had overwritten some region (or regions) with nonsense bytes.  Of
course this can be a disaster when it occurs.  Visible symptoms may include:

+ FileStorage.py raises CorruptedDataError.

+ FileStorage.py passes on this exception from Python's struct module:

    error: unpack str size does not match format

The only currently known causes for these are hardware problems (bad disk,
bad disk controller, loose connection, flaky memory chip) and equivalently
fatal system software bugs (buggy system disk driver, buggy system I/O
libraries, buggy third-party software).
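
As a rough illustration of the second symptom: a struct error of this kind
is what Python raises when a fixed-size binary header is unpacked from fewer
(or more) bytes than its format describes, for example after a record has
been truncated or overwritten.  The sketch below uses a made-up header
layout (HEADER_FORMAT is hypothetical and is not FileStorage's real record
format), and the exact wording of the error message differs between Python
versions.

```python
import struct

# Hypothetical fixed-size record header, NOT FileStorage's real layout:
# an 8-byte object id followed by an 8-byte unsigned length, big-endian.
HEADER_FORMAT = ">8sQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def read_header(buf):
    """Unpack one header; raises struct.error if buf has the wrong size."""
    return struct.unpack(HEADER_FORMAT, buf)


def try_read(buf):
    """Return the unpacked header, or a string describing the failure."""
    try:
        return read_header(buf)
    except struct.error as exc:
        return "struct.error: %s" % exc


# A well-formed header and a damaged (truncated) copy of it.
good = struct.pack(HEADER_FORMAT, b"\x00" * 7 + b"\x01", 42)
truncated = good[:10]

if __name__ == "__main__":
    print(try_read(good))
    print(try_read(truncated))
```

The point is only that the exception comes from the size mismatch itself, so
it says that bytes are missing or misplaced, not why.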

If you experience corruption not due to such causes beyond ZODB's
control(*), we want to hear about it!  The best place to report a case is on
the Zope Collector:

    http://collector.zope.org/Zope

with topic "Database".  Because database files can be very large, and may
contain sensitive data, please don't attach them to the report; you should
be willing to let us get copies of them privately, though.


(*)
How do you know whether the cause is beyond ZODB's control?  You probably
can't, and that's fine.  A case where you can:  the last time Jim Fulton
tried to track one of these down, the customer's system was in such bad
shape that they were unable to make a readable tar file containing their
database files:  attempts to do so created corrupt *tar* files.  If I/O on
your system is plain broken, then no, FileStorage isn't going to work
either.
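
One crude way to probe for that kind of breakage is to copy a file and
compare checksums of the original and the copy.  The following is a minimal
sketch under that assumption, not a ZODB tool; the helper names sha256_of
and copy_round_trips are invented here for illustration.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_round_trips(src, dst):
    """Copy src to dst and report whether both files hash identically."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

For example, copy_round_trips("Data.fs", "/some/other/disk/Data.fs.copy")
should return True on a healthy system (the destination path here is just a
placeholder).  A True result proves little, since reads made right after a
write may be served from the operating system's cache rather than the disk,
but a False result on a file that nothing is writing to is a strong hint
that the problem lies below ZODB.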



