[Zope] Recovering corrupt XML Export files?

Terry Hancock hancock@anansispaceworks.com
Sun, 25 Aug 2002 02:35:45 -0700


Dieter Maurer wrote:
> Terry Hancock writes:
>  > Anyway, thinking that something invisible was going
>  > on in the site, I tried exporting and importing a large
>  > part of the site.
>  >
>  > The export seemed to work, but the import failed, ...
>  > Unfortunately, I hadn't backed up the Data.fs
>  > right before doing this (I was unfortunately relying on
>  > the export to work -- I know ... I should've known better).
>  > ....
>  > Anyway, I was hoping to try to recover this XML export
>  > file.
> What a route...
> 
>    I expect you deleted the object you wanted to recreate
>    through the import?
> 
>    Undo the deletion and you should have your old state back.
> 
>    Back up, then do any experiment you want ;-)

Heh. Well, unless one is dumb enough to pack the
database in between.  You have to realize that I
was trying to flush out any "crud" remaining in
the database. This didn't work, because the XML still
contained the "crud".  Specifically, I have dangling
references of some kind to a product I've removed
already.  The thing is, it's not so easy to diagnose
problems in the ZODB as it is on a filesystem. On
a regular filesystem, there are plenty of simple
tools which allow you to find out *exactly* what you
have.

I was kind of hoping that illegal stuff would get
dropped during the export, instead of simply blocking
the import, which is what actually happened.

I wish the XML export were more intuitively arranged
(for example, following the object-tree's structure).
It appears, from my experience with it, to be more
nearly chronologically arranged, which seems odd to
me.  I guess I would've expected that an XML dump of
a high level folder should contain a contiguous block
for each of its subfolders, which would be identical
to XML exporting those subfolders individually. But
my impression is that this isn't true. Scanning the
file seemed more like scanning physical sectors on a
disk -- the data is somewhat disjointed.

It's somewhat academic now -- I was able to get a backup
of Data.fs from a few days ago (though this means that
the corruption is still there, whatever is going on). I
have a suspicion that it has something to do with
defining local roles in a domain controlled by CookieUserFolder.
This must somehow create references to the CUF from the
area with the local role (?).  The site has a mixture
of regular and CUF folders, and the original symptom
was that we were seeing CUF challenges for folders that
should've been controlled by the regular UF.

I have to pull a few new files out of the XML, but the
images are backed up elsewhere, and it's possible to
search for and extract the text parts I need, even
without understanding the structure of the XML export.
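For what it's worth, that kind of structure-blind search can be scripted. Below is a minimal Python sketch (`find_text` is a hypothetical helper, not anything shipped with Zope) that streams through an export file using the standard library's `xml.etree.ElementTree.iterparse` and reports every element whose text contains a given string. It assumes the export file itself is well-formed XML; any values stored in an encoded form rather than as plain element text would not be matched.

```python
import xml.etree.ElementTree as ET


def find_text(path, needle):
    """Yield (tag, text) for every element whose text contains `needle`.

    Hypothetical helper: it ignores the export's record structure entirely
    and just looks at the text content of each element as it is parsed.
    """
    for _event, elem in ET.iterparse(path, events=("end",)):
        text = elem.text or ""
        if needle in text:
            yield elem.tag, text
        # Drop the element's contents once inspected to reduce memory use
        # on large export files.
        elem.clear()
```

Something like `for tag, text in find_text("export.xml", "some phrase"): print(tag, text)` would then list the candidate fragments; the file name here is only an example.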

In any case, it would be nice to have a procedure
which did what I originally wanted -- i.e. export
object content and meta-data into a readable format
which is guaranteed to be legal and then import
what is then a clean collection of objects. If one
remains restricted to the right types, this can be
done with FTP, but it doesn't work in general.

I think you have to realize that it is a definite
fault of Zope that you can't round-trip with the
export/import options. After all, this is the
primary way to freeze Zope content, without losing
meta-data. So it ought to work very robustly. But
it didn't -- a collection of objects which did
exist successfully in Zope was saved, and could
not be restored to its former state. Sure, I wasn't
being as careful as possible -- but it *should*
have worked.  At worst I should've had localized
failures -- individual objects that wouldn't
import -- not a blanket failure of the whole tree.
This is one situation where "transaction" thinking
is *not* appropriate.
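To make the "localized failures" idea concrete, here is a hypothetical Python sketch of the behaviour I mean. It is not Zope's import code; `import_each` and `load_one` are invented names. Each object is loaded on its own, and errors are collected and reported rather than aborting the whole run.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def import_each(records: Iterable[T],
                load_one: Callable[[T], None]) -> List[Tuple[T, str]]:
    """Try to load every record independently (hypothetical sketch).

    A failure in one record is noted and skipped instead of aborting
    the whole import, so the healthy records still get restored.
    Returns a list of (record, error description) pairs.
    """
    failures: List[Tuple[T, str]] = []
    for record in records:
        try:
            load_one(record)
        except Exception as exc:  # deliberately broad: isolate any per-object error
            failures.append((record, repr(exc)))
    return failures
```

The trade-off is that objects which refer to a record that failed may themselves end up incomplete; a real importer would still have to decide how to report or repair such dangling references, which this sketch does not attempt.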

I guess it's just that although the ZODB is "very
robust" by application standards, it isn't really
up to the same standards we judge operating systems
by -- but if we're using it like one, then it
ought to be.

The thing is, there isn't really any safe way to
back up your Zope data: backing up Data.fs doesn't
help if the data is logically corrupted in some
way. And this example shows that exports aren't that
safe either -- they can also be corrupt, and thus
refuse to load at all.  This alone is an argument
for not storing anything valuable within Zope -- a
single-point failure or operational problem (bugs
in a product or a misconfigured product) can cause
total data loss. That's the opposite of "robust",
which most filesystems are -- it's very difficult
to toast a server's disk filesystem this badly --
you can lose pieces of it, but it's rare to have
the whole thing go (and even when that happens,
it's usually a hardware failure).  Also, it's no
problem to make clean backups -- saving the logical
structure, not the inode or FAT tables and what
not.  And if I try to un-tar an archive, it will
generally succeed for healthy files, even if some
of the files can't be restored.

In theory, the ZODB is very nice. The ability to
"back up" is very useful. And *most* of the time, 
indeed, almost always, it works that way. But every
once in a while it does get screwed up.  And that's
where the problem lies -- it's brittle. Once it
breaks you have to resort to drastic measures to
get it fixed.  You lose the robustness of regular
operating systems -- no doubt because they've been
tested in so many more situations.

Now I realize I use Zope at my own risk, and I
don't blame anyone else for my troubles -- but I
hope you can see that this is a design problem.

Either one needs to reduce the reliance on the
Zope object database, or the ZODB needs to become
much more robust (maybe this means better analysis
tools?). Of course, I don't see that I'm going
to be building the latter, so unless Zope Corp
or some third party takes an interest in it for
their own reasons, I guess I'll have to
go for the former -- move more content out of
Zope, perhaps using LocalFS or some such. I mention
this particularly because it refutes the claims
made by some of the more enthusiastic developers
on the list, who've championed putting everything
into the ZODB.

Well, anyway, this convinced me that I'm putting
a little too much faith in Zope's storage working
for me (at least the default one).

Terry

-- 
------------------------------------------------------
Terry Hancock
hancock@anansispaceworks.com       
Anansi Spaceworks                 
http://www.anansispaceworks.com 
P.O. Box 60583                     
Pasadena, CA 91116-6583
------------------------------------------------------