[Zope-dev] Data.fs corruption when creating lots of objects.

R. David Murray bitz@bitdance.com
Thu, 27 Apr 2000 16:09:20 -0400 (EDT)


OK, so I'm trying to reload this 60K record database that I screwed
up the load on before (but got it working anyway).  I think I'm
doing it right this time <wry grin>.  But the load craps out with
a database corruption error.  (I'm doing the reload, by the
way, because trying to compress the original database results in
a broken database; i.e., I could see a certain folder in the management
interface, but trying to access it resulted in a "resource not found"
error message for its URL.)

This time I'm doing the work under FreeBSD 3.4-stable and Zope
2.1.4.  (2.1.4 rather than 2.1.6 because of the reindex_object bug
in 2.1.6).  I don't see any checkins for ZODB in the CVS recent
enough to be an issue.

I have an external method that opens the source data file and
builds the ZClass objects to hold the data.  The method puts fifty
items in each folder, generating folders and subfolders as necessary
to achieve that.  After every 1000 object additions I do a
get_transaction().commit().  (I tried commit(1), i.e. a subtransaction
commit, but that didn't seem to write anything to disk and memory
just kept growing, so I switched to a full transaction commit, which
seems to work.)
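
For reference, the placement rule described above can be written as a small pure function. This is only a sketch in modern Python: `placement` and `commits_before` are hypothetical helper names, and the id scheme (numeric top-level folders, `m`-prefixed subfolders, `i`-prefixed items) is read off the external method attached at the end of this message.

```python
ITEMS_PER_FOLDER = 50   # items per subfolder, and subfolders per top-level folder
COMMIT_EVERY = 1000     # object additions between full commits


def placement(n):
    """Return (outer_id, mid_id, item_id) for the n-th record, counting from 1."""
    if n < 1:
        raise ValueError("record numbers start at 1")
    # Subfolders are numbered globally, not per top-level folder.
    mid = (n - 1) // ITEMS_PER_FOLDER + 1
    # A new top-level folder is started every 50 subfolders.
    outer = (mid - 1) // ITEMS_PER_FOLDER + 1
    return ("%s" % outer, "m%s" % mid, "i%s" % n)


def commits_before(n):
    """True if a full commit is issued just before the n-th record is created."""
    return (n - 1) % COMMIT_EVERY == 0
```

Under this rule a 60,000th record would land in top-level folder 24, subfolder m1200, so a full load builds on the order of 1,200 subfolders and issues about 60 commits.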

At a Data.fs size of around 200MB, I get the following error returned
by Client.py (which I use to call the external method):

bci.ServerError: bobo exception (File: /usr/local/zope/directory/lib/python/ZODB/FileStorage.py Line: 617)

Line 617 says:

if doid != oid: raise CorruptedDataError, h

So what it is reading back from the DB is not agreeing with what
it expected.  I also tried loading the objects in groups of
2500, compressing the DB after each sectional load.  At around
75MB the compression failed with an error message that I'm afraid
I failed to write down, but it had to do with a mismatch between
the size of a transaction and the size of a contained or container
data structure.  In terms of the number of objects created, that
was sooner than in the full load case.
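
To make the `doid != oid` check concrete, here is a minimal model of it in modern Python. It assumes the 42-byte data record header layout documented for later ZODB releases (oid, serial, previous-record position, transaction position, version length, pickle length); whether FileStorage.py in Zope 2.1.4 uses exactly this layout is an assumption. `read_header` and the local `CorruptedDataError` are stand-ins, not the real FileStorage code.

```python
import struct

# Assumed header layout: oid, serial, prev position, transaction position,
# version length, pickle length. Big-endian with no padding: 42 bytes.
DATA_HDR = ">8s8sQQHQ"
DATA_HDR_LEN = struct.calcsize(DATA_HDR)


class CorruptedDataError(Exception):
    """Stand-in for the exception raised by FileStorage."""


def read_header(buf, pos, oid):
    """Unpack the record header at pos and verify that it belongs to oid."""
    h = buf[pos:pos + DATA_HDR_LEN]
    if len(h) != DATA_HDR_LEN:
        raise CorruptedDataError("short read at %d" % pos)
    doid, serial, prev, tloc, vlen, plen = struct.unpack(DATA_HDR, h)
    # The index said oid lives at pos; the bytes on disk must agree.
    if doid != oid:
        raise CorruptedDataError(
            "expected oid %r at %d, found %r" % (oid, pos, doid))
    return serial, prev, tloc, vlen, plen
```

In this model the check fails either when the position handed in is wrong (a bad index entry or back pointer) or when the bytes at that position are not what was written there, so the exception by itself does not say which side is at fault.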

I've tried the big load twice, and it failed both times at around 200MB,
but at different specific file sizes.  I'm not sure if that means
it was in a different transaction group or not, but I'm guessing so.

Debugging suggestions are gratefully accepted.  I've started perusing
FileStorage.py, but it is somewhat heavy mojo and I haven't figured
out where to start debugging yet.  I also wish the thing failed
sooner in the load; it's a pain to have to wait two hours for each
test run to fail.  I think I'll redo the sectional load and try
to capture a maybe-valid database for which the next load causes
the corruption.
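
One way to capture such a database is to copy Data.fs aside before each sectional load, so that the last copy taken before the failure can be reloaded and the failing section replayed against it. A minimal sketch in modern Python, assuming nothing is writing to the file while it is copied (e.g. Zope is stopped or idle); `snapshot`, its parameters, and the numbered-suffix naming are hypothetical.

```python
import os
import shutil


def snapshot(data_fs, section, dest_dir):
    """Copy data_fs into dest_dir as <name>.<section> and return the new path."""
    name = "%s.%04d" % (os.path.basename(data_fs), section)
    target = os.path.join(dest_dir, name)
    # copy2 also preserves timestamps, which helps when comparing runs later.
    shutil.copy2(data_fs, target)
    return target
```

Calling `snapshot(path_to_data_fs, n, backup_dir)` before loading section n leaves a series of files such as Data.fs.0001, Data.fs.0002, and so on; at the sizes mentioned above, the disk space needed grows quickly, so older copies may need pruning.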

The source data is only 8MB, and the Data.fs before the load is
about 5MB, if anyone feels like trying to reproduce it (you just
have to promise to throw the data away afterwards <grin>).  I'm also
happy to give anyone who wants to work on this a temporary login
on the dedicated box on which I'm trying to do this load.

Hmm, I guess I might as well attach the external method here in
case I'm doing something stupid.  Or if someone can suggest a
better way to do the load that will allow me to sidestep whatever
bug I'm tickling I'd also be grateful, though I'd like to 
shoot this bug in any case.

If I'd known I was going to run into this much trouble I'd have
used Postgres to store the data (I may have to do that yet!), but
there are good (future) reasons for having the data in the ZODB...

--RDM

def importDL(self,REQUEST):
  """Load DLData into nested folders, 50 DirectoryListingClass items per folder."""
  from string import split
  inf = open('/usr/local/zope/directory/Extensions/DLData')
  data = inf.readline()
  inner = 0   # items created so far
  mid = 0     # subfolders created so far
  outer = 0   # top-level folders created so far
  mid_s = "m%s" % mid
  outer_s = "%s" % outer
  while data:
    # Full commit after every 1000 additions (and once before the first).
    if inner%1000==0: get_transaction().commit()
    # One tab-separated record per line; the 13th field, 'null', is
    # presumably an empty trailing column.
    (SIC_code, SIC_desc, business_name, city, fax_phone,
      in_regions, keywords, primary_phone, state, street_address1,
      street_address2, zip, null) = split(data[:-1],'\t')
    # Start a new subfolder every 50 items, and a new top-level folder
    # every 50 subfolders.
    if inner%50==0:
      if mid%50==0:
        outer = outer + 1
        outer_s = "%s" % outer
        self.manage_addFolder(id=outer_s,title=outer_s)
      mid = mid+1
      mid_s = "m%s" % mid
      self[outer_s].manage_addFolder(id=mid_s,title=mid_s)
    inner = inner+1
    inner_s = "i%s" % inner
    newDL = self[outer_s][mid_s].manage_addProduct['ECardProduct'].\
      DirectoryListingClass.createInObjectManager(inner_s,REQUEST)
    # SIC_desc, keywords and street_address2 are read but not stored.
    newDL.propertysheets.info.manage_editProperties({
      'SIC_code': SIC_code,
      'business_name': business_name,
      'city': city,
      'fax_phone': fax_phone,
      'in_regions': split(in_regions),
      'primary_phone': primary_phone,
      'state': state,
      'street_address': street_address1,
      'zip': zip,
      'useSIC_code': 1
      })
    newDL.reindex_object()
    data = inf.readline()
  inf.close()
  return "OK"
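
The record parsing in the method above can also be pulled out and checked on its own, away from Zope, to rule out a malformed input line as a factor. This is a sketch in modern Python rather than the Python that runs the external method; `parse_record` and `FIELDS` are hypothetical names, and the trailing empty 13th column is inferred from the `null` variable in the tuple unpacking.

```python
# Column order taken from the tuple unpacking in importDL.
FIELDS = ("SIC_code", "SIC_desc", "business_name", "city", "fax_phone",
          "in_regions", "keywords", "primary_phone", "state",
          "street_address1", "street_address2", "zip")


def parse_record(line, lineno):
    """Split one tab-separated line into a dict, checking the column count."""
    parts = line.rstrip("\n").split("\t")
    expected = len(FIELDS) + 1   # 12 named columns plus the trailing one
    if len(parts) != expected:
        raise ValueError("line %d: expected %d fields, got %d"
                         % (lineno, expected, len(parts)))
    record = dict(zip(FIELDS, parts))   # zip() drops the unnamed 13th column
    # in_regions holds a whitespace-separated list, as in importDL.
    record["in_regions"] = record["in_regions"].split()
    return record
```

Running every line of the source file through `parse_record` before the load would flag a short or long line by number instead of failing partway through a two-hour run.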