OK, so I'm trying to reload this 60K record database that I screwed up the load on before (but got it working anyway). I think I'm doing it right this time <wry grin>. But the load craps out with a database corruption error. (I'm doing the reload, by the way, because trying to compress the original database results in a broken database; ie: I could see a certain folder in the management interface, but trying to access it resulted in a "resource not found" error message for its URL.)

This time I'm doing the work under FreeBSD 3.4-stable and Zope 2.1.4 (2.1.4 rather than 2.1.6 because of the reindex_object bug in 2.1.6). I don't see any checkins for ZODB in the CVS recent enough to be an issue.

I have an external method that opens the source data file and builds the ZClass objects to hold the data. The method puts fifty items in each folder, generating folders and subfolders as necessary to achieve that. Every 1000 object additions I do a get_transaction().commit(). (I tried commit(1), but that didn't seem to write anything to the disk and memory just kept growing, so I switched to a full transaction commit, which seems to work.)

At a Data.fs size of around 200MB, I get the following error returned by Client.py (which I use to call the external method):

    bci.ServerError: bobo exception
    (File: /usr/local/zope/directory/lib/python/ZODB/FileStorage.py Line: 617)

Line 617 says:

    if doid != oid: raise CorruptedDataError, h

So what it is reading back from the DB is not agreeing with what it expected. (There's a rough sketch of what that check does after the attached method below.)

I also tried loading the objects in groups of 2500, compressing the DB after each sectional load. At around 75MB the compression failed with an error message that I'm afraid I failed to write down, but it had to do with a mismatch between the size of a transaction and the size of a contained or container data structure. In terms of the number of objects created, that was sooner than in the full load case.

I've tried the big load twice, and it failed both times at around 200MB, but at different specific file sizes. I'm not sure if that means it was in a different transaction group or not, but I'm guessing so.

Debugging suggestions are gratefully accepted. I've started perusing FileStorage.py, but it is somewhat heavy mojo and I haven't figured out where to start debugging yet. I also wish the thing failed sooner in the load; it's a pain to have to wait two hours for each test run to fail... I think I'll redo the sectional load and try to capture a maybe-valid database such that the next load causes the corruption (there's a sketch of that driver after the attached method, too).

The source data is only 8MB, and the Data.fs before the load is about 5MB, if anyone feels like trying to reproduce it (you just have to promise to throw the data away afterwards <grin>). I'm also happy to give anyone who wants to work on this a temporary login on the dedicated box on which I'm trying to do this load.

Hmm, I guess I might as well attach the external method here in case I'm doing something stupid. Or, if someone can suggest a better way to do the load that will allow me to sidestep whatever bug I'm tickling, I'd also be grateful, though I'd like to shoot this bug in any case. If I'd known I was going to run into this much trouble I'd have used Postgres to store the data (I may have to do that yet!), but there are good (future) reasons for having the data in the ZODB...
--RDM

    def importDL(self, REQUEST):
        """Load the tab-delimited DLData file into DirectoryListingClass
        instances, fifty per folder."""
        from string import split
        inf = open('/usr/local/zope/directory/Extensions/DLData')
        data = inf.readline()
        inner = 0
        mid = 0
        outer = 0
        mid_s = "m%s" % mid
        outer_s = "%s" % outer
        while data:
            # Commit a full transaction every 1000 objects to bound memory use.
            if inner % 1000 == 0:
                get_transaction().commit()
            (SIC_code, SIC_desc, business_name, city, fax_phone, in_regions,
             keywords, primary_phone, state, street_address1, street_address2,
             zip, null) = split(data[:-1], '\t')
            # Start a new mid-level folder every 50 items, and a new outer
            # folder every 50 mid-level folders.
            if inner % 50 == 0:
                if mid % 50 == 0:
                    outer = outer + 1
                    outer_s = "%s" % outer
                    self.manage_addFolder(id=outer_s, title=outer_s)
                mid = mid + 1
                mid_s = "m%s" % mid
                self[outer_s].manage_addFolder(id=mid_s, title=mid_s)
            inner = inner + 1
            inner_s = "i%s" % inner
            newDL = self[outer_s][mid_s].manage_addProduct['ECardProduct'].\
                    DirectoryListingClass.createInObjectManager(inner_s, REQUEST)
            newDL.propertysheets.info.manage_editProperties({
                'SIC_code': SIC_code,
                'business_name': business_name,
                'city': city,
                'fax_phone': fax_phone,
                'in_regions': split(in_regions),
                'primary_phone': primary_phone,
                'state': state,
                'street_address': street_address1,
                'zip': zip,
                'useSIC_code': 1})
            newDL.reindex_object()
            data = inf.readline()
        return "OK"
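For anyone who hasn't gone spelunking in FileStorage.py yet: as far as I can tell, when an object is loaded the storage seeks to the file position its in-memory index holds for that oid, reads the data record header at that position, and expects the first eight bytes of that header to be the same oid; line 617 is where it notices they disagree. Here's a rough sketch of doing that check by hand against a Data.fs. The header layout (oid, serial, previous-record position, transaction position, version length, data length) is my reading of the record format, so treat the struct string as an assumption, and check_record/expected_oid are just names I made up for illustration:

    import struct

    # Assumed FileStorage data record header layout: 8-byte oid, 8-byte
    # serial, 8-byte previous-record position, 8-byte transaction position,
    # 2-byte version-string length, 8-byte data length (42 bytes total).
    DATA_HDR = ">8s8s8s8sH8s"
    DATA_HDR_LEN = 42

    def check_record(filename, pos, expected_oid):
        """Read the record header at 'pos' and compare its oid field
        with the oid the index claims lives there."""
        f = open(filename, 'rb')
        f.seek(pos)
        h = f.read(DATA_HDR_LEN)
        f.close()
        doid, serial, prev, tloc, vlen, plen = struct.unpack(DATA_HDR, h)
        if doid != expected_oid:
            print "index points at oid %s but record header says %s" % (
                repr(expected_oid), repr(doid))
        return doid

If that comparison fails, either the index entry is wrong or the bytes on disk are; that doesn't narrow things down much by itself, but it at least tells you which record to stare at with a hex editor.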
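And since waiting two hours per failure is getting old, here's the sort of driver I have in mind for the sectional reload: call the load method for one 2500-record slice at a time and copy Data.fs aside after each slice that succeeds, so when a slice finally triggers the corruption I have the last-good database plus the exact batch that breaks it. Everything here is a sketch: the URL, the start/count arguments, and the idea that importDL can be told to load just a slice are all assumptions (the attached method above loads the whole file in one go), and in practice I'd go through Client.py with auth rather than a bare urllib call:

    import shutil, urllib
    from string import find

    DATAFS = '/usr/local/zope/directory/var/Data.fs'
    METHOD_URL = 'http://localhost:8080/importDL'   # hypothetical URL for the method
    BATCH = 2500
    BATCHES = 24                                    # ~60K records / 2500 per slice

    for n in range(BATCHES):
        # Hypothetical: assumes importDL grows start/count arguments so it
        # can load one slice per call.
        url = '%s?start=%d&count=%d' % (METHOD_URL, n * BATCH, BATCH)
        result = urllib.urlopen(url).read()
        if find(result, 'OK') < 0:
            print 'slice %d failed; last good snapshot is Data.fs.%d' % (n, n - 1)
            break
        # Keep the last known-good database so the failing slice can be
        # replayed against it without another two-hour run.
        shutil.copyfile(DATAFS, '%s.%d' % (DATAFS, n))
        print 'slice %d loaded and snapshotted' % n

(Copying Data.fs out from under a running Zope is only safe-ish because nothing else is writing between slices; stopping Zope for the copy would be cleaner.)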
The cause of the database corruption that I was observing has, with a fair degree of certainty, been identified, and I can sum it up in one word: hardware.

The server on which I was doing the load apparently has a really subtle bug where it will randomly and only occasionally write a single bad byte to disk. So a large database load had a high probability of getting hit. I began to suspect this when I discovered that various system binaries (the date command, lynx) were failing with odd error messages (bad syscall, for example). Yet the system mostly ran just fine...

I redid the database load on new hardware, and things have been working perfectly.

(Can you imagine what would have happened had this hardware been used for its usual market: Windows or NT? I could realize this was hardware because system binaries in FreeBSD just don't fail like that. But if it were Windows, one would doubtless just reinstall, and the problem would go away, only to crop up in some *other* binary later...)

I'm very relieved to be able to report that I have *not* found a data corruption problem with the ZODB!!

--RDM
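P.S. For anyone wondering how you convince yourself it really is the disk path and not the software: the sort of quick smoke test I wish I'd run earlier is to repeatedly write a largish file of known bytes, read it back, and compare checksums. A sketch, with the path, chunk size, and pass count picked arbitrarily (and note that if the file fits in the buffer cache you may not be exercising the disk at all, so bigger is better):

    import md5

    PATH = '/usr/tmp/hwcheck.dat'         # arbitrary scratch file
    CHUNK = '0123456789abcdef' * 4096     # 64KB of known data
    CHUNKS = 3200                         # ~200MB per pass, about where my loads died

    def one_pass():
        # Write ~200MB of known data, checksumming as we go.
        wsum = md5.new()
        f = open(PATH, 'wb')
        for i in range(CHUNKS):
            f.write(CHUNK)
            wsum.update(CHUNK)
        f.close()
        # Read it back and checksum again; any flipped byte shows up here.
        rsum = md5.new()
        f = open(PATH, 'rb')
        while 1:
            block = f.read(65536)
            if not block:
                break
            rsum.update(block)
        f.close()
        return wsum.digest() == rsum.digest()

    for n in range(10):
        if not one_pass():
            print 'mismatch on pass %d: something below Python altered a byte' % n
            break
    else:
        print 'no mismatch in 10 passes (which proves nothing, of course)'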