[ZODB-Dev] LargeFS

Tamas Hegedus hegedus at med.unc.edu
Sat Feb 11 09:56:57 EST 2006


Hi,

I beg for your patience :-)
I have run into a problem that I cannot solve. It may be something very 
trivial, but ZODB seems so simple that I cannot figure out what the 
problem could be.

I have two text files:
1. 800M, approx 200,000 records
2. 6G, approx 2,000,000 records

I parse each record into an object (SProt.Record(Persistent)) and store 
it in ZODB. To do this I have a script that I run twice, once per file 
(see the code at the bottom).

RUN#1: everything is OK, populated in approx 20 minutes, Data.fs size is 
approx 800M; interestingly the *fs.tmp remains 800M after closing the 
connection.

RUN#2:
- I comment out the OOBTree() line and change the name of the file to parse;
- Data.fs.tmp becomes 0 (zero) bytes;
- parsing runs; the processor is used extensively; approx 500M RAM is 
used by the script; no significant increase in swap usage;
- the sizes of Data.fs and Data.fs.index do not change; Data.fs.tmp stays 
at zero;
- data 'population' stops after approx the 1,300,000th object with the 
error message below ('No space left on device'), although I have not run 
out of space on any of my hard disk partitions...

???
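
The 'No space left on device' error refers to whichever filesystem the 
failing write targeted, which need not be the one holding Data.fs. A 
minimal sketch for checking free space on more than one location follows. 
That the savepoint data goes to Python's default temporary directory is 
an assumption here, not something confirmed above, and free_megabytes is 
a hypothetical helper name:

```python
import os
import tempfile


def free_megabytes(path):
    """Return the space (in MB) available to a non-root user on the
    filesystem that holds `path`. Uses os.statvfs, so Unix only."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize / (1024.0 * 1024.0)


# tempfile.gettempdir() is where Python creates temporary files by default.
# "." is a stand-in for the directory holding Data.fs (adjust as needed).
for label, path in (("temp dir", tempfile.gettempdir()),
                    ("database dir", ".")):
    print("%-12s %-20s %10.0f MB free" % (label, path, free_megabytes(path)))
```

If the temporary directory sits on a small partition, it could fill up 
long before the partition holding Data.fs does.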

I think these are the 2 major observations:
1. I have to figure out why I cannot write into the ZODB in the 
second run.
2. I tried to use commit() instead of savepoint() after every 50,000 
objects and then ran some queries: the db populated with a lot of 
commits was noticeably slower, although I did not make any serious 
measurements... Let's say 2s instead of 0.5s.
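
To compare the two approaches side by side, the loop could be written so 
that the checkpoint step is a parameter. This is only a sketch: populate 
and its arguments are hypothetical names, and the demonstration uses a 
plain dict and a counter instead of a ZODB BTree and a real transaction:

```python
def populate(records, store, checkpoint, batch_size=50000):
    """Store (accession, record) pairs in `store`, calling `checkpoint()`
    after every `batch_size` records. Returns the number stored."""
    count = 0
    for acc, rec in records:
        store[acc] = rec
        count += 1
        if count % batch_size == 0:
            checkpoint()
    return count


# Stand-in demonstration: 10 fake records, checkpoint after every 4.
calls = []
demo_store = {}
fake_records = (("ACC%d" % i, i) for i in range(10))
n = populate(fake_records, demo_store, lambda: calls.append(1), batch_size=4)
print("%d records stored, %d checkpoints" % (n, len(calls)))
```

With ZODB, checkpoint would be either transaction.commit or 
lambda: transaction.savepoint(True), followed by a final 
transaction.commit() once the loop is done.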

Thanks for your time,
Tamas

-----------------------------------------------------------------------
ERROR MSG:

Traceback (most recent call last):
   File "/home/hegedus/mypy/obiodb/populateUP.py", line 40, in ?
     transaction.savepoint(True)
   File "build/lib.linux-i686-2.4/transaction/_manager.py", line 110, in savepoint
   File "build/lib.linux-i686-2.4/transaction/_transaction.py", line 297, in savepoint
   File "build/lib.linux-i686-2.4/transaction/_transaction.py", line 294, in savepoint
   File "build/lib.linux-i686-2.4/transaction/_transaction.py", line 674, in __init__
   File "build/lib.linux-i686-2.4/ZODB/Connection.py", line 1060, in savepoint
   File "build/lib.linux-i686-2.4/ZODB/Connection.py", line 526, in _commit
   File "build/lib.linux-i686-2.4/ZODB/Connection.py", line 554, in _store_objects
   File "build/lib.linux-i686-2.4/ZODB/Connection.py", line 1188, in store
IOError: [Errno 28] No space left on device

------------------------------------------------------------------------
MY LOGs
for file #1:
Fri Feb 10 22:09:12 2006
50000 Fri Feb 10 22:12:36 2006
100000 Fri Feb 10 22:16:08 2006
150000 Fri Feb 10 22:19:36 2006
200000 Fri Feb 10 22:22:56 2006
Fri Feb 10 22:28:16 2006

for file #2:
Fri Feb 10 22:40:24 2006
50000   Fri Feb 10 22:42:18 2006
100000  Fri Feb 10 22:44:21 2006
150000  Fri Feb 10 22:46:23 2006
200000  Fri Feb 10 22:48:30 2006
250000  Fri Feb 10 22:50:50 2006
300000  Fri Feb 10 22:52:59 2006
350000  Fri Feb 10 22:55:18 2006
400000  Fri Feb 10 22:57:15 2006
450000  Fri Feb 10 22:59:20 2006
500000  Fri Feb 10 23:01:32 2006
550000  Fri Feb 10 23:03:54 2006
600000  Fri Feb 10 23:06:40 2006
650000  Fri Feb 10 23:08:46 2006
700000  Fri Feb 10 23:10:51 2006
750000  Fri Feb 10 23:12:55 2006
800000  Fri Feb 10 23:15:00 2006
850000  Fri Feb 10 23:17:16 2006
900000  Fri Feb 10 23:19:12 2006
950000  Fri Feb 10 23:21:04 2006
1000000 Fri Feb 10 23:23:06 2006
1050000 Fri Feb 10 23:24:58 2006
1100000 Fri Feb 10 23:27:09 2006
1150000 Fri Feb 10 23:29:11 2006
1200000 Fri Feb 10 23:31:36 2006
1250000 Fri Feb 10 23:34:00 2006
1300000 Fri Feb 10 23:36:12 2006
1350000 Fri Feb 10 23:38:28 2006
1400000 Fri Feb 10 23:40:39 2006
1450000 Fri Feb 10 23:42:49 2006
1500000 Fri Feb 10 23:44:57 2006
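
The throughput implied by these timestamps can be worked out from the 
start time and the last counted entry of each run. A small sketch (the 
helper name records_per_second is made up for illustration; the 
timestamps are copied from the logs above):

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"  # the layout produced by time.asctime()


def records_per_second(start_stamp, end_stamp, n_records):
    """Average number of records stored per second between two timestamps."""
    start = datetime.strptime(start_stamp, FMT)
    end = datetime.strptime(end_stamp, FMT)
    return n_records / (end - start).total_seconds()


rate_file1 = records_per_second("Fri Feb 10 22:09:12 2006",
                                "Fri Feb 10 22:22:56 2006", 200000)
rate_file2 = records_per_second("Fri Feb 10 22:40:24 2006",
                                "Fri Feb 10 23:44:57 2006", 1500000)
print("file #1: %.0f records/s" % rate_file1)
print("file #2: %.0f records/s" % rate_file2)
```

This gives roughly 240 records per second for file #1 and roughly 390 
for file #2.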
------------------------------------------------------------------------

SCRIPT:
#--- imports - skipped ----
db               = ZODB.config.databaseFromURL("etc/zodb.conf")
connection       = db.open()
droot            = connection.root()
#droot['uniprot'] = OOBTree()
upDb             = droot['uniprot']

#--------------------------------------------------------------
it = SProt.Iterator( open( '/home/src/uniprot/7.0/uniprot_trembl.dat'),
#it = SProt.Iterator( open( '/home/src/uniprot/7.0/uniprot_sprot.dat'),
                      SProt.RecordParser())

ofile = open( '/home/hegedus/mypy/obiodb/docs/myLogFileN.txt', 'w')
i     = 0

ofile.write( time.asctime() + '\n')

for rec in it:
     acc       = copy.deepcopy( rec.accessions[0])
     upDb[acc] = rec
     i += 1
     if i % 50000 == 0:
         transaction.savepoint(True)
         ofile.write( "%s\t%s\n" % (i , time.asctime()))
         ofile.flush()

transaction.commit()
connection.close()
print time.asctime()

-- 
Tamas Hegedus, PhD          | phone: (1) 919-966 0329
UNC - Biochem & Biophys     | fax:   (1) 919-966 5178
5007A Thurston-Bowles Bldg  | mailto:hegedus at med.unc.edu
Chapel Hill, NC, 27599-7248 | http://biohegedus.org

