Why can't I add lots of objects at once?
A month or two ago I asked here for advice on putting roster data (~233K entries/semester) in our database. One of the suggestions was to use XML-RPC in order to avoid the timeouts I was getting.

I've been working on an XML-RPC solution and it is quite adequate for small data sets, but if I try to add a few hundred student/class entries, it hangs.

After a few iterations, I've come up with this script on the Zope side. Essentially, it creates a structure like

AAE
  203
    01
      01
        students
          kyler.b.laird.1
          sally.r.smith.5

===================================================
if (not csubject in courses.objectIds()):
    courses.manage_addFolder(csubject, csubject)
if (not cnumber in courses[csubject].objectIds()):
    courses[csubject].manage_addFolder(cnumber, cnumber)
if (not division in courses[csubject][cnumber].objectIds()):
    courses[csubject][cnumber].manage_addFolder(division, division)
if (not section in courses[csubject][cnumber][division].objectIds()):
    courses[csubject][cnumber][division].manage_addFolder(section, section)
if (not 'students' in courses[csubject][cnumber][division][section].objectIds()):
    courses[csubject][cnumber][division][section].manage_addFolder('students', 'students')
current_students = courses[csubject][cnumber][division][section]['students'].objectIds()
for student in students:
    if (not student in current_students):
        courses[csubject][cnumber][division][section]['students'].manage_addFolder(student, student)
===================================================

Terribly naive programming aside, what's wrong with calling this a couple hundred times? It seems to just keep taking more and more of the processor time. Eventually I give up, kill the server and restart it. Then I can run the script again and it will get a little bit further (because it doesn't have to create the existing objects, I assume).

Of course, I'd prefer not to have to break this up into one thousand chunks with server restarts in between each one.

Any suggestions?

Thank you.

--kyler
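The nested checks in the script repeat one pattern: create a child folder if it is missing, then descend into it. A minimal sketch of that pattern as a helper follows, assuming only the three operations the script already uses (objectIds(), manage_addFolder(id, title) and item access). FakeFolder, ensure_folder and add_roster are hypothetical names, and FakeFolder exists only so the sketch runs outside Zope. It is not a fix for the hang, just a shorter way to express the same walk.

```python
class FakeFolder:
    """Hypothetical stand-in for a Zope folder so this runs outside Zope.

    It mimics only the three operations the original script relies on.
    """

    def __init__(self):
        self._children = {}

    def objectIds(self):
        return list(self._children)

    def manage_addFolder(self, id, title=''):
        self._children[id] = FakeFolder()

    def __getitem__(self, id):
        return self._children[id]


def ensure_folder(parent, folder_id):
    """Create folder_id under parent if it is missing, then return it."""
    if folder_id not in parent.objectIds():
        parent.manage_addFolder(folder_id, folder_id)
    return parent[folder_id]


def add_roster(courses, csubject, cnumber, division, section, students):
    """Walk (and create) the folder path once, then add missing students.

    Returns the number of student folders created.
    """
    folder = courses
    for folder_id in (csubject, cnumber, division, section, 'students'):
        folder = ensure_folder(folder, folder_id)

    # Build the membership set once instead of scanning a list per student.
    # Unlike the original, a student repeated in `students` is added only once.
    current = set(folder.objectIds())
    added = 0
    for student in students:
        if student not in current:
            folder.manage_addFolder(student, student)
            current.add(student)
            added += 1
    return added
```

In Zope itself, `courses` would be the real folder object and FakeFolder would not be needed.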
On Wednesday 11 April 2001 10:42, Kyler B. Laird wrote:
I've been working on an XML-RPC solution and it is quite adequate for small data sets, but if I try to add a few hundred student/class entries, it hangs.
After a few iterations, I've come up with this script on the Zope side. Essentially, it creates a structure like AAE 203 01 01 students kyler.b.laird.1 sally.r.smith.5
Terribly naive programming aside, what's wrong with calling this a couple hundred times? It seems to just keep taking more and more of the processor time. Eventually I give up, kill the server and restart it. Then I can run the script again and it will get a little bit further (because it doesn't have to create the existing objects, I assume).
hi i'm a parrot from zope-dev:

in addition to chrisw's comments: if any of these classes are by chance catalog aware, that's probably your answer, since you're doing incremental updates to the catalog for each add call. if they are catalog aware, you're better off using a non-catalog aware class, and doing a batch catalog on them when you're done adding.

die parrot, die.

another option is to try and store them in an intermediate format on the server, and then load them up all at once via an external method.

hth

kapil
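One way to read the second suggestion: write the roster to a flat file on the server, then have an External Method read it and group the rows so that each class section is handled in one pass. The sketch below covers only the read-and-group step. The tab-separated layout (subject, number, division, section, student) and the name group_roster are assumptions, not something from the thread.

```python
import csv
from collections import defaultdict


def group_roster(lines):
    """Group roster rows by (subject, number, division, section).

    `lines` is any iterable of tab-separated strings, for example an open
    file. Rows that do not have exactly five fields are skipped.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter='\t'):
        if len(row) != 5:
            continue  # assumed policy: ignore malformed rows
        subject, number, division, section, student = row
        groups[(subject, number, division, section)].append(student)
    return dict(groups)
```

An External Method could then loop over the returned dictionary and create the folders for one section at a time, without a separate XML-RPC request for every section.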
"Kyler B. Laird" wrote:
A month or two ago I asked here for advice on putting roster data (~233K entries/semester) in our database. One of the suggestions was to use XML-RPC in order to avoid the timeouts I was getting.
I've been working on an XML-RPC solution and it is quite adequate for small data sets, but if I try to add a few hundred student/class entries, it hangs.
<snip>
restart it. Then I can run the script again and it will get a little bit further (because it doesn't have to create the existing objects, I assume).
Of course, I'd prefer not to have to break this up into one thousand chunks with server restarts in between each one.
<stab type="my, it's dark in here ;-)">
This might be write conflicts in the ZODB. Is your XML-RPC process threaded? If so, make it not. If not, then it really is too dark in here for me to stab
</stab>

In any case, I'd recommend using BTree folders instead of normal ones, especially with that number of objects. See if you can persuade Shane to make a BTree folder that uses the new 2.3.1 BTrees, which are also supposed to make this sort of thing better.

Apart from that, I'm outta ideas :-S

cheers,

Chris
I finally figured out how to get all of my course roster objects loaded (one folder per student per class section - ~233K objects).

Instead of trying to use a single XML-RPC session for the entire process *or* using a separate session for each class section, I used a new session for each course subject. This split the data into about 100 chunks. It updated all night without locking up.

I haven't determined exactly what the limits of XML-RPC are, but this seems to get around them. At least it's better than grinding Zope into the ground and restarting it.

Thank you for the help.

--kyler
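A minimal sketch of the per-subject chunking described above, reading "session" as a fresh client proxy object. It uses Python 3's xmlrpc.client (the module was called xmlrpclib at the time of this thread). The remote method name add_roster, its argument order, and the function name upload_by_subject are hypothetical. The `connect` parameter is there only so the logic can be exercised without a server.

```python
import xmlrpc.client
from collections import defaultdict


def upload_by_subject(entries, url, connect=xmlrpc.client.ServerProxy):
    """Send roster entries using one new proxy per course subject.

    `entries` is an iterable of
    (subject, number, division, section, student) tuples.
    Returns the number of subjects, i.e. the number of proxies opened.
    """
    # subject -> (number, division, section) -> list of students
    by_subject = defaultdict(lambda: defaultdict(list))
    for subject, number, division, section, student in entries:
        by_subject[subject][(number, division, section)].append(student)

    for subject in sorted(by_subject):
        server = connect(url)  # new proxy for each subject
        sections = sorted(by_subject[subject].items())
        for (number, division, section), students in sections:
            # add_roster is an assumed server-side method, one call per section
            server.add_roster(subject, number, division, section, students)
    return len(by_subject)
```

With roughly 100 subjects this makes about 100 proxies, matching the chunk count mentioned above, while each request still carries only one section's students.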
participants (3): Chris Withers, ender, Kyler B. Laird