best way to populate ZODB with ~11,000 folders?
I'm trying to shove the University into Zope. I've got (~60K?) users handled and now I want to attack classes. We have a text file with roster data and an Oracle database of course info.

My plan is to make a folder for every section of every course, and make objects or methods for every piece of data we have about each course (like "description" and "prerequisites") in these folders. This means I'll have paths like /courses/2001.1/AT/249/03/02 and that would have /courses/2001.1/AT/249/03/02/instructor even though the instructor might be acquired from a parent folder.

So...my first step (late last night/early this morning) was to try to make folders for every section. We have 10,699 sections this semester, and I ran into some limits (like FastCGI timeout, then Netscape timeout, ...) in populating the ZODB. Before I just start kludging my way around this, I want to make sure there's not a better way. I'll be doing this every semester, so I want to do it well.

Today I learned about XML-RPC. We're hard at work trying to get the HTTPS capability welded on to it. (We only use HTTPS for Zope.) It seems that it might be better to use this to populate the ZODB. Although I'd rather keep everything in Zope, I'm thinking I'd write a Python program that parses the text file and communicates with Oracle to get all of the data. As it's iterating over that data, it would shoot off XML-RPC transactions with Zope. There would be something on the order of 17,000 transactions. If I decide to handle rosters through the ZODB, there would be another 230,000 transactions.

Is this a reasonable way to handle this? Am I going to kill the ZODB? Will I swamp it with transaction logs? I recall seeing that the ZODB isn't so great for writing. I'll be doing a lot of writing.

We've used Oracle for everything, but I want so much to move completely to ZODB so that I can use ZEO. I'm willing to put in some extra effort to move down this path.

Thank you.
--kyler
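The client side of this plan (one XML-RPC call per folder) could look roughly like the sketch below. It assumes each section arrives as a slash-separated path such as `2001.1/AT/249/03/02`, and that calling `manage_addFolder(id)` on the parent folder's URL over XML-RPC creates the child folder. The helper names `folders_to_create` and `push_folders` and the base URL are hypothetical. Each intermediate folder (`2001.1`, `2001.1/AT`, ...) is created once, parents before children, so shared prefixes don't cost extra calls. As far as we understand Zope's publisher, each call is a separate request and therefore a separate transaction.

```python
import xmlrpc.client


def folders_to_create(section_paths):
    """Return (parent_path, folder_id) pairs for every folder needed,
    parents before children, each folder listed only once."""
    seen = set()
    ordered = []
    for path in section_paths:
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            if prefix not in seen:
                seen.add(prefix)
                # parent is everything above this level ("" means the base)
                ordered.append(("/".join(parts[:depth - 1]), parts[depth - 1]))
    return ordered


def push_folders(base_url, pairs, proxy_factory=xmlrpc.client.ServerProxy):
    """Issue one XML-RPC call per folder.

    Assumption (hypothetical): the object at base_url/parent_path/ exposes
    manage_addFolder(id) to XML-RPC clients.
    """
    base = base_url.rstrip("/")
    for parent, folder_id in pairs:
        # trailing slash keeps xmlrpc.client from defaulting to /RPC2
        url = f"{base}/{parent}/" if parent else f"{base}/"
        proxy_factory(url).manage_addFolder(folder_id)
    return len(pairs)
```

For example, `push_folders("https://zope.example/courses", folders_to_create(paths))` would walk the whole semester in one pass; `proxy_factory` is a parameter only so the loop can be exercised without a live server.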
> This means I'll have paths like /courses/2001.1/AT/249/03/02 and that would have /courses/2001.1/AT/249/03/02/instructor even though the instructor might be acquired from a parent folder.
Been there, done it as well... ah, this brings back memories.
> Today I learned about XML-RPC. We're hard at work trying to get the HTTPS capability welded on to it. (We only use HTTPS for Zope.) It seems that it might be better to use this to populate the ZODB.
Yes, it is. Either this or do it as an External Method: I wrote one that populated 100 folders, committed the transaction, and so on. But when you're doing this sort of thing, make sure the ZODB really is the right place for the data. In the end I decided it wasn't.
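The batching approach (create 100 folders, commit, repeat) can be sketched as a loop that commits every N creations instead of once at the end. This is a minimal sketch, not Andy's actual External Method: `create` and `commit` are passed in so the logic runs without Zope. Inside a Zope 2 External Method, `create` might wrap the container's `manage_addFolder` and `commit` might be `get_transaction().commit` (later ZODB releases spell it `transaction.commit`); treat those bindings as assumptions.

```python
def populate_in_batches(rows, create, commit, batch_size=100):
    """Call create(row) for each row, committing after every batch_size
    creations and once more for any final partial batch."""
    count = 0
    for row in rows:
        create(row)
        count += 1
        if count % batch_size == 0:
            commit()  # keep each transaction small
    if count % batch_size:
        commit()  # flush the leftover rows
    return count
```

Committing every 100 objects keeps each transaction, and the memory it pins, small; the trade-off is that a crash midway leaves a partially populated tree, which is why some kind of progress record is useful.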
> Is this a reasonable way to handle this? Am I going to kill the ZODB? Will I swamp it with transaction logs? I recall seeing that the ZODB isn't so great for writing. I'll be doing a lot of writing.
If you do it all in one go, yes. Do it a bit at a time (by script) and commit as you go. At the extreme, you may want to restart Zope each time as well. Zope is optimised for reading, but writing a lot once is fine; it's continual writing that is a problem. Keep a separate log of your own to monitor your script.
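A separate log like this can double as a checkpoint, so a script that is stopped (or restarted along with Zope) picks up where it left off. A minimal sketch, assuming one line per completed item in a plain text file that is independent of the ZODB's own storage; the file and function names are hypothetical.

```python
import os


def load_done(log_path):
    """Return the set of item ids already recorded in the progress log."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def run_resumable(items, process, log_path):
    """Call process(item) for every item not yet logged, appending each
    id to the log only after process() returns without raising."""
    done = load_done(log_path)
    processed = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for item in items:
            if item in done:
                continue  # finished on an earlier run
            process(item)
            log.write(item + "\n")
            log.flush()  # visible to `tail -f`, kept if the script is killed
            processed += 1
    return processed
```

If this is combined with batched commits, log items after the commit that makes them durable rather than right after creating them; otherwise a crash can leave logged-but-uncommitted work.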
> We've used Oracle for everything, but I want so much to move completely to ZODB so that I can use ZEO. I'm willing to put in some extra effort to move down this path.
You can't use a relational DB with ZEO? That's a problem.

-- Andy McKay
> [Good Stuff snipped]

Thanks for the encouragement and suggestions.

>> We've used Oracle for everything, but I want so much to move completely to ZODB so that I can use ZEO. I'm willing to put in some extra effort to move down this path.

> You can't use a relational DB with ZEO? That's a problem.
Yes, a relational DB should be usable with ZEO (right?), although I haven't tried that yet. I was just pointing out that I want to go *through* ZEO/ZODB to get to the data instead of writing ZSQL Methods to access it.

Thank you.
--kyler
You're welcome. I went through the same thing and decided that I was doing it the wrong way; hopefully your experience will be better. Storing it in a ZODB has definite, powerful advantages and disadvantages. In my mind, the way forward is ZPatterns, which can provide ZODB-style storage while really storing everything in an RDBMS. I couldn't tell you more than that, though, since I don't understand it myself.

-- Andy McKay

----- Original Message -----
From: "Kyler B. Laird" <laird@ecn.purdue.edu>
To: "Andy McKay" <andym@activestate.com>
Cc: <zope@zope.org>; <cameron@lairds.com>
Sent: Thursday, February 15, 2001 1:04 PM
Subject: Re: [Zope] best way to populate ZODB with ~11,000 folders?
_______________________________________________ Zope maillist - Zope@zope.org http://lists.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope-dev )
On Thu, Feb 15, 2001 at 03:26:01PM -0500, Kyler B. Laird wrote:
> Today I learned about XML-RPC. We're hard at work trying to get the HTTPS capability welded on to it. (We only use HTTPS for Zope.) It seems that it might be better to use this to populate the ZODB.
XML-RPC over HTTPS is already done for you:

    $ python xmlrpc_cli.py
    send: 'POST / HTTP/1.0\015\012'
    send: 'Host: nova\015\012'
    send: 'User-Agent: xmlrpc_ssl.py/0.05p2 - xmlrpclib.py/0.9.8 (by www.pythonware.com)\015\012'
    send: 'Content-Type: text/xml\015\012'
    send: 'Content-Length: 106\015\012'
    send: '\015\012'
    send: "<?xml version='1.0'?>\012<methodCall>\012<methodName>propertyMap</methodName>\012<params>\012</params>\012\012</methodCall>\012"
    reply: 'HTTP/1.0 200 OK\015\012'
    header: Server: Zope/Zope 2.3.0 (source release, python 1.5.2, linux2) ZServerSSL/0.06
    header: Date: Fri, 16 Feb 2001 15:19:04 GMT
    header: Connection: close
    header: Content-Type: text/xml
    header: Content-Length: 322
    [{'id': 'title', 'type': 'string'}]

Here's the (abridged) code:

    from M2Crypto import Rand
    from M2Crypto.xmlrpclib2 import Server, SSL_Transport

    # Seed OpenSSL's random number generator from a saved pool.
    Rand.load_file('../randpool.dat', -1)

    # Server is Zope-2.3.0 on ZServerSSL.
    zs = Server('https://nova:8443/', SSL_Transport())
    print zs.propertyMap()

    # Save the generator state for the next run.
    Rand.save_file('../randpool.dat')

Here's the plumbing: http://www.post1.com/home/ngps/m2

Have fun!

--
Ng Pheng Siong <ngps@post1.com> * http://www.post1.com/home/ngps
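For comparison, current Python can do XML-RPC over HTTPS with the standard library alone, since `xmlrpc.client.ServerProxy` accepts an `ssl` context. This is a minimal sketch, not the M2Crypto setup above; the helper name `make_https_proxy` is hypothetical, and the host in the usage line is simply the one from the transcript. Passing `verbose=True` prints a `send:`/`reply:`/`header:` trace much like the transcript above.

```python
import ssl
import xmlrpc.client


def make_https_proxy(url, verbose=False, cafile=None):
    """Build an XML-RPC proxy that speaks HTTPS using only the stdlib.

    cafile: optional CA bundle, e.g. for a server with a self-signed cert.
    """
    if not url.startswith("https://"):
        raise ValueError("expected an https:// URL")
    # verifies the server certificate and host name by default
    context = ssl.create_default_context(cafile=cafile)
    return xmlrpc.client.ServerProxy(url, verbose=verbose, context=context)
```

Usage would be along the lines of `zs = make_https_proxy("https://nova:8443/", verbose=True)` followed by `print(zs.propertyMap())`. Building the proxy does not open a connection; the TLS handshake happens on the first method call.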
Participants (3): Andy McKay, Kyler B. Laird, Ng Pheng Siong