Hints and strategic info for database migration needed (longish)
After some days of evaluation I decided that Zope is (surprise!-) exactly what I need for my current application. I'm trying to publish some thousand elements of strictly hierarchical data with an advanced search interface (for the user) and a management interface for which Zope's standard interface would be just fine.

Currently I generate my data structure - an attributed tree with cyclic links back from each element to its father - with a Python program from a flat file. I'm a bit uncertain about the best way to migrate this structure into Zope and would appreciate some opinions about it.

Since I would like to recycle Zope's management interface, it seems better to have the data included element-wise in the persistent storage, with a separate product class for each node or leaf type in my structure (the tree itself is very heterogeneous), than to write an interface to my existing data management (not to mention the other benefits BoboPOS provides).

The questions I'm concerned about are:

- The performance of the query functions: At the moment, the whole data set fits well into memory (a few thousand elements / ~10 MB). How big is the overhead in speed and size if I turn each element into a persistent product instance?
- Are Products the right way to go?
- Are there any problems with the inherent cyclic structure of the data in conjunction with the persistent storage?
- Let's say I subclass Zope's folder class for my inner nodes. I think the links to the containing objects are provided anyway due to the acquisition structure?
- And most important: How do I migrate my data into the Z database? It would be cool for me to use an HTTP file upload to completely replace the internal database with the data from the uploaded file (I could retain my existing flat file parser). But how do I access the persistent storage from (let's say) the external method performing the upload?
Of course, I have seen the BoboPOS docs, but what are the concrete instances I can access from an external method? Though I'm generally a little bit overwhelmed by the whole Z documentation and confused by the Bobo/Principia legacy naming, the latter is one of the most unclear parts for a newbie like me: What is the surrounding API in a published module? What functions/modules/globals can I access?

Any help appreciated - Stefan
Stefan Franke wrote:
After some days of evaluation I decided that Zope is (surprise!-) exactly what I need for my current application. I'm trying to publish some thousand elements of strictly hierarchical data with an advanced search interface (for the user) and a management interface for which Zope's standard interface would be just fine.
Currently I generate my data structure - an attributed tree with cyclic links back from each element to its father - with a Python program from a flat file. I'm a bit uncertain about the best way to migrate this structure into Zope and would appreciate some opinions about it.
Since I would like to recycle Zope's management interface, it seems better to have the data included element-wise in the persistent storage, with a separate product class for each node or leaf type in my structure (the tree itself is very heterogeneous), than to write an interface to my existing data management (not to mention the other benefits BoboPOS provides).
The questions I'm concerned about are:

- The performance of the query functions: At the moment, the whole data set fits well into memory (a few thousand elements / ~10 MB). How big is the overhead in speed and size if I turn each element into a persistent product instance?
I don't think that persistence will add significantly to memory usage. Memory usage depends on the "state" of the object. With respect to persistence, a persistent object can be in one of three states:

- Not in memory. The storage required for objects that are out of memory depends on the storage manager used. The current storage manager consumes about 6 bytes per persistent object, whether or not the object is in memory. In the next generation of the database, there will be storage managers that do not impose per-object memory costs.
- In memory and active. In addition to the storage used by the object, there are about 26 bytes of overhead on 32-bit machines. On 64-bit machines, the overhead is probably a little less than twice this.
- In memory but not active. Inactive objects have the same persistence overhead as active objects, but they usually consume much less memory because their state (e.g. instance dictionary items) is not in memory.

An important point to keep in mind is that most of the time, only a small percentage of your database is in memory. Depending on your access patterns, the memory consumed by the persistent objects should be much less than that required to load the entire non-persistent network into memory.
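To put these figures in perspective, here is a rough back-of-the-envelope estimate. It assumes 5,000 objects (the question only says "a few thousand") and takes the 6-byte and 26-byte per-object figures above at face value for a 32-bit machine; it is an illustrative calculation, not a measurement.

```python
# Back-of-the-envelope persistence overhead, using the per-object
# figures quoted above for a 32-bit machine.
N_OBJECTS = 5000              # assumption: "a few thousand elements"
DATA_BYTES = 10 * 1024 ** 2   # assumption: "~10 MB" of application data

INDEX_BYTES_PER_OBJECT = 6    # storage manager cost, paid whether or not the object is loaded
IN_MEMORY_BYTES_PER_OBJECT = 26  # overhead for an in-memory object, active or inactive


def worst_case_overhead(n_objects: int) -> int:
    """Overhead in bytes if every object is in memory at the same time."""
    return n_objects * (INDEX_BYTES_PER_OBJECT + IN_MEMORY_BYTES_PER_OBJECT)


overhead = worst_case_overhead(N_OBJECTS)
print(f"{overhead} bytes, {100 * overhead / DATA_BYTES:.2f}% of the data")
```

Even in the worst case, where every object is loaded at once, this comes to 160,000 bytes, roughly 1.5% of a 10 MB data set; with only part of the tree in memory it is smaller still.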
- Are Products the right way to go?
Yes.
- Are there any problems with the inherent cyclic structure of the data in conjunction with the persistent storage?
Actually, the persistence machinery buys you a lot here. The cache manager automagically breaks circular references when it deactivates objects.
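The plain-Python toy below is a minimal sketch of why deactivation breaks cycles: an inactive object no longer holds its state in memory, so any references stored in that state (such as a back link to a parent) disappear with it. `ToyGhost` and `toy_deactivate` are hypothetical names, and this is not how BoboPOS is implemented; the real system can also reload the state from storage on the next access, which this toy cannot.

```python
class ToyGhost:
    """Toy stand-in for a persistent object (not the BoboPOS implementation)."""

    def toy_deactivate(self):
        # Forget the instance state, the way a cache manager turns an
        # active object into an inactive one. Unlike the real system,
        # this toy has no storage to reload the state from.
        self.__dict__.clear()


class TreeNode(ToyGhost):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # back link: child -> parent
        self.children = []            # forward links: parent -> children
        if parent is not None:
            parent.children.append(self)


root = TreeNode("root")
leaf = TreeNode("leaf", parent=root)
print(leaf.parent is root and root.children[0] is leaf)  # True: a reference cycle

leaf.toy_deactivate()
print(hasattr(leaf, "parent"))  # False: the back link is gone, and so is the cycle
```

After deactivation only the one-way reference from `root` to `leaf` remains, so ordinary reference counting can reclaim the objects.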
- Let's say I subclass Zope's folder class for my inner nodes. I think the links to the containing objects are provided anyway due to the acquisition structure?
Right. You don't need to subclass folder to get this. This is a feature of acquisition.
- And most important: How do I migrate my data into the Z database? It would be cool for me to use an HTTP file upload to completely replace the internal database with the data from the uploaded file (I could retain my existing flat file parser).
This should be straightforward.
But how do I access the persistent storage from (let's say) the external method performing the upload?
A major goal of the persistence mechanism used by Zope is transparency. You access persistent storage by simply modifying objects.
Of course, I have seen the BoboPOS docs, but what are the concrete instances I can access from an external method?
If your method has a 'self' argument, it will be passed the folder in which the method was invoked. Your upload method will look something like this:

```python
def myupload(self,       # The folder that will contain the tree
             id,         # The id of the tree in the folder (i.e. attr name)
             title,      # An optional descriptive title
             data_file   # The raw data
             ):
    # Compute the tree object. This is pretty much the same thing
    # you have now, except that the various instances in the tree
    # now mix in BoboPOS.Persistent and follow the few basic rules
    # for dealing with subobjects.
    tree = myOldParserFunction(data_file)
    tree.id = id
    tree.title = title

    # You could just:
    #     setattr(self, id, tree)
    # to poke the tree into the folder and make it persistent,
    # but you probably want to manage the tree through the
    # Zope interface, so instead you'll:
    self._setObject(id, tree)
```

Note that none of the code above has anything to do with persistence. :)
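The "few basic rules for dealing with subobjects" mostly come down to one point: the persistence machinery notices attribute assignment on a persistent object, but not in-place changes to a plain mutable subobject, such as a list or dictionary stored in one of its attributes. The toy class below imitates that behaviour in plain Python so the rule can be seen without Zope. `ToyPersistent`, `_toy_changed` and `toy_commit` are hypothetical names, not the BoboPOS API; in the real system the usual remedies are to reassign the attribute or to flag the object as changed explicitly (ZODB exposes this as `_p_changed`).

```python
class ToyPersistent:
    """Plain-Python imitation of change tracking (hypothetical, not BoboPOS)."""

    def __setattr__(self, name, value):
        # Any attribute assignment marks the object as needing to be saved.
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_toy_changed", True)

    def toy_commit(self):
        # Pretend the state was written to storage.
        object.__setattr__(self, "_toy_changed", False)


class Node(ToyPersistent):
    def __init__(self):
        self.keywords = []            # a plain, non-persistent list


node = Node()
node.toy_commit()                     # start from a "saved" state

node.keywords.append("tree")          # in-place mutation of the subobject...
print(node._toy_changed)              # False: ...goes unnoticed and would be lost

node.keywords = node.keywords         # reassigning the attribute...
print(node._toy_changed)              # True: ...is noticed
```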
Though I'm generally a little bit overwhelmed by the whole Z documentation and confused by the Bobo/Principia legacy naming, the latter is one of the most unclear parts for a newbie like me: What is the surrounding API in a published module? What functions/modules/globals can I access?
We're working on improving the documentation. A lot of it is already there. There's a lot of good development documentation at: http://www.zope.org/Documentation/Reference

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
At 07:50 PM 1/8/99 +0100, Stefan Franke wrote:
After some days of evaluation I decided that Zope is (surprise!-) exactly what I need for my current application. I'm trying to publish some thousand elements of strictly hierarchical data with an advanced search interface (for the user) and a management interface for which Zope's standard interface would be just fine.
All right!
Currently I generate my data structure - an attributed tree with cyclic links back from each element to its father - with a Python program from a flat file. I'm a bit uncertain about the best way to migrate this structure into Zope and would appreciate some opinions about it.
There are a lot of ways you could choose to go with this project.
Since I would like to recycle Zope's management interface, it seems better to have the data included element-wise in the persistent storage, with a separate product class for each node or leaf type in my structure (the tree itself is very heterogeneous), than to write an interface to my existing data management (not to mention the other benefits BoboPOS provides).
So you plan on managing the elements of your data structure through the web with a Zope management interface?
The questions I'm concerned about are:

- The performance of the query functions: At the moment, the whole data set fits well into memory (a few thousand elements / ~10 MB). How big is the overhead in speed and size if I turn each element into a persistent product instance?
I'm not sure. It probably depends on your current representation. I would guess that things won't be much bigger as Products than as whatever Python classes you are using now. If you are doing a lot of searching, you might consider creating indexes or using other methods of improving searching. But it's hard for me to give too much advice without knowing many specifics.
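One simple kind of index, sketched here in plain Python, maps each (attribute, value) pair to the ids of the elements that carry it, so a search becomes a few dictionary lookups instead of a walk over the whole tree. This is a generic inverted index, not a particular Zope facility; the `Element` class and its fields are hypothetical, and attribute values are assumed to be hashable. The index would need to be rebuilt (or updated) whenever the tree changes, e.g. after an upload.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical tree element: an id, attributes, and child elements."""
    id: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(element):
    """Yield an element and all of its descendants, depth first."""
    yield element
    for child in element.children:
        yield from walk(child)


def build_index(root):
    """Map (attribute, value) -> set of element ids."""
    index = defaultdict(set)
    for element in walk(root):
        for name, value in element.attrs.items():
            index[(name, value)].add(element.id)
    return index


def search(index, **criteria):
    """Return the ids of elements matching all attribute=value criteria."""
    matches = [index.get(item, set()) for item in criteria.items()]
    return set.intersection(*matches) if matches else set()


root = Element("root", {"kind": "section"}, [
    Element("a", {"kind": "leaf", "lang": "de"}),
    Element("b", {"kind": "leaf", "lang": "en"}),
])
index = build_index(root)
print(sorted(search(index, kind="leaf", lang="en")))  # ['b']
```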
- Are Products the right way to go?
It's hard to say without knowing more about your project. In general a Product is a good thing to use when your project involves discrete objects that you would like to manage through the web. But just because you use a Product doesn't mean all your data needs to consist of Products as well. For example, Confera is a Product, but each message in a Confera is simply a Python instance, not a Product. So depending on the specifics of your project, it might be appropriate to write a Product to handle the entire data structure.
- Are there any problems with the inherent cyclic structure of the data in conjunction with the persistent storage?
No more so than in normal Python. But if you're using Zope, you really should check out acquisition which gives a great solution to the problem of linking children to their parents without creating circular references.
- Let's say I subclass Zope's folder class for my inner nodes. I think the links to the containing objects are provided anyway due to the acquisition structure?
Yes. You can get to your parent via the 'aq_parent' attribute.
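Zope's real acquisition machinery lives in the Acquisition extension module, not in plain Python. The toy below is only a sketch of the idea behind `aq_parent` and implicit acquisition: a child does not store a reference to its parent; instead, it is wrapped together with the container it was reached through, and attribute lookups that fail on the child fall back to that container. `ToyWrapper`, `ToyFolder` and `traverse` are hypothetical names, and the real API has many more details.

```python
class ToyWrapper:
    """Pairs an object with the container it was reached through."""

    def __init__(self, obj, parent):
        self.aq_self = obj        # the wrapped object itself
        self.aq_parent = parent   # the container it was fetched from

    def __getattr__(self, name):
        # Called only when normal lookup on the wrapper fails: try the
        # object first, then "acquire" the attribute from the parent.
        try:
            return getattr(self.aq_self, name)
        except AttributeError:
            return getattr(self.aq_parent, name)


class ToyFolder:
    def __init__(self, **attrs):
        self.__dict__.update(attrs)
        self._items = {}

    def add(self, id, obj):
        self._items[id] = obj     # stores the bare child: no back link


def traverse(container, id):
    """Fetch a child and wrap it with the (possibly wrapped) container."""
    return ToyWrapper(container._items[id], container)


root = ToyFolder(title="Root folder")
section = ToyFolder(title="Section")
leaf = ToyFolder()                # no title of its own
root.add("section", section)
section.add("leaf", leaf)

w_leaf = traverse(traverse(root, "section"), "leaf")
print(w_leaf.title)                         # Section (acquired from the parent)
print(w_leaf.aq_parent.aq_parent is root)   # True
print(hasattr(leaf, "aq_parent"))           # False: the bare object has no back link
```

Because the parent link lives only in the wrapper, the stored objects themselves form no cycle.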
- And most important: How do I migrate my data into the Z database? It would be cool for me to use an HTTP file upload to completely replace the internal database with the data from the uploaded file (I could retain my existing flat file parser).
That's definitely doable. For example the fsimport.py External Method shows how to build Zope objects while mucking around in the filesystem.
But how do I access the persistent storage from (let's say) the external method performing the upload? Of course, I have seen the BoboPOS docs, but what are the concrete instances I can access from an external method?
Well, if you're going to write a Product, you might just want to include a file upload function in your Product... But in either case your entry into the object hierarchy is always 'self'. You need to work from 'self' to wherever you want to go. In the case of an External Method, 'self' is bound to the Folder in which the method was called. In the case of a Product method, 'self' is bound to the Product instance. There is nothing weird about this--this is normal Python as far as I can tell.
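Working from 'self' for the "completely replace" case might look like the sketch below. It runs outside Zope: `ToyFolder` is a stand-in whose `_setObject` and `_delObject` are named after Zope's object-management methods but are simplified toys (check the real signatures and behaviour in your Zope version), and `parse_flat_file` is a hypothetical parser for an indented flat file that returns plain dictionaries rather than persistent Product instances.

```python
class ToyFolder:
    """Stand-in for a Zope Folder so the sketch runs outside Zope."""

    def __init__(self):
        self._objects = {}

    def _setObject(self, id, obj):
        self._objects[id] = obj
        setattr(self, id, obj)

    def _delObject(self, id):
        del self._objects[id]
        delattr(self, id)


def parse_flat_file(text):
    """Hypothetical parser: one name per line, indented two spaces per level."""
    root = {"name": "<root>", "children": []}
    stack = [root]                      # stack[d] is the last node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // 2 + 1
        node = {"name": line.strip(), "children": []}
        del stack[depth:]               # pop back to this node's parent
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def replace_tree(self, id, data):
    """Replace the object 'id' in the folder 'self' with a freshly parsed tree."""
    tree = parse_flat_file(data)
    if id in self._objects:
        self._delObject(id)             # throw away the old tree completely
    self._setObject(id, tree)
    return tree


folder = ToyFolder()
replace_tree(folder, "data", "a\n  b\nc\n")
replace_tree(folder, "data", "x\n")     # a second upload replaces the first
print([child["name"] for child in folder.data["children"]])  # ['x']
```

Deleting the old object before adding the new one keeps the id free, so the second upload simply takes its place.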
Though I'm generally a little bit overwhelmed by the whole Z documentation and confused by the Bobo/Principia legacy naming, the latter is one of the most unclear parts for a newbie like me: What is the surrounding API in a published module? What functions/modules/globals can I access?
Well, it can be rather daunting. I suggest you consult the Object Reference for a list of methods you can call on Zope objects. True, it doesn't contain everything you can do, but it's a good start. As for what modules and packages are available to you--they are all in 'lib/python'. As for what methods are available to your object--they are the ones you define in your Product class along with the ones that you acquire from your parent objects (assuming your Product inherits from Acquisition.Implicit).

I agree that we need more documentation along these lines. For example, once you find out about MessageDialog you might wonder what other cool things are out there.

Also, as you start out, I would suggest not trying to understand all the services Zope provides. Start by getting a feel for the basic tropes, like properties, acquisition, and some of the basics of the Product API as documented in the Product Tutorial. Build some smaller projects and try to get comfortable before you try to tackle a major project.

Finally, I highly recommend using the source; that's how I've learned most of what I know about Zope. Many existing Zope components are structured as Products, so you can learn quite a bit by studying them.

Good luck!

-Amos
participants (3): Amos Latteier, Jim Fulton, Stefan Franke