Hi. I asked this question on the zope3 list about a week ago and had no responses, so I am hoping to get some general guidance here.

I am trying to determine the best structure for storing a large schema in which some attributes are lists or dictionaries. There are about 70 attributes in the schema: a base schema of approximately 30 attributes, and others that subclass it, the largest having about 70. I am trying to choose a structure that does not have to hold a lot of empty space.

I originally thought of RDF and built a datastore with rdflib on top of a relational database. I chose this option because ZODB takes plenty of RAM as it gets larger. It is pretty efficient from a storage perspective: if an object does not have a particular attribute, nothing is stored for it, and every item in the store is unique, so I was not duplicating a single piece of data. The problem is the number of accesses needed to gather a complete object. Say you want to present a page of 20 items or run a search: assembling each object means hitting the disk many times, so query times were unacceptable, and loading RDF from outside sources also took a very long time. The data store grows into millions of records, so you also need a pretty powerful RDB server with lots of RAM.

I had dismissed a relational database on its own because the data does not lend itself to a row, and I may want to add to the schema at some point, which could get pretty ugly. But then I saw the vertical example in the examples folder of SQLAlchemy, which creates dynamic fields as necessary and could potentially avoid this kind of hassle.

ZODB provides the flexibility, and Generations could work well for future updates, so this looks very good. But how efficient is it if 15% of the attributes have data and 85% do not?
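For readers unfamiliar with the vertical layout mentioned above, here is a minimal sketch of the idea in plain SQL through Python's standard sqlite3 module. It is not the SQLAlchemy example itself: the table names, column names, and the helpers `save_item` and `load_item` are all made up for illustration, and values are assumed to be strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE item_attr (
        item_id INTEGER NOT NULL REFERENCES item(id),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (item_id, name)
    );
""")

def save_item(kind, attrs):
    """Insert one item, writing a row only for attributes that have data."""
    cur = conn.execute("INSERT INTO item (kind) VALUES (?)", (kind,))
    item_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO item_attr (item_id, name, value) VALUES (?, ?, ?)",
        [(item_id, name, value) for name, value in attrs.items()
         if value not in (None, "")],
    )
    return item_id

def load_item(item_id):
    """Rebuild the attribute dict for one item with a single query."""
    rows = conn.execute(
        "SELECT name, value FROM item_attr WHERE item_id = ?", (item_id,)
    )
    return dict(rows.fetchall())

# Hypothetical usage: the empty "isbn" attribute takes no space at all.
book_id = save_item("book", {"title": "Example", "author": "Someone", "isbn": None})
print(load_item(book_id))
```

Adding an attribute to the schema later needs no ALTER TABLE, only new rows. The cost is that every value lands in one generic column, so typing and per-attribute constraints have to be handled in application code.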
I have also been experimenting with hybrid pickle / RDB storage, in which the attributes that will receive the most attention are stored as fields and the full record is stored as a pickle that is unpickled for views and data entry.

In any case, I thought I'd ask, since I am concerned about both efficiency of storage and speed of access. If RDF access were fast it would be great, but that has not been the case. I thought there might be other ideas on this, or that someone could advise on the efficiency of ZODB when, in some cases, users will be selective about which attributes are important to them.

Regards,
David
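The hybrid pickle / RDB idea described above might look roughly like the following sketch. It uses only the standard library (sqlite3 and pickle); the `record` table, the choice of `title` and `created` as the "hot" columns, and the helper names are assumptions made for illustration, not details from the original post.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE record (
        id      INTEGER PRIMARY KEY,
        title   TEXT,          -- frequently queried attribute, kept as a real column
        created TEXT,          -- likewise
        payload BLOB NOT NULL  -- pickle of the complete attribute dict
    )
""")

def save_record(attrs):
    """Store the hot attributes as columns and the whole dict as one pickle."""
    payload = pickle.dumps(attrs, protocol=pickle.HIGHEST_PROTOCOL)
    cur = conn.execute(
        "INSERT INTO record (title, created, payload) VALUES (?, ?, ?)",
        (attrs.get("title"), attrs.get("created"), payload),
    )
    return cur.lastrowid

def load_record(record_id):
    """Fetch one row and unpickle the full record (trusted data only)."""
    row = conn.execute(
        "SELECT payload FROM record WHERE id = ?", (record_id,)
    ).fetchone()
    return pickle.loads(row[0]) if row else None

def find_ids_by_title(title):
    """Search on a column without touching any pickles."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM record WHERE title = ?", (title,)
    )]
```

A complete object comes back in one row fetch, and only attributes present in the dict occupy space in the pickle. The trade-off is that the columns duplicate data held in the pickle, so both must be written together on every update.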
--On 17. April 2006 14:08:38 -0300 David Pratt <fairwinds@eastlink.ca> wrote:
Hi. I had asked this question on zope3 about a week ago or so and had no responses. I am hoping I can receive some general guidance on this issue. I am trying to determine the best structure for storing a large schema where some attributes are lists or dictionaries. There are about 70 attributes in the schema and I am trying to choose a structure that will not necessarily have to hold a bunch of empty space. I have a base schema of approximately 30 attributes and others that subclass from it with the largest being about 70.
How about using Zope BTree data structures? They are heavily optimized for use within the ZODB.

-aj

--
ZOPYX Ltd. & Co. KG - Charlottenstr. 37/1 - 72070 Tübingen - Germany
Web: www.zopyx.com - Email: info@zopyx.com - Phone +49 - 7071 - 793376
E-Publishing, Python, Zope & Plone development, Consulting
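One possible reading of this suggestion is to keep each record's attributes in an `OOBTree` and store a key only when the attribute has a value. The sketch below assumes that reading. The `SCHEMA` names and the `make_record` helper are hypothetical, and the `dict` fallback exists only so the snippet runs where the BTrees package is not installed.

```python
try:
    # OOBTree comes from the BTrees package that accompanies ZODB.
    from BTrees.OOBTree import OOBTree
except ImportError:
    # Stand-in so the sketch runs without ZODB; a dict is not a BTree.
    OOBTree = dict

# Hypothetical attribute names standing in for a much larger schema.
SCHEMA = ("title", "creator", "subjects", "rights")

def make_record(**values):
    """Build a mapping that holds keys only for attributes with data."""
    record = OOBTree()
    for name, value in values.items():
        if name not in SCHEMA:
            raise KeyError(name)
        if value not in (None, "", [], {}):
            record[name] = value
    return record

record = make_record(title="Example", subjects=["zodb", "storage"], rights=None)
print(sorted(record.keys()))       # only the attributes that were set
print(record.get("creator", ""))   # missing attributes fall back to a default
```

With this layout an empty attribute is simply an absent key, so nothing is stored for it.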
Hi Andreas. Yes, I have been thinking about what type of object would be best. I was also thinking about putting the whole ZODB into an RDB with something like PGStorage, because my feeling is that if I work strictly with ZODB it may bulk up quickly. I guess this is another way of dealing with pickles, which I have been contemplating, but one that does not change the utility of objects in Zope.

For my own knowledge, how exactly are empty attributes dealt with in a BTree, so that if, say, 40 of 70 attributes are empty, this is handled efficiently?

Regards,
David
--On 17. April 2006 14:42:24 -0300 David Pratt <fairwinds@eastlink.ca> wrote:
Hi Andreas. Yes, I have been thinking what type of object would be best. I was also thinking about putting whole zodb into rdb with something like PGStorage because my feeling is if I work with zodb strictly, it may bulk quickly. I guess this is another way of dealing with pickles which I have been contemplating but in a way that does not change the utility of objects in Zope. For my knowledge, how exactly are empty attributes dealt with in BTree so that if say 40 of 70 attributes are empty that it is handling this efficiently?
I don't know much about BTree internals, but all indexes use large and sometimes complex BTree structures. Nested BTrees aren't a problem either. I would say that 40 to 70 attributes is nothing I would lose sleep over.

-aj
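On the question of empty attributes, a small experiment with plain pickle gives a rough feel for the cost. It relies on the assumption that a persistent object's stored state is essentially its instance dictionary, so attributes left at a class-level default are not written at all. The `Item` class and its attribute names are hypothetical, and plain pickle stands in for ZODB here.

```python
import pickle

class Item:
    # Class-level defaults are shared by all instances and are not part
    # of any instance's state. Keep them immutable (tuples, strings).
    title = ""
    creator = ""
    subjects = ()
    rights = ""

sparse = Item()
sparse.title = "Example"          # only one attribute actually set

full = Item()
full.title = "Example"
full.creator = "Someone"
full.subjects = ("zodb", "storage")
full.rights = "public domain"

# vars(obj) is the instance dictionary, i.e. the state that gets pickled.
print(vars(sparse))
print(len(pickle.dumps(vars(sparse))), len(pickle.dumps(vars(full))))
print(sparse.rights == "")        # unset attributes still read as the default
```

Under that assumption, a record with 15% of its attributes filled pays only for those attributes, while reads of the other 85% resolve to the class defaults.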
Participants (2): Andreas Jung, David Pratt