Efficient and flexible object references
[Originally posted on the main Zope list. Moving it here on recommendation for a deeper discussion.]

Hello.

I am building a Zope application (described more fully at the end of the message, omitted here for brevity) that needs to make a large number of object references from other objects.

* These references need to be as efficient as possible in terms of size and speed;
* moreover, some caching of referenced objects' information, much like Catalog metadata, will be necessary.

Hopefully this situation is generic enough that it will be of some interest.

I had some ideas on how to do this, and some folks on the Zope list added significantly to my list of both ideas and concerns. The following is a listing of the approaches discussed so far, and their advantages and problems. The Catalog RID idea is dead in the water but tantalizingly close to what I need, so I have included it.

Any contributions to further this list in any way would be greatly appreciated. I tried to put attributions in as clearly as possible. Thank you very much to all who have contributed so far.

* tree-based Zope URL, or uid

  ADVANTAGES:
    The canonical high-level reference approach, apparently.
    Stable, workable.
  DISADVANTAGES:
    Inefficient in storage space
    No metadata-type information: either wake the object from the ZODB and ask it, or store what you want yourself (very difficult to keep fresh; it is rebuilding the ZCatalog metadata approach from another angle so feels like reinventing the wheel)
    [speed issues in retrieving the object? must already be heavily optimized...]

* actual standard object references
  [From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
  An alternative that has not been mentioned so far is storing a real object reference.]

  ADVANTAGES:
    efficient in storage space
  DISADVANTAGES:
    (non-standard Zope approach)
    No metadata; must wake the object to get the information
    From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
    This may well be easier if you can live without managing your relationships as if they were folders, and without using Zope's security mechanisms to control access to the referred-to objects.

    Therefore, referenced objects would either not be accessible and manageable via Zope (losing much of Zope's advantage as an object publisher), or they would be in a scary netherworld--in a Zope management tree but also directly referenced outside of it--in this case a hack that would probably cause significant problems. I think Toby meant the first scenario. My app would have to use the second scenario (the objects should be published), which is why I had not pursued it.

* Catalog (ZCatalog) rids (currently DEAD IN THE WATER but tantalizingly close to what I want)

  ADVANTAGES:
    Efficient use of space
    Can return metadata without waking actual object (a design goal of the ZCatalog, I believe)
    Metadata is updated whenever the catalog is updated (hopefully in approximate real time)--no new mechanism needed to keep metadata fresh (we're using a wheel that has already been invented)
    All of the needed methods except hasuid (i.e. getRID) are part of what I assume is the interface: getobject, getMetadataForRID, getIndexDataForRID, getpath
  DISADVANTAGES:
    From: "Dieter Maurer" <dieter@handshake.de>
    "rid"s are not persistently associated with objects. If someone calls "manage_catalogReindex", then all your rids change.
    [that's the killer]
    Also, a hack, unless it were blessed by ZC at some point, because it relies on inner workings of the Catalog (i.e., the ZCatalog has no method to return an object's rid, and the catalog itself only has hasuid, which as named does not imply reliability as a way of getting RID)

* ZODB oid
  [From: "Casey Duncan" <cduncan@kaivo.com>
  Another option to explore might be to store the oid]

  ADVANTAGES:
    efficient use of space
  DISADVANTAGES:
    From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
    OIDs are only unique within a single storage.
    * If some objects are exported and reimported, their OIDs will change.
    * You will get duplicate OIDs in the same Zope if you are using a mounted storage.
    From: "Dieter Maurer" <dieter@handshake.de>
    * they are a very low level feature, difficult to access from most Zope parts
    * they may not be unique
      Think of "mountable storages". Then each storage will have its own OID's, interpreted in its own local context.

----------

As the list above stands now, I will either use a standard tree-based URL uid and try to build my own optimized caching mechanism (ouch) or go down the treacherous and very dangerous path (for the future) of attempting to make the ZCatalog do what I want no matter what. I suppose my first step there would be to see if I can find a way to allow reindexing of a catalog without changing RIDs... I can feel the frowns from here...

Thanks for your ideas. For those of you still here, a few paragraphs describing my project more fully follow. Only read them if you feel they will help your brainstorming. :-)

Thanks

Gary

----------------------------------------

CASUAL PROJECT DESCRIPTION from an earlier post

While I keep an eye to contributing back to the community by making my solutions as flexible as possible, I'm putting a super-bibliography for musicians, especially vocalists, into Zope. It stores objects describing compositions, books, texts, recordings, publications, people, topics, and other items.

On a simple level, I need the kind of referencing I describe for connecting people objects as creators to other objects; for connecting any object to another (particularly topics) in a "describes" relationship; for connecting same-class objects in a parent-child relationship; and other similar tasks. (Obviously, I'm coming from a bit of an RDBMS background on this but I'm enjoying the better modeling possible with the ZODB, among other things.)

When a composition object displays, for instance, it needs to know both the name and address of all of its creators, ideally without waking up the creators yet. Similarly, a person needs to know back links--what objects claim me as a creator? Rather than caching a page or an object, I have decided it will be best to cache the relationships and metadata somehow.

----------------- More details for the super-interested -------------------

The modelling for compositions is particularly complex, at least to me, since I include instruments needed, if any, and voices needed, if any; the voices themselves have high and low range extremes I am keeping track of, and even multiple options for those. If they are published, each song might be transposed by a given number of half steps (producing a new set of the high and low extremes for the composition). If the composition's parent is published and transposed, that means that will produce yet another set of high and low extremes.

Displaying and searching by range extremes thus becomes quite complex, and a high, high candidate for caching. Even so, expecting my code to keep the cached information fresh when the relationships are so far-flung makes me nervous: I think I'll only be able to cache so far down the chain, and rely on live checks (or at least secondary cached metadata checks) for the rest.

I'm figuring I'm going to need a new pluggable index, based on the work in PathIndex, for the complicated range searches and some other needs; an interlinking class that manages inter-object back and forward links behind the scenes for caching and getting the cached metadata I described; and some simple subclasses that will represent each of the data types. I have plans from there as well, but those are first steps.

On 17 Aug 2001 23:50:53 -0400, Gary & Karyn wrote:
[Originally posted on the main Zope list. Moving it here on recommendation for a deeper discussion.]
Hello.
I am building a Zope application (described more in full at the end of the message, eliminated here for brevity) that needs to make a large number of object references from other objects.
Wow. That really takes me on a ride in the Wayback Machine... I had similar thoughts once upon a time, but never pursued them:

http://classic.zope.org/pipermail/zope/1999-November/013860.html

I think that two of the approaches you listed can be hybridized to give you what you want.

First, give your objects globally unique ids, according to some schema that you create (e.g. 'comp0000000000023').

Next, store the globally unique id as a reference to the object in the referring object.

Then, catalog the referred object, making sure that the id is indexed (probably as a field index), and store any metadata that you want the referring objects to retrieve.

The referring object can just query the ZCatalog using the guid to get the result object, along with any cached metadata, without waking the referred object. Updating the catalog changes the rid, but that doesn't matter since you're retrieving the object from the catalog using a guid (which shouldn't change).

Finally, catalog the referring object as well, indexing the field storing the referred object's guid. Thus the referred object can also query the catalog to get a result set of the objects that refer to it, along with appropriate metadata.

Does this help?

Michael Bernstein.
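
[Editorial sketch] The reason the guid survives a reindex while the rid does not can be illustrated with a toy model in plain Python. This is not the ZCatalog API: the ToyCatalog class, its method names, and the sample guid and metadata are all invented for illustration, and a real catalog assigns rids in its own way.

```python
# Toy stand-in for a catalog. NOT the ZCatalog API; only the shape of the idea.
class ToyCatalog:
    def __init__(self):
        self._next_rid = 0
        self._guid_to_rid = {}   # plays the role of a field index on the guid
        self._metadata = {}      # rid -> cached metadata record

    def catalog_object(self, guid, metadata):
        """Index (or re-index) an object under its guid, caching metadata."""
        old_rid = self._guid_to_rid.pop(guid, None)
        if old_rid is not None:
            del self._metadata[old_rid]
        rid = self._next_rid
        self._next_rid += 1
        self._guid_to_rid[guid] = rid
        self._metadata[rid] = dict(metadata)
        return rid

    def reindex_all(self):
        """Rebuild every record; each one receives a fresh rid."""
        records = [(g, self._metadata[r]) for g, r in self._guid_to_rid.items()]
        self._guid_to_rid.clear()
        self._metadata.clear()
        for guid, metadata in records:
            self.catalog_object(guid, metadata)

    def rid_for(self, guid):
        return self._guid_to_rid[guid]

    def metadata_for(self, guid):
        """Return cached metadata without touching the real object."""
        return self._metadata[self._guid_to_rid[guid]]


catalog = ToyCatalog()
catalog.catalog_object("person0000000000007", {"name": "Example Composer"})
rid_before = catalog.rid_for("person0000000000007")

catalog.reindex_all()
rid_after = catalog.rid_for("person0000000000007")
cached = catalog.metadata_for("person0000000000007")
```

In this model the stored reference is the guid string, so a referring object never sees the rid change that a reindex causes.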
Michael and Robert--thank you very much. The global id looks like an excellent avenue to explore. With this or the ZPatterns approach, I am much relieved: what I need should fit within Zope well enough after all, thanks to your ideas. Michael, I also found the old thread you pointed to very interesting. It surprises me that this hasn't been dealt with officially yet, with concerns and discussion from that far back. The XLink looks like it is still being worked on, but the speed warnings for the overall XML support turned me off a bit, and makes it seem that this, at least in its current incarnation, is not the optimum solution to my exact kind of problem... Thank you again! Gary
On 19 Aug 2001 00:28:34 -0400, Gary & Karyn wrote:
Michael and Robert--thank you very much. The global id looks like an excellent avenue to explore. With this or the ZPatterns approach, I am much relieved: what I need should fit within Zope well enough after all, thanks to your ideas.
With ZPatterns, you're basically only getting separation of application logic from storage, with some extra neat things like skinscripts that make application logic easier. If you want the retrieval of metadata without waking up the object, you still need to layer on the global unique id with ZCatalog on top of ZPatterns.
Michael, I also found the old thread you pointed to very interesting. It surprises me that this hasn't been dealt with officially yet, with concerns and discussion from that far back. The XLink looks like it is still being worked on, but the speed warnings for the overall XML support turned me off a bit, and makes it seem that this, at least in its current incarnation, is not the optimum solution to my exact kind of problem...
Right. In any case, while the bi-directional-link object I was proposing back then would have worked, it's a heck of a lot more complex than the GUID+ZCatalog approach that I was suggesting you use.

The ZCatalog just needs to be able to answer two types of questions for this to work:

- what object does this id point to?
- what objects (for a given relationship) are pointing at this id?

And both can be answered using standard indexes along with GUIDs.

It is, however, important to realize that in this approach, the referring object contains all the information regarding the relationship; the referred object just infers the relationship info using the ZCatalog.

HTH,

Michael Bernstein.
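
[Editorial sketch] The two questions can be modelled with two plain dictionaries. The ToyReferenceIndex class, the "creators" relationship name, and the paths below are hypothetical; a real deployment would use ZCatalog indexes rather than this structure. Note that only the referring object supplies relationship data, as described above.

```python
from collections import defaultdict


class ToyReferenceIndex:
    """Illustrative only: answers the two lookup questions with dictionaries."""

    def __init__(self):
        self._path_by_guid = {}             # question 1: guid -> object location
        self._referrers = defaultdict(set)  # question 2: (relationship, target) -> referrers

    def index_object(self, guid, path, references=None):
        # 'references' maps a relationship name to the guids this object points at.
        # Simplification: re-indexing does not remove stale references.
        self._path_by_guid[guid] = path
        for relationship, targets in (references or {}).items():
            for target in targets:
                self._referrers[(relationship, target)].add(guid)

    def resolve(self, guid):
        """What object does this id point to?"""
        return self._path_by_guid[guid]

    def referrers(self, relationship, guid):
        """What objects, for a given relationship, point at this id?"""
        return sorted(self._referrers.get((relationship, guid), ()))


index = ToyReferenceIndex()
index.index_object("person0000000000001", "/people/person0000000000001")
index.index_object(
    "comp0000000000023",
    "/compositions/comp0000000000023",
    {"creators": ["person0000000000001"]},
)
index.index_object(
    "comp0000000000024",
    "/compositions/comp0000000000024",
    {"creators": ["person0000000000001"]},
)

where = index.resolve("comp0000000000023")
back_links = index.referrers("creators", "person0000000000001")
```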
Hi Folks, On Sunday, August 19, 2001, at 01:06 AM, Michael R. Bernstein wrote:
On 19 Aug 2001 00:28:34 -0400, Gary & Karyn wrote:
Michael and Robert--thank you very much. The global id looks like an excellent avenue to explore. With this or the ZPatterns approach, I am much relieved: what I need should fit within Zope well enough after all, thanks to your ideas.
With ZPatterns, you're basically only getting separation of application logic from storage, with some extra neat things like skinscripts that make application logic easier.
If you want the retrieval of metadata without waking up the object, you still need to layer on the global unique id with ZCatalog on top of ZPatterns.
Hmm... one nice thing about ZPatterns is that you could control what is stored where, and when it's looked up, or "woken up" at the SkinScript level so the application doesn't have to muss with the details (and you can change your mind later about the details with no impact on the application!). This, to my way of thinking, is a huge advantage! -steve
On 19 Aug 2001 10:00:49 -0500, Steve Spicklemire wrote:
Hi Folks,
On Sunday, August 19, 2001, at 01:06 AM, Michael R. Bernstein wrote:
If you want the retrieval of metadata without waking up the object, you still need to layer on the global unique id with ZCatalog on top of ZPatterns.
Hmm... one nice thing about ZPatterns is that you could control what is stored where, and when it's looked up, or "woken up" at the SkinScript level so the application doesn't have to muss with the details (and you can change your mind later about the details with no impact on the application!). This, to my way of thinking, is a huge advantage!
You're right, of course. I hadn't used ZPatterns in a while, and wasn't aware of the per-transaction caching of data (which basically means that within a transaction an object only needs to be woken up once to get the necessary data), but does ZPatterns currently have a built-in way of doing persistent caching of metadata?

If not, then the easiest way to get this would be to layer the ZCatalog approach on top of ZPatterns, using SkinScripts to hide the source of the referred object's info.

SkinScripts do allow you to hide the implementation of all this caching, along with the storage details, from the application. This flexibility is very important for making it possible to re-deploy the application in new circumstances where your original assumptions for storage and caching are invalidated.

In other words, ZPatterns is a framework-building framework (a meta-framework) and seems to be best suited for building application frameworks that are flexible and future-proof, rather than building 'mere' applications.

Cheers,

Michael.
----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com>
On 19 Aug 2001 10:00:49 -0500, Steve Spicklemire wrote: <snip ZPatterns conversation>
oof. ZPatterns does daunt me, I admit, and perhaps it is because of the aspect oriented and "meta-framework" perspective. I can't help but admire the ideas and planning behind it, though, and look forward to seeing if I am up to it intellectually at this point, and further to seeing if it is of practical help for me. Thank you for your "transactional caching" explanation, Michael. Not knowing the full term, I was putting together the word parts to create an inaccurate definition. It sounds useful as a general optimization, but in my case, as you say, something that would be used in addition to, not instead of, a Zcatalog-like persistent cache. Steve, does that sound right? The ZPatterns' racks, btw, seem to match a design decision I had already made for some other reasons, so I'm eager to explore further. I look forward to having the time later this week to dig in to it. Thanks Gary
Hi Gary, On Sunday, August 19, 2001, at 05:47 PM, Gary & Karyn wrote:
----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com>
On 19 Aug 2001 10:00:49 -0500, Steve Spicklemire wrote: <snip ZPatterns conversation>
oof. ZPatterns does daunt me, I admit, and perhaps it is because of the aspect oriented and "meta-framework" perspective. I can't help but admire the ideas and planning behind it, though, and look forward to seeing if I am up to it intellectually at this point, and further to seeing if it is of practical help for me.
Have you seen the ZPatterns examples in my area? I've gotten zero feedback on the last two, and they are the most interesting! They use "levers" to create SQL and SkinScript automagically based on the propertysheets of the DataSkins. It's kinda fun.
Thank you for your "transactional caching" explanation, Michael. Not knowing the full term, I was putting together the word parts to create an inaccurate definition. It sounds useful as a general optimization, but in my case, as you say, something that would be used in addition to, not instead of, a Zcatalog-like persistent cache. Steve, does that sound right?
The good news is, you can use skinscript to make the catalog a "provider" of listish attributes that only get accessed when you need them. You'll need to fill in the details of a "real" application for me to get much more specific, but the point is that your application code can just say:

    for object in thisObject.relations:
        object.DoSomething()

and just accessing the "relations" attribute can fire a catalog query, a SQL query, or an external method. The application doesn't really *care* how the relations are implemented, so long as there's a simple/well-defined interface to get them.
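
[Editorial sketch] The same idea, an attribute whose value is computed by a swappable provider only when it is read, can be shown in plain Python with a property. This is not SkinScript and not ZPatterns; the Composition class, the provider function, and the guids are invented for illustration.

```python
class Composition:
    def __init__(self, guid, relation_provider):
        self.guid = guid
        # Any callable taking a guid and returning an iterable will do:
        # a catalog query, a SQL query, or something else entirely.
        self._relation_provider = relation_provider

    @property
    def relations(self):
        # Evaluated on each access, so nothing is fetched until it is needed.
        return self._relation_provider(self.guid)


calls = []  # records each time the provider is actually invoked


def catalog_backed_provider(guid):
    # Stand-in for a real query; returns fixed guids for the example.
    calls.append(guid)
    return ["person0000000000001", "person0000000000002"]


this_object = Composition("comp0000000000023", catalog_backed_provider)
calls_before_access = list(calls)

related = [relation for relation in this_object.relations]
```

The calling code only iterates over `relations`; replacing `catalog_backed_provider` with a different callable changes where the data comes from without touching that loop.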
The ZPatterns' racks, btw, seem to match a design decision I had already made for some other reasons, so I'm eager to explore further. I look forward to having the time later this week to dig in to it.
Good! take care, -steve
Thanks Gary
Steve, Where would these be found? I'm a bit lazy and need to be told these things ;)
Have you seen the ZPatterns examples in my area? I've gotten zero feedback on the last two, and they are the most interesting! They use "levers" to create SQL and SkinScript automagically based on the propertysheets of the DataSkins. It's kinda fun.
Sorry.. I should've mentioned that:

http://www.zope.org/Members/sspickle

-steve

Phil Harris wrote:
Steve,
Where would these be found?
I'm a bit lazy and need to be told these things ;)
Have you seen the ZPatterns examples in my area? I've gotten zero feedback on the last two, and they are the most interesting! They use "levers" to create SQL and SkinScript automagically based on the propertysheets of the DataSkins. It's kinda fun.
----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com> <snip quote from me about GID and ZPatterns>
With ZPatterns, you're basically only getting separation of application logic from storage, with some extra neat things like skinscripts that make application logic easier.
If you want the retrieval of metadata without waking up the object, you still need to layer on the global unique id with ZCatalog on top of ZPatterns.
Good to know. Thank you. In another email in this thread Steve Alexander did suggest that perhaps a variation on ZPatterns ids within a rack could be used as global id. In addition, I'll try to find some global id routines in C as Robert suggested.

<snip quote from me about XLink>
<snip more discussion about GID>

I was thinking this morning, however, that it still is a small shame that the RID can't be used--or, perhaps more obviously, that Zope objects don't have a GID assigned automatically, and then this GID is *used* as the RID.

The advantage of the RID logic is that, within the catalog, the code is just about as quick as possible: pass a RID into the catalog, and it looks it up in a btree and gives you the UID; pass a UID, it looks it up in a btree and gives you the RID; pass an RID for the metadata and it just looks it up in the metadata dictionary and gives it to you. If RID equaled GID that would be extremely efficient.

Instead, the GID approach now still must iterate through all of the ZCatalog indexes to get an RID in all cases. Therefore, getting metadata is passing a GID, looping through all indexes (in an optimized way, I am aware, but still), finding the RID, and *then* looking it up in the metadata dictionary. At the very least, even if the loop through the other irrelevant indexes is so optimized that it adds insignificant time, you are doing two lookups (GID->RID->metadata) when you could have done one, presumably nearing at least a double in processing time? I claim no great knowledge in this area but I can't help but find it logical.

Therefore, would this proposal be worth anything:

Create a class of objects with an integer global id created on initialization. Subclass the catalog to create a catalog that (a) accepts only objects with the global id as a base, and (b) uses the global id as an rid.
The gain in efficiency would be offset by a loss in future-compatibility, unless this seemed like such a reasonable answer to the very basic concerns you raised in your 1999 post that it could be turned into an actual product proposal. Just another idea, I suppose...
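
[Editorial sketch] The difference between the two lookup paths can be shown with plain dictionaries. Neither class below is the real Catalog; the names are invented, and the integer check is only a rough stand-in for "accepts only objects with the global id as a base". Whether the saving matters in a real catalog is exactly the open question raised above.

```python
class TwoStepCatalog:
    """Separate record ids: gid -> rid, then rid -> metadata."""

    def __init__(self):
        self._rid_by_gid = {}
        self._metadata_by_rid = {}

    def catalog_object(self, gid, metadata):
        # Reuse the existing rid if the gid is already known.
        rid = self._rid_by_gid.setdefault(gid, len(self._rid_by_gid))
        self._metadata_by_rid[rid] = dict(metadata)

    def metadata_for(self, gid):
        rid = self._rid_by_gid[gid]         # first lookup
        return self._metadata_by_rid[rid]   # second lookup


class GidKeyedCatalog:
    """Hypothetical variant: the integer global id doubles as the record id."""

    def __init__(self):
        self._metadata_by_gid = {}

    def catalog_object(self, gid, metadata):
        if not isinstance(gid, int):
            raise TypeError("this catalog only accepts integer global ids")
        self._metadata_by_gid[gid] = dict(metadata)

    def metadata_for(self, gid):
        return self._metadata_by_gid[gid]   # single lookup


two_step = TwoStepCatalog()
direct = GidKeyedCatalog()
for catalog in (two_step, direct):
    catalog.catalog_object(23, {"title": "Example Song"})

same_answer = two_step.metadata_for(23) == direct.metadata_for(23)
```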
It is however, important to realize that in this approach, the referring object contains all the information regarding the relationship, the referred object just infers the relationship info using the Zcatalog.
Yes; actually, though I was thinking of making use of a mix-in class (i.e. I started coding such a quick basic mix-in a day last week) that would keep the info in both objects. Data integrity issues, I know...the forward link would be the "true" link and the back link would be the disposable (and read-only) cache, should I need to code a reindexing or a validity check of the back links at some point...seemed easy enough and, for my app, not unacceptably risky. At least worth prototyping.
HTH,
Michael Bernstein.
It did indeed. Thanks. Gary
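
[Editorial sketch] The mix-in described in this message, with the forward link as the "true" link and the back link as a disposable cache, might look roughly like the following. All names are hypothetical, and the plain dictionary registry stands in for whatever mechanism would resolve a global id to an object in Zope.

```python
class LinkableMixin:
    """Forward links are authoritative; back links are a rebuildable cache."""

    def init_links(self, guid, registry):
        self.guid = guid
        self._registry = registry   # guid -> object, stand-in for real resolution
        self._forward = {}          # relationship -> set of target guids (the truth)
        self._back_cache = {}       # relationship -> set of referrer guids (cache)
        registry[guid] = self

    def add_link(self, relationship, target_guid):
        self._forward.setdefault(relationship, set()).add(target_guid)
        target = self._registry[target_guid]
        target._back_cache.setdefault(relationship, set()).add(self.guid)

    def forward_links(self, relationship):
        return sorted(self._forward.get(relationship, ()))

    def back_links(self, relationship):
        return sorted(self._back_cache.get(relationship, ()))


def rebuild_back_links(registry):
    """Discard every back-link cache and recompute it from the forward links."""
    for obj in registry.values():
        obj._back_cache = {}
    for obj in registry.values():
        for relationship, targets in obj._forward.items():
            for target_guid in targets:
                cache = registry[target_guid]._back_cache
                cache.setdefault(relationship, set()).add(obj.guid)


class Item(LinkableMixin):
    def __init__(self, guid, registry):
        self.init_links(guid, registry)


registry = {}
person = Item("person0000000000001", registry)
song = Item("comp0000000000023", registry)
song.add_link("creator", person.guid)

claimed_by = person.back_links("creator")

person._back_cache = {}          # simulate a stale or lost cache
rebuild_back_links(registry)
claimed_by_after_rebuild = person.back_links("creator")
```

Because the cache can always be rebuilt from the forward links, losing it costs time rather than data.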
On 19 Aug 2001 18:31:43 -0400, Gary & Karyn wrote:
----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com> <snip quote from me about GID and ZPatterns>
With ZPatterns, you're basically only getting separation of application logic from storage, with some extra neat things like skinscripts that make application logic easier.
If you want the retrieval of metadata without waking up the object, you still need to layer on the global unique id with ZCatalog on top of ZPatterns.
Good to know. Thank you. In another email in this thread Steve Alexander did suggest that perhaps a variation on ZPatterns ids within a rack could be used as global id. In addition, I'll try to find some global id routines in C as Robert suggested.
Creating an application-specific UID isn't hard: just store the objects of a particular type in a known location, prefix the id with a type identifier (e.g. composition), and maintain an int property on the container that gets incremented on each object creation. The objects then get an id of composition00000001 etc. Deleting objects does *not* make their ids available, though.

The obvious pitfall is when you need to distribute the object storage and creation to more than one location per type of object, which is when you need to investigate those GUID routines.

In other words, approaches similar to ZPatterns Racks don't need a true GUID, but scattering objects throughout the Zope object tree (and on mounted storages) DOES.
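
[Editorial sketch] A counter-per-container id scheme of this kind is small enough to show in plain Python. The IdAllocator class and its parameters are invented for illustration; the lock is an assumption added for in-process thread safety, and a persistent Zope counter would need its own answer to concurrent writes.

```python
import threading


class IdAllocator:
    """Hands out ids such as composition00000001; ids are never reused."""

    def __init__(self, prefix, width=8):
        self.prefix = prefix
        self.width = width
        self._counter = 0                # only ever incremented
        self._lock = threading.Lock()    # assumption: guards concurrent callers

    def next_id(self):
        with self._lock:
            self._counter += 1
            return f"{self.prefix}{self._counter:0{self.width}d}"


compositions = IdAllocator("composition")
first_id = compositions.next_id()
second_id = compositions.next_id()
# Deleting the object that owns first_id changes nothing here:
# the counter never goes backwards, so the id is not handed out again.
third_id = compositions.next_id()
```

One allocator per object type works only while creation happens in one place, which is the pitfall noted above.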
[snip discussion of RIDs in the catalog] Therefore, would this proposal be worth anything: Create a class of objects with an integer global id created on initialization. Subclass the catalog to create a catalog that (a) accepts only objects with the global id as a base, and (b) uses the global id as an rid.
The problem here is that Zope allows an object's id to be changed via renaming. Is your proposal going to use the object's id as the GUID, or is the GUID supposed to be a second property (as I was suggesting for the indexable ID property)?
It is however, important to realize that in this approach, the referring object contains all the information regarding the relationship, the referred object just infers the relationship info using the Zcatalog.
Yes; actually, though I was thinking of making use of a mix-in class (i.e. I started coding such a quick basic mix-in a day last week) that would keep the info in both objects. Data integrity issues, I know...the forward link would be the "true" link and the back link would be the disposable (and read-only) cache, should I need to code a reindexing or a validity check of the back links at some point...seemed easy enough and, for my app, not unacceptably risky. At least worth prototyping.
Hmm. If your mix-in class created a general purpose ZMI tab for managing these relationships (and an associated API), I would be *very* interested in this, even if it didn't allow traversal into the associated objects (which can be accomplished using SkinScripts). Michael Bernstein.
----- Original Message ----- From: "Michael R. Bernstein" <webmaven@lvcm.com>
Good to know. Thank you. In another email in this thread Steve Alexander did suggest that perhaps a variation on ZPatterns ids within a rack could be used as global id. In addition, I'll try to find some global id routines in C as Robert suggested.
Creating an application-specific UID isn't hard: just store the objects of a particular type in a known location, prefix the id with a type identifier (e.g. composition), and maintain an int property on the container that gets incremented on each object creation. The objects then get an id of composition00000001 etc. Deleting objects does *not* make their ids available, though.
Right. Thread-safety is an issue, though, I think, for the incrementing, but an issue that has been already solved (yay, no wheel re-inventing).
The obvious pitfall is when you need to distribute the object storage and creation to more than one location per type of object, which is when you need to investigate those GUID routines. In other words, approaches similar to ZPatterns Racks don't need a true GUID, but scattering objects throughout the Zope object tree (and on mounted storages) DOES.
Agreed.
[snip discussion of RIDs in the catalog] Therefore, would this proposal be worth anything: Create a class of objects with an integer global id created on initialization. Subclass the catalog to create a catalog that (a) accepts only objects with the global id as a base, and (b) uses the global id as an rid.
The problem here is that Zope allows an object's id to be changed via renaming. Is your proposal going to use the object's id as the GUID, or is the GUID supposed to be a second property (as I was suggesting for the indexable ID property)?
A second property, as you suggested earlier. Read-only. Although in a perfect world the "read only" aspect would merely be indicated in the interface, I seem to recall some Python semi-internals (some double underline tricks) that can intercept gets and sets even if they are attempted directly on the property without an intermediary accessor function. Be that as it may, yeah, a second property is what I had in mind. I think it might work, with the caveat I mentioned that I'm possibly setting myself up for trouble in the future by not working with the ZCatalog as intended. There's no interface, but it still seems pretty clear that I'm going beyond the bounds of what was intended. I think it's potentially worthwhile though, and would be willing to go to bat with ZC to see if they like the idea if it works for me in prototype and some other folks like it ok. Allowing it in the ZCatalog interface would not be a big deal, I think--but my perspective is pretty darn limited so I could easily be wrong.
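
[Editorial sketch] The "double underline tricks" that intercept sets are presumably Python's attribute hooks such as __setattr__. A minimal plain-Python illustration follows; the class name and the gid attribute are hypothetical, and Persistent classes have their own rules for overriding attribute hooks, which this sketch ignores.

```python
class GloballyIdentified:
    """The gid is assigned once at creation and refuses later changes."""

    def __init__(self, gid):
        # Bypass our own guard exactly once, during initialization.
        object.__setattr__(self, "gid", gid)

    def __setattr__(self, name, value):
        if name == "gid":
            raise AttributeError("gid is read-only once assigned")
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        if name == "gid":
            raise AttributeError("gid cannot be deleted")
        object.__delattr__(self, name)


item = GloballyIdentified(23)
item.title = "Example Song"      # ordinary attributes still work

try:
    item.gid = 99                # direct assignment, no accessor involved
    blocked = False
except AttributeError:
    blocked = True

# Note: this deters accidents, not determined code; object.__setattr__
# or direct access to item.__dict__ can still change the value.
```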
It is however, important to realize that in this approach, the referring object contains all the information regarding the relationship, the referred object just infers the relationship info using the Zcatalog.
Yes; actually, though I was thinking of making use of a mix-in class (i.e. I started coding such a quick basic mix-in a day last week) that would keep the info in both objects. Data integrity issues, I know...the forward link would be the "true" link and the back link would be the disposable (and read-only) cache, should I need to code a reindexing or a validity check of the back links at some point...seemed easy enough and, for my app, not unacceptably risky. At least worth prototyping.
Hmm. If your mix-in class created a general purpose ZMI tab for managing these relationships (and an associated API), I would be *very* interested in this, even if it didn't allow traversal into the associated objects (which can be accomplished using SkinScripts).
Sounds good. :-) I'll see what I can do. :-) It requires...well...it requires "efficient and flexible object references". With all this discussion I have some good ideas on how to move forward. Thanks again. Gary
I dunno if I am totally off topic, but why don't you just create your own unique ID that is stored with each object? Microsoft uses its famed GUID, which they guarantee to be unique in this universe. You will find algorithms for creating them (in C or C++) publicly available.

Robert

----- Original Message -----
From: "Gary & Karyn" <garykaryn@earthlink.net>
To: <zope-dev@zope.org>
Cc: <tdickenson@geminidataloggers.com>; "Dieter Maurer" <dieter@handshake.de>; "Casey Duncan" <cduncan@kaivo.com>
Sent: Saturday, August 18, 2001 5:50 AM
Subject: [Zope-dev] Efficient and flexible object references
[Originally posted on the main Zope list. Moving it here on recommendation for a deeper discussion.]
Hello.
I am building a Zope application (described more in full at the end of the message, eliminated here for brevity) that needs to make a large number of object references from other objects. * These references need to be as efficient as possible in terms of size and speed; * moreover, some caching of referenced objects' information, much like Catalog metadata, will be necessary. Hopefully this situation is generic enough that it will be of some interest.
I had some ideas on how to do this, and some folks on the Zope list added significantly to my list of both ideas and concerns. The following is a listing of the approaches discussed so far, and their advantages and problems. The Catalog RID idea is dead in the water but tantalizingly close to what I need, so I have included it.
Any contributions to further this list in any way would be greatly appreciated. I tried to put attributions in as clearly as possible. Thank you very much to all who have contributed so far.
* tree-based Zope URL, or uid ADVANTAGES: The canonical high-level reference approach, apparently. Stable, workable. DISADVANTAGES: Inefficient in storage space No metadata-type information: either wake the object from the ZODB and ask
it, or store what you want yourself (very difficult to keep fresh; it is rebuilding the ZCatalog metadata approach from another angle so feels like inventing the wheel) [speed issues in retrieving the object? must already be heavily optimized...]
* actual standard object references [From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk> An alternative that has not been mentioned so far is storing a real object reference.] ADVANTAGES: efficient in storage space DISADVANTAGES: (non-standard Zope approach) No metadata; must wake the object to get the information From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
This may well be easier if you can live without managing your relationships as if they were folders, and without using Zopes security mechanisms to control accees to the referred-to objects. Therefore, referenced objects would either not be accessible and manageable via Zope (losing much of Zope's advantage as an object publisher), or they would be in a scary netherworld--in a Zope management tree but also directly referenced outside of it--in this case a hack that would probably cause significant problems. I think Toby meant the first scenario. My app would have to use the second scenario (the objects should be published), which is why I had not pursued it.
* Catalog (ZCatalog) rids (currently DEAD IN THE WATER but tantalizingly close to what I want)

  ADVANTAGES:
    * Efficient use of space
    * Can return metadata without waking actual object (a design goal of the ZCatalog, I believe)
    * Metadata is updated whenever the catalog is updated (hopefully in approximate real time)--no new mechanism needed to keep metadata fresh (we're using a wheel that has already been invented)
    * All of the needed methods except hasuid (i.e. getRID) are part of what I assume is the interface: getobject, getMetadataForRID, getIndexDataForRID, getpath

  DISADVANTAGES:

  From: "Dieter Maurer" <dieter@handshake.de>
  "rid"s are not persistently associated with objects. If someone calls "manage_catalogReindex", then all your rids change.

  [that's the killer]

  Also, a hack, unless it were blessed by ZC at some point, because it relies on inner workings of the Catalog (i.e., the ZCatalog has no method to return an object's rid, and the catalog itself only has hasuid, which as named does not imply reliability as a way of getting RID)
* ZODB oid

  [From: "Casey Duncan" <cduncan@kaivo.com>
  Another option to explore might be to store the oid]

  ADVANTAGES:
    * efficient use of space

  DISADVANTAGES:

  From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
  OIDs are only unique within a single storage.
    * If some objects are exported and reimported, their OIDs will change.
    * You will get duplicate OIDs in the same Zope if you are using a mounted storage.

  From: "Dieter Maurer" <dieter@handshake.de>
    * they are a very low level feature, difficult to access from most Zope parts
    * they may not be unique. Think of "mountable storages". Then each storage will have its own OIDs, interpreted in its own local context.
----------

As the list above stands now, I will either use a standard tree-based URL uid and try to build my own optimized caching mechanism (ouch) or go down the treacherous and very dangerous path (for the future) of attempting to make the ZCatalog do what I want no matter what. I suppose my first step there would be to see if I can find a way to allow reindexing of a catalog without changing RIDs... I can feel the frowns from here...
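To make the rid concern concrete, here is a minimal sketch of the failure mode Dieter describes. It is a toy model only: ToyCatalog, its attributes, and the example paths are all invented for illustration and are not the real ZCatalog implementation. The only behavior it borrows from the discussion is that a full reindex hands out new rids while the path-based uid stays the same.

```python
class ToyCatalog:
    """Toy stand-in for a catalog (hypothetical; NOT the real ZCatalog)."""

    def __init__(self):
        self._next_rid = 1
        self.uids = {}       # path (uid) -> rid
        self.metadata = {}   # rid -> cached metadata record

    def catalog_object(self, path, meta):
        # Reuse the rid if the path is already known, else hand out a new one.
        rid = self.uids.get(path)
        if rid is None:
            rid = self._next_rid
            self._next_rid += 1
            self.uids[path] = rid
        self.metadata[rid] = dict(meta)
        return rid

    def reindex(self, objects):
        # Clear everything and recatalog: every object ends up with a new rid.
        self.uids.clear()
        self.metadata.clear()
        for path, meta in objects.items():
            self.catalog_object(path, meta)


objects = {
    "/people/a": {"name": "Example Person A"},
    "/people/b": {"name": "Example Person B"},
}
catalog = ToyCatalog()
for path, meta in objects.items():
    catalog.catalog_object(path, meta)

stored_rid = catalog.uids["/people/b"]  # what a referring object would keep
catalog.reindex(objects)

stale = catalog.metadata.get(stored_rid)             # the old rid finds nothing
fresh = catalog.metadata[catalog.uids["/people/b"]]  # the path still resolves
```

In this toy model the stored rid silently stops resolving after the reindex, while a lookup that starts from the path keeps working, which is the trade-off between the first and third options in the list.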
Thanks for your ideas. For those of you still here, a few paragraphs describing my project more fully follow. Only read them if you feel they will help your brainstorming. :-)
Thanks
Gary
----------------------------------------
CASUAL PROJECT DESCRIPTION from an earlier post
While I keep an eye to contributing back to the community by making my solutions as flexible as possible, I'm putting a super-bibliography for musicians, especially vocalists, into Zope. It stores objects describing compositions, books, texts, recordings, publications, people, topics, and other items.
On a simple level, I need the kind of referencing I describe for connecting people objects as creators to other objects; for connecting any object to another (particularly topics) in a "describes" relationship; for connecting same-class objects in a parent-child relationship; and other similar tasks. (Obviously, I'm coming from a bit of a RDBM background on this but I'm enjoying the better modeling possible with the ZODB, among other things.)
When a composition object displays, for instance, it needs to know both the name and address of all of its creators, ideally without waking up the creators yet. Similarly, a person needs to know back links--what objects claim me as a creator? Rather than caching a page or an object, I have decided it will be best to cache the relationships and metadata somehow.
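As a rough illustration of the forward and back links described above, the following sketch keeps both directions of each relationship plus a small metadata record per object id. Everything in it (LinkRegistry, the relation name "creator", the ids and metadata values) is hypothetical; in a real Zope application the dictionaries would need to be persistent structures and the metadata would have to be refreshed when objects change.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical registry of typed links between object ids."""

    def __init__(self):
        self._forward = defaultdict(set)   # (source, relation) -> target ids
        self._backward = defaultdict(set)  # (target, relation) -> source ids
        self._metadata = {}                # object id -> cached metadata

    def set_metadata(self, obj_id, **meta):
        self._metadata[obj_id] = dict(meta)

    def link(self, source, relation, target):
        # Record both directions so back links never require a scan.
        self._forward[(source, relation)].add(target)
        self._backward[(target, relation)].add(source)

    def unlink(self, source, relation, target):
        self._forward[(source, relation)].discard(target)
        self._backward[(target, relation)].discard(source)

    def targets(self, source, relation):
        """Cached metadata for everything `source` points to."""
        ids = self._forward.get((source, relation), ())
        return [self._metadata.get(t, {}) for t in sorted(ids)]

    def sources(self, target, relation):
        """Ids of everything that points to `target`."""
        return sorted(self._backward.get((target, relation), ()))


registry = LinkRegistry()
registry.set_metadata("person-1", name="Example Composer",
                      address="/people/person-1")
registry.link("composition-7", "creator", "person-1")
registry.link("composition-9", "creator", "person-1")

creators = registry.targets("composition-7", "creator")
claimed_by = registry.sources("person-1", "creator")
```

Here a composition can list its creators' names and addresses from the registry alone, and a person can ask which objects claim it as a creator, without either side loading the other object.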
----------------- More details for the super-interested -------------------
The modelling for compositions is particularly complex, at least to me, since I include instruments needed, if any, and voices needed, if any; the voices themselves have high and low range extremes I am keeping track of, and even multiple options for those. If they are published, each song might be transposed by a given number of half steps (producing a new set of the high and low extremes for the composition). If the composition's parent is published and transposed, that will produce yet another set of high and low extremes.
Displaying and searching by range extremes thus becomes quite complex, and a high, high candidate for caching. Even so, expecting my code to keep the cached information fresh when the relationships are so far-flung makes me nervous: I think I'll only be able to cache so far down the chain, and rely on live checks (or at least secondary cached metadata checks) for the rest.
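The range arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions: pitches are represented as MIDI-style note numbers (60 = middle C), transpositions along the parent chain are assumed to simply add up, and the names VoiceRange and effective_range are invented here rather than taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceRange:
    low: int   # lowest pitch, as a MIDI-style note number (assumption)
    high: int  # highest pitch, same representation

    def transposed(self, half_steps):
        # Transposition shifts both extremes by the same number of half steps.
        return VoiceRange(self.low + half_steps, self.high + half_steps)


def effective_range(original, transpositions):
    """Apply every transposition in the publication chain (assumed additive)."""
    return original.transposed(sum(transpositions))


original = VoiceRange(low=60, high=79)
# The song is published down 2 half steps; its parent publication is
# transposed up 3 half steps on top of that.
published = effective_range(original, [-2])
in_parent = effective_range(original, [-2, 3])
```

Since each published form is derived from the original plus a list of offsets, a cache only has to store the resulting low and high values per publication, and it can be recomputed cheaply when any offset in the chain changes.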
I'm figuring I'm going to need a new pluggable index, based on the work in PathIndex, for the complicated range searches and some other needs; an interlinking class that manages inter-object back and forward links behind the scenes for caching and getting the cached metadata I described; and some simple subclasses that will represent each of the data types. I have plans from there as well, but those are first steps.
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
Gary & Karyn wrote:
* moreover, some caching of referenced objects' information, much like Catalog metadata, will be necessary.
Do you mean keeping a persistent cache of this data along with your objects?

You could consider a per-transaction cache of data, as used in ZPatterns, so that the data gleaned from related objects is only calculated once per transaction, but you don't need to bother with cache invalidation messages, as the data is discarded at transaction boundaries.

Whether you want persistent caching or per-transaction caching depends on the ratio of writes and updates to reads in your application. Obviously, if you are doing a lot of persistent caching, you'll need to do a lot of invalidating on each change.

A third option is to use some sort of in-memory cache that doesn't rely on _v_ attributes. The RAM Cache from the StandardCacheManagers product does this by storing the cached data in a dictionary that is in an attribute of a module. If you go down this path, you'll need to be careful about read/write locking of your cache.

As to where to store your objects, and how to relate them to each other: do you need to store the objects in particular places in your ZODB? If not, you could use the ZPatterns approach of storing all objects of a particular type in a Rack. Thus, you'd have racks for Creators, for Instruments, for Compositions, and so forth. A Rack is a bit like a database table.

Within a Rack, each object has a string id. This is because when a Rack stores its data persistently, it uses an OO BTree. When you store a reference to object B from object A, if the type of reference is fixed in your application, you need only store the string id of object B, as the rack to use will be evident from the type of reference. Thus, if object A is a Composition, and object B is a Creator, then object A can have an attribute my_creator which has the value "uid000023184982". Object A will know that my_creator is to be looked up in the Creators Rack.

I've been playing with the idea of a new kind of Rack that uses an IO BTree to store its objects. You'll be able to access the objects with an integer id, or as normal by string id, using a standard convention that applies to every object in the Rack. An example of such a convention is "creator-%08d" % id where an object with id 12 would also have the string id "creator-00000012".

Another advantage of the Rack is that it makes it easy to convert your application to use an RDBMS at some later point.

If you want to pursue this Rack stuff further, you'll also want to know about Specialists. When you know about Specialists, you'll want to know about SkinScript.

Also, this thread would best be moved to the zpatterns list: zpatterns@eby-sarna.com. Take a look at http://eby-sarna.com for signup instructions.

--
Steve Alexander
Software Engineer
Cat-Box limited
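The id convention above can be sketched as follows. This is a toy stand-in, not ZPatterns code: ToyRack and its methods are invented for illustration, and a plain dictionary takes the place of the IO BTree. The only detail taken from the message is the "creator-%08d" convention itself.

```python
class ToyRack:
    """Toy stand-in for the integer-keyed Rack idea (hypothetical)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._items = {}    # integer id -> object (an IO BTree in the real idea)
        self._next_id = 1

    def string_id(self, int_id):
        # The convention from the message: prefix plus zero-padded integer.
        return "%s-%08d" % (self.prefix, int_id)

    def int_id(self, string_id):
        prefix, _, digits = string_id.rpartition("-")
        if prefix != self.prefix or not digits.isdigit():
            raise KeyError(string_id)
        return int(digits)

    def new_item(self, obj):
        int_id = self._next_id
        self._next_id += 1
        self._items[int_id] = obj
        return int_id

    def get(self, key):
        # Accept either form of id; both resolve to the same integer key.
        if isinstance(key, str):
            key = self.int_id(key)
        return self._items[key]


creators = ToyRack("creator")
for n in range(1, 13):
    creators.new_item({"name": "Example Creator %d" % n})

twelfth_id = creators.string_id(12)
by_string = creators.get("creator-00000012")
by_integer = creators.get(12)
```

Because the string id is derived from the integer by a fixed rule, a referring object can store just the small integer when the target Rack is implied by the reference type, and still produce the string id when a conventional id is needed.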
----- Original Message ----- From: "Steve Alexander" <steve@cat-box.net> <snip>
You could consider a per-transaction cache of data, as used in ZPatterns, <...snip>
Yes, you are right: I really need to look at ZPatterns again. It was a bit beyond me last time I checked it out, but I have a better handle on Zope basics now, so just looking at a few pages today on the ZPatterns Wiki looked significantly less daunting. The transactional cache, the rack, and the ids look like a great match for my needs, on a superficial quick look. Thank you for your ideas. I'll pursue them.
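For readers following along, the per-transaction cache idea can be sketched like this. It is a simplified illustration and not how ZPatterns implements it: PerTransactionCache, end_transaction, and the lookup function are hypothetical names, and the caller is assumed to invoke end_transaction at every commit or abort.

```python
class PerTransactionCache:
    """Hypothetical cache whose contents live for one transaction only."""

    def __init__(self, compute):
        self._compute = compute   # function that does the expensive lookup
        self._values = {}
        self.computations = 0     # counts how often the lookup really ran

    def get(self, key):
        if key not in self._values:
            self._values[key] = self._compute(key)
            self.computations += 1
        return self._values[key]

    def end_transaction(self):
        # Called on commit or abort (assumption). Dropping everything here
        # is what removes the need for invalidation messages.
        self._values.clear()


names = {"creator-00000012": "Example Creator"}


def lookup_name(creator_id):
    # Stands in for waking the related object and asking it for data.
    return names[creator_id]


cache = PerTransactionCache(lookup_name)

first = cache.get("creator-00000012")
second = cache.get("creator-00000012")   # served from the cache
runs_in_first_transaction = cache.computations

names["creator-00000012"] = "Renamed Creator"  # a change made elsewhere
cache.end_transaction()
after_boundary = cache.get("creator-00000012")  # recomputed, sees the change
```

The cost is that each transaction pays for its first lookup of every related object, so this fits read patterns where the same relationship is consulted several times while rendering one request.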
I've been playing with the idea of a new kind of Rack that uses an IO BTree to store its objects. You'll be able to access the objects with an integer id, or as normal by string id, using a standard convention that applies to every object in the Rack. An example of such a convention is "creator-%08d" % id where an object with id 12 would also have the string id "creator-00000012".
This sounds like an excellent idea. It could be a unique global id within the whole Zope, and simultaneously a fast integer id within the rack. I would love to know if you do anything more with this idea.
Also, this thread would best be moved to the zpatterns list: zpatterns@eby-sarna.com. Take a look at http://eby-sarna.com for signup instructions.
Thanks again. I signed up on the list on your recommendation; I think I want to read the Wiki a bit and look at some of the ZPatterns code before I participate, but then I'll pursue it there as well.

Thanks!
Gary
I am building a Zope application (described more in full at the end of the message, eliminated here for brevity) that needs to make a large number of object references from other objects.
Hmmm, I'm in the protracted process of building a portal_discussion tool that stores a tree of lightweight objects that map to CMF content. Kind of an equivalent to ZCatalog but in a tree paradigm rather than a table paradigm.

If this might be what you're after, check the Swishdot module out of the Squishdot CVS on SourceForge and have a look at DiscussionTool.py. It doesn't currently work but should give you a rough idea of what planet I was on.

cheers,
Chris
----- Original Message ----- From: "Chris Withers" <chrisw@nipltd.com> <snip>
If this might be what you're after, check the Swishdot module out of the Squishdot CVS on SourceForge and have a look at DiscussionTool.py. It doesn't currently work but should give you a rough idea of what planet I was on.
Thank you, I will. :-) Gary
Hi! some ideas ...
As the list above stands now, I will either use a standard tree-based URL uid and try to build my own optimized caching mechanism (ouch) or go down the treacherous and very dangerous path (for the future) of attempting to make the ZCatalog do what I want no matter what. I suppose my first step there would be to see if I can find a way to allow reindexing of a catalog without changing RIDs... I can feel the frowns from here...
With the catalog approach, how exactly are you planning to do it? If I get it right, you store the RID of the related object (let's say the Composer) with your object (let's say the composition) and if you need the information, you get the related data from the catalog. You do this because you don't want to reference the object itself. It would have to be woken up to get the metadata (Name, date of birth, whatever) from it.

One thing that comes to my mind is cataloging a unique id (or the physical path, which is the standard id in the catalog anyway) with the entry and using that one as the key. I don't know if it is much slower than retrieving the catalog entry by its RID, and your custom id would not change when the object is re-cataloged.
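A minimal sketch of the custom-id suggestion, under loud assumptions: UidCatalog and Creator are invented stand-ins, a plain dictionary replaces the real catalog index, and the wake counter only simulates the cost of loading an object from the ZODB. It is meant to show the shape of the idea, not ZCatalog's API.

```python
class Creator:
    """Stand-in for a persistent object that is expensive to wake."""

    wake_count = 0  # how many times any Creator had to be touched

    def __init__(self, uid, name, born):
        self.uid = uid
        self._name = name
        self._born = born

    def metadata(self):
        Creator.wake_count += 1
        return {"name": self._name, "born": self._born}


class UidCatalog:
    """Hypothetical catalog keyed by a custom unique id instead of a rid."""

    def __init__(self):
        self._by_uid = {}

    def catalog_object(self, obj):
        # Cataloging wakes the object once and stores a metadata copy.
        self._by_uid[obj.uid] = obj.metadata()

    def reindex(self, objects):
        self._by_uid.clear()
        for obj in objects:
            self.catalog_object(obj)

    def metadata_for(self, uid):
        return self._by_uid[uid]


creators = [Creator("creator-1", "Example Composer", 1800)]
catalog = UidCatalog()
for creator in creators:
    catalog.catalog_object(creator)

# The composition stores only the custom uid of each creator.
composition = {"title": "Example Song", "creator_uids": ["creator-1"]}

catalog.reindex(creators)  # the stored uid survives a full recatalog
wakes_before = Creator.wake_count
creator_names = [catalog.metadata_for(uid)["name"]
                 for uid in composition["creator_uids"]]
wakes_during_display = Creator.wake_count - wakes_before
```

Whether a uid lookup is noticeably slower than a rid lookup in the real catalog is left open in the message above, and this sketch does not answer that; it only shows that the stored key stays valid and that display needs no object wake-ups.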
On a simple level, I need the kind of referencing I describe for connecting people objects as creators to other objects; for connecting any object to another (particularly topics) in a "describes" relationship; for connecting same-class objects in a parent-child relationship; and other similar tasks. (Obviously, I'm coming from a bit of a RDBM background on this but I'm enjoying the better modeling possible with the ZODB, among other things.)
When a composition object displays, for instance, it needs to know both the name and address of all of its creators, ideally without waking up the creators yet. Similarly, a person needs to know back links--what objects claim me as a creator? Rather than caching a page or an object, I have decided it will be best to cache the relationships and metadata somehow.
All that sounds very much like you maybe need an RDBM. After all, you have a lot of relations! If you still want to have "real" objects in Zope instead of tabular data, you might use the DBObjects framework. I have no benchmarks, but I guess retrieving your data from PostgreSQL should be quite as efficient as using the catalog.

With DBObjects, you normally have real persistent objects that have their data stored in PostgreSQL but can be used from Zope like a ZODB object. The good thing in your case is that you could then write your own methods like "getMyComposers()" that do the hard work in SQL and return the results as attributes of the object.

I am not really sure what approach is the most efficient, and whether you really need massive caching or just a very efficient just-in-time retrieval and then maybe late caching of the rendered pages using the available caching tools. I guess you'll need to test that with actual data ...

Cheers
Joachim
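To give a feel for a method like getMyComposers() that does its work in SQL, here is a small sketch. It does not use DBObjects or PostgreSQL: the standard-library sqlite3 module stands in for the database so the example runs on its own, and the table names, columns, and Composition class are all hypothetical.

```python
import sqlite3


class Composition:
    """Hypothetical object whose relationships live in SQL tables."""

    def __init__(self, conn, composition_id):
        self._conn = conn
        self.id = composition_id

    def getMyComposers(self):
        # One join returns the names; no person objects are loaded.
        rows = self._conn.execute(
            "SELECT p.name FROM person AS p "
            "JOIN creator_link AS l ON l.person_id = p.id "
            "WHERE l.composition_id = ? ORDER BY p.name",
            (self.id,),
        )
        return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE composition (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE creator_link (
    composition_id INTEGER NOT NULL REFERENCES composition(id),
    person_id INTEGER NOT NULL REFERENCES person(id)
);
INSERT INTO person VALUES (1, 'Example Composer'), (2, 'Example Lyricist');
INSERT INTO composition VALUES (7, 'Example Song');
INSERT INTO creator_link VALUES (7, 1), (7, 2);
""")

composers = Composition(conn, 7).getMyComposers()
```

The link table also answers the back-link question (which compositions claim a given person) with the same join read in the other direction, which is the main attraction of the relational route for this kind of data.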
----- Original Message ----- From: "Joachim Werner" <joe@iuveno-net.de> To: "Gary & Karyn" <garykaryn@earthlink.net>; <zope-dev@zope.org> Sent: Saturday, August 18, 2001 1:52 PM Subject: Re: [Zope-dev] Efficient and flexible object references
Hi!
some ideas ...
Thank you!
With the catalog approach, how exactly are you planning to do it? If I get it right, you store the RID of the related object (let's say the Composer) with your object (let's say the composition) and if you need the information, you get the related data from the catalog. You do this because you don't want to reference the object itself. It would have to be woken up to get the metadata (Name, date of birth, whatever) from it.
Yes, that's it exactly.
One thing that comes to my mind is cataloging a unique id (or the physical path, which is the standard id in the catalog anyway) with the entry and use that one as the key. I don't know if it is much slower than retrieving the catalog entry by its RID, and your custom id would not change when the object is re-cataloged.
Yes, that sounds like a good idea. I'll try it and the ZPatterns idea I think. <snip my description>
All that sounds very much like you maybe need an RDBM. After all, you have a lot of relations! If you still want to have "real" objects in Zope instead of tabular data, you might use the DBObjects framework. I have no benchmarks, but I guess retrieving your data from PostgreSQL should be quite as efficient as using the catalog. With DBObjects, you normally have real persistent objects that have their data stored in PostgreSQL but can be used from Zope like a ZODB object. The good thing in your case is that you could then write your own methods like "getMyComposers()" that do the hard work in SQL and return the results as attributes of the object.
Yes, true, I have a lot of relations. :-) I'll check out the DBObjects framework but I'm actively trying to move away from an RDBM approach for this project. My incomplete ER diagram for this looked like spaghetti, and I kept on finding that I wanted to make my entities objects...this is a bit of an exploration of ODBs in general to see if I can make a simpler, more graceful design model the same information. If all else fails, I'll pull the ER diagram out again. :-)
I am not really sure what approach is the most efficient, and whether you really need massive caching or just a very efficient just-in-time retrieval and then maybe late caching of the rendered pages using the available caching tools. I guess you'll need to test that with actual data ...
Understood, and agreed; while I don't want to cache whole pages for flexible display reasons, perhaps I could cache rendered "baked" page components...metadata caching seems easier to keep fresh though...I don't have unlimited time, of course, so I hope to land on a reasonable solution out of the first two or three I try. I'll try to remember this option when I plan my prototype building. Thank you very much! Gary
participants (8)
- Chris Withers
- Gary & Karyn
- Joachim Werner
- Michael R. Bernstein
- Phil Harris
- Robert Rottermann
- Steve Alexander
- Steve Spicklemire