[Originally posted on the main Zope list. Moving it here on recommendation for a deeper discussion.]

Hello.

I am building a Zope application (described more fully at the end of this message, omitted here for brevity) that needs to make a large number of object references from other objects.

* These references need to be as efficient as possible in terms of size and speed;
* moreover, some caching of referenced objects' information, much like Catalog metadata, will be necessary.

Hopefully this situation is generic enough that it will be of some interest.

I had some ideas on how to do this, and some folks on the Zope list added significantly to my list of both ideas and concerns. The following is a listing of the approaches discussed so far, with their advantages and problems. The Catalog RID idea is dead in the water but tantalizingly close to what I need, so I have included it. Any contributions to further this list in any way would be greatly appreciated. I tried to put attributions in as clearly as possible. Thank you very much to all who have contributed so far.

* tree-based Zope URL, or uid

ADVANTAGES:
  The canonical high-level reference approach, apparently.
  Stable, workable.

DISADVANTAGES:
  Inefficient in storage space
  No metadata-type information: either wake the object from the ZODB and ask it, or store what you want yourself (very difficult to keep fresh; it rebuilds the ZCatalog metadata approach from another angle, so it feels like reinventing the wheel)
  [speed issues in retrieving the object? must already be heavily optimized...]

* actual standard object references

[From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
An alternative that has not been mentioned so far is storing a real object reference.]

ADVANTAGES:
  efficient in storage space

DISADVANTAGES:
  (non-standard Zope approach)
  No metadata; must wake the object to get the information

From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
This may well be easier if you can live without managing your relationships as if they were folders, and without using Zope's security mechanisms to control access to the referred-to objects.

Therefore, referenced objects would either not be accessible and manageable via Zope (losing much of Zope's advantage as an object publisher), or they would be in a scary netherworld--in a Zope management tree but also directly referenced outside of it--which in this case is a hack that would probably cause significant problems. I think Toby meant the first scenario. My app would have to use the second scenario (the objects should be published), which is why I had not pursued it.
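
As a rough illustration of the first approach in the list (a tree-based path as the uid), here is a minimal sketch in plain Python with no Zope imports. Nested dicts stand in for the folder tree, and `resolve` is a hypothetical helper; in Zope itself, `getPhysicalPath()` and `unrestrictedTraverse()` play these roles. It shows why the path is stable but carries no metadata: getting a creator's title means walking the tree and touching the target object.

```python
# Toy stand-in for a Zope folder tree: nested dicts, leaves are "objects".
root = {
    "people": {"schubert": {"title": "Franz Schubert"}},
    "works": {
        "erlkoenig": {
            "title": "Erlkoenig",
            # References are stored as path tuples, not as object pointers.
            "creators": [("people", "schubert")],
        }
    },
}


def resolve(tree, path):
    """Walk the tree one path segment at a time (hypothetical helper)."""
    node = tree
    for segment in path:
        node = node[segment]  # KeyError if the target was moved or deleted
    return node


work = resolve(root, ("works", "erlkoenig"))
# Every creator has to be looked up (woken) just to read its title.
creator_titles = [resolve(root, p)["title"] for p in work["creators"]]
print(creator_titles)
```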
* Catalog (ZCatalog) rids (currently DEAD IN THE WATER but tantalizingly close to what I want)

ADVANTAGES:
  Efficient use of space
  Can return metadata without waking the actual object (a design goal of the ZCatalog, I believe)
  Metadata is updated whenever the catalog is updated (hopefully in approximate real time)--no new mechanism needed to keep metadata fresh (we're using a wheel that has already been invented)
  All of the needed methods except hasuid (i.e. getRID) are part of what I assume is the interface: getobject, getMetadataForRID, getIndexDataForRID, getpath

DISADVANTAGES:

From: "Dieter Maurer" <dieter@handshake.de>
"rid"s are not persistently associated with objects. If someone calls "manage_catalogReindex", then all your rids change.

[that's the killer]

It is also a hack, unless it were blessed by ZC at some point, because it relies on the inner workings of the Catalog (i.e., the ZCatalog has no method to return an object's rid, and the catalog itself only has hasuid, whose name does not suggest it is a reliable way of getting the RID)
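
Dieter's objection can be illustrated with a toy catalog in plain Python. `ToyCatalog` is a made-up class, not the ZCatalog API, and the way it assigns rids (a counter that restarts on a full reindex) is an assumption for illustration only. The point is just that a clear-and-recatalog pass need not hand out the same rids again, so a stored rid can silently end up pointing at a different object, while a stored path would not.

```python
import itertools


class ToyCatalog:
    """Hypothetical stand-in for a catalog that maps rid <-> path."""

    def __init__(self):
        self._next_rid = itertools.count(1)
        self.paths = {}  # rid -> path
        self.rids = {}   # path -> rid

    def catalog_object(self, path):
        if path in self.rids:
            return self.rids[path]
        rid = next(self._next_rid)
        self.paths[rid] = path
        self.rids[path] = rid
        return rid

    def reindex_all(self):
        """Clear everything and recatalog (assumed behaviour of a full reindex)."""
        old_paths = sorted(self.rids)  # order may differ from the original order
        self.paths.clear()
        self.rids.clear()
        self._next_rid = itertools.count(1)
        for path in old_paths:
            self.catalog_object(path)


catalog = ToyCatalog()
stored_rid = catalog.catalog_object("/works/erlkoenig")  # saved in some other object
catalog.catalog_object("/people/schubert")

catalog.reindex_all()
# The rid saved earlier is still valid, but now names a different object.
stale_target = catalog.paths.get(stored_rid)
print(stored_rid, stale_target)
```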
* ZODB oid

[From: "Casey Duncan" <cduncan@kaivo.com>
Another option to explore might be to store the oid]

ADVANTAGES:
  efficient use of space

DISADVANTAGES:

From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
OIDs are only unique within a single storage.
* If some objects are exported and reimported, their OIDs will change.
* You will get duplicate OIDs in the same Zope if you are using a mounted storage.

From: "Dieter Maurer" <dieter@handshake.de>
* they are a very low level feature, difficult to access from most Zope parts
* they may not be unique
  Think of "mountable storages". Then each storage will have its own OIDs, interpreted in its own local context.
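
The uniqueness problem Toby and Dieter describe can be sketched in plain Python. The storage names and object values below are invented for illustration, and `make_ref` and `load` are hypothetical helpers; only the 8-byte shape of the oid follows ZODB. The idea is that a bare oid becomes ambiguous once a second storage is mounted, so any oid-based reference would at least need to carry something that identifies the storage. This does nothing about the export/reimport problem.

```python
# An 8-byte oid, the shape ZODB uses; the value itself is made up.
OID = b"\x00" * 7 + b"\x01"

# Two storages that each handed out the same oid to a different object.
storages = {
    "main": {OID: "composition: Erlkoenig"},
    "mounted": {OID: "person: Franz Schubert"},
}


def make_ref(storage_name, oid):
    """Hypothetical reference: qualify the oid with the storage it lives in."""
    return (storage_name, oid)


def load(ref):
    """Look the object up in the storage named by the reference."""
    storage_name, oid = ref
    return storages[storage_name][oid]


# The bare oid matches an object in both storages, so it is ambiguous.
matches = sorted(name for name, objects in storages.items() if OID in objects)
# The qualified reference is not.
person = load(make_ref("mounted", OID))
print(matches, person)
```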
----------

As the list above stands now, I will either use a standard tree-based URL uid and try to build my own optimized caching mechanism (ouch) or go down the treacherous and very dangerous path (for the future) of attempting to make the ZCatalog do what I want no matter what. I suppose my first step there would be to see if I can find a way to allow reindexing of a catalog without changing RIDs... I can feel the frowns from here...

Thanks for your ideas. For those of you still here, a few paragraphs describing my project more fully follow. Only read them if you feel they will help your brainstorming. :-)

Thanks

Gary

----------------------------------------

CASUAL PROJECT DESCRIPTION from an earlier post

While I keep an eye to contributing back to the community by making my solutions as flexible as possible, I'm putting a super-bibliography for musicians, especially vocalists, into Zope. It stores objects describing compositions, books, texts, recordings, publications, people, topics, and other items.

On a simple level, I need the kind of referencing I describe for connecting people objects as creators to other objects; for connecting any object to another (particularly topics) in a "describes" relationship; for connecting same-class objects in a parent-child relationship; and other similar tasks. (Obviously, I'm coming from a bit of an RDBMS background on this but I'm enjoying the better modeling possible with the ZODB, among other things.)

When a composition object displays, for instance, it needs to know both the name and address of all of its creators, ideally without waking up the creators yet. Similarly, a person needs to know back links--what objects claim me as a creator? Rather than caching a page or an object, I have decided it will be best to cache the relationships and metadata somehow.
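
One way to picture caching "the relationships and metadata" is a small link registry, sketched here in plain Python. `LinkRegistry`, its method names, and the example paths are all hypothetical. A real version would need to be persistent and would have to be told whenever a target's metadata changes, which is exactly the freshness problem described earlier.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical registry of typed links with cached target metadata."""

    def __init__(self):
        self.forward = defaultdict(set)   # (source, relation) -> target uids
        self.backward = defaultdict(set)  # (target, relation) -> source uids
        self.metadata = {}                # uid -> cached metadata dict

    def update_metadata(self, uid, **fields):
        # Must be called whenever the object changes, or the cache goes stale.
        self.metadata[uid] = dict(fields)

    def link(self, source, relation, target):
        self.forward[(source, relation)].add(target)
        self.backward[(target, relation)].add(source)

    def targets(self, source, relation):
        """Cached metadata of the linked objects, without loading them."""
        return [self.metadata[t] for t in sorted(self.forward[(source, relation)])]

    def sources(self, target, relation):
        """Back links: which objects claim `target` under `relation`?"""
        return sorted(self.backward[(target, relation)])


registry = LinkRegistry()
registry.update_metadata("/people/schubert", name="Franz Schubert")
registry.link("/works/erlkoenig", "creator", "/people/schubert")
registry.link("/works/winterreise", "creator", "/people/schubert")

creators = registry.targets("/works/erlkoenig", "creator")
claimed_by = registry.sources("/people/schubert", "creator")
print(creators, claimed_by)
```

Keeping forward and backward maps side by side costs extra space but makes "what objects claim me as a creator?" a single lookup instead of a search.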
----------------- More details for the super-interested -------------------

The modelling for compositions is particularly complex, at least to me, since I include instruments needed, if any, and voices needed, if any; the voices themselves have high and low range extremes I am keeping track of, and even multiple options for those. If they are published, each song might be transposed by a given number of half steps (producing a new set of the high and low extremes for the composition). If the composition's parent is published and transposed, that will produce yet another set of high and low extremes. Displaying and searching by range extremes thus becomes quite complex, and a high, high candidate for caching. Even so, expecting my code to keep the cached information fresh when the relationships are so far-flung makes me nervous: I think I'll only be able to cache so far down the chain, and rely on live checks (or at least secondary cached metadata checks) for the rest.

I'm figuring I'm going to need a new pluggable index, based on the work in PathIndex, for the complicated range searches and some other needs; an interlinking class that manages inter-object back and forward links behind the scenes for caching and getting the cached metadata I described; and some simple subclasses that will represent each of the data types. I have plans from there as well, but those are first steps.
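
The range arithmetic described above can be made concrete with a short sketch. Representing pitches as MIDI-style note numbers (60 = middle C) is an assumption made for the example, as are the names `VoiceRange`, `transpose`, and `fits`; the message does not say how ranges are stored. Transposing by n half steps shifts both extremes by n, and a transposition of a published parent stacks on top of the song's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceRange:
    low: int   # lowest pitch as a note number (assumed encoding, 60 = middle C)
    high: int  # highest pitch, same encoding


def transpose(rng, half_steps):
    """Shift both extremes by the same number of half steps."""
    return VoiceRange(rng.low + half_steps, rng.high + half_steps)


def fits(rng, singer_low, singer_high):
    """True if the whole range lies inside what the singer can reach."""
    return singer_low <= rng.low and rng.high <= singer_high


original = VoiceRange(low=60, high=79)
# The song as published, two half steps down.
published_song = transpose(original, -2)
# The same publication inside a parent that is itself transposed one more down.
in_transposed_parent = transpose(published_song, -1)

print(published_song, in_transposed_parent)
```

Each level of publication yields its own pair of extremes, which is why a search by range has more than one candidate per composition to consider.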