[Zope-dev] Efficient and flexible object references

Fri, 17 Aug 2001 23:50:53 -0400

[Originally posted on the main Zope list.  Moving it here on recommendation
for a deeper discussion.]

Hello.

I am building a Zope application (described more fully at the end of this
message, and omitted here for brevity) that needs to make a large number of
references from some objects to other objects.
    * These references need to be as efficient as possible in terms of size
and speed;
    * moreover, some caching of referenced objects' information, much like
Catalog metadata, will be necessary.
Hopefully this situation is generic enough that it will be of some interest.

I had some ideas on how to do this, and some folks on the Zope list added
significantly to my list of both ideas and concerns.  The following is a
listing of the approaches discussed so far, and their advantages and
problems.  The Catalog RID idea is dead in the water but tantalizingly close
to what I need, so I have included it.

Any contributions to further this list in any way would be greatly
appreciated.  I tried to put attributions in as clearly as possible.  Thank
you very much to all who have contributed so far.

* tree-based Zope URL, or uid
ADVANTAGES:
The canonical high-level reference approach, apparently.  Stable, workable.
DISADVANTAGES:
Inefficient in storage space
No metadata-type information: either wake the object from the ZODB and ask
it, or store what you want yourself (very difficult to keep fresh; it means
rebuilding the ZCatalog metadata approach from another angle, so it feels like
reinventing the wheel)
[Speed issues in retrieving the object?  Traversal is presumably already
heavily optimized...]
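A small stand-alone Python model makes the freshness problem concrete. Nested dicts stand in for the Zope object tree, `resolve` stands in for path traversal (in Zope, something like `restrictedTraverse`), and `PathRef` with its `cached` copy is a hypothetical class, not a Zope API:

```python
# Toy model only: nested dicts play the role of the Zope object tree.
site = {
    "people": {
        "smith": {"title": "J. Smith", "address": "Boston"},
    },
}


def resolve(root, path):
    """Walk a '/'-separated path from the root (a stand-in for Zope traversal)."""
    obj = root
    for part in path.strip("/").split("/"):
        obj = obj[part]
    return obj


class PathRef:
    """Hypothetical reference: a path string plus a copy of chosen metadata."""

    def __init__(self, root, path, fields):
        self.path = path
        target = resolve(root, path)            # the target is woken once to copy data
        self.cached = {f: target[f] for f in fields}

    def live(self, root, field):
        return resolve(root, self.path)[field]  # always fresh, but wakes the target


ref = PathRef(site, "/people/smith", ["title", "address"])
site["people"]["smith"]["address"] = "Chicago"   # target changes; nobody tells ref
print(ref.cached["address"])       # Boston  (stale copy)
print(ref.live(site, "address"))   # Chicago (fresh, but needed a traversal)
```

The copied metadata is cheap to read but goes stale silently; the live lookup is always right but pays for the traversal and the wake-up every time.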

* actual standard object references
[From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
An alternative that has not been mentioned so far is storing a real
object reference.]
ADVANTAGES:
efficient in storage space
DISADVANTAGES:
(non-standard Zope approach)
No metadata; must wake the object to get the information
From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
> This may well be easier if you can live without managing your
> relationships as if they were folders, and without using Zope's
> security mechanisms to control access to the referred-to objects.
Therefore, referenced objects would either not be accessible and manageable
via Zope (losing much of Zope's advantage as an object publisher), or they
would live in a scary netherworld (in a Zope management tree but also directly
referenced outside of it), which in this case is a hack that would probably
cause significant problems.  I think Toby meant the first scenario.  My app
would need the second scenario (the objects should be published), which is
why I had not pursued it.
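The "efficient in storage space" point comes from how ZODB serializes a reference: when one persistent object refers to another, its pickle holds a small persistent reference built around the target's oid rather than a copy of the target, using the standard `pickle` module's `persistent_id`/`persistent_load` hooks. The sketch below imitates that with the standard library alone; `Persistent` is a hypothetical stand-in, not the real `persistent.Persistent` class, and a plain dict plays the role of the storage:

```python
import io
import pickle


class Persistent:
    """Hypothetical stand-in for a ZODB persistent object (not persistent.Persistent)."""

    def __init__(self, oid, **state):
        self._p_oid = oid              # object id assigned by the (simulated) storage
        self.__dict__.update(state)


class OidPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Other persistent objects are written as their oid instead of being embedded.
        if isinstance(obj, Persistent):
            return obj._p_oid
        return None                    # everything else is pickled normally


class OidUnpickler(pickle.Unpickler):
    def __init__(self, file, storage):
        super().__init__(file)
        self.storage = storage         # dict oid -> object, standing in for the database

    def persistent_load(self, pid):
        return self.storage[pid]       # follow the reference when loading


def dump_state(obj):
    """Pickle one object's attributes; references to other objects become oids."""
    state = {k: v for k, v in obj.__dict__.items() if k != "_p_oid"}
    buf = io.BytesIO()
    OidPickler(buf).dump(state)
    return buf.getvalue()


def load_state(data, storage):
    return OidUnpickler(io.BytesIO(data), storage).load()


creator = Persistent(b"\x00" * 7 + b"\x01", name="J. Smith", biography="x" * 10_000)
piece = Persistent(b"\x00" * 7 + b"\x02", title="Song", creators=[creator])
storage = {creator._p_oid: creator}

record = dump_state(piece)
print(len(record) < 200)                  # True: the biography is not copied in
state = load_state(record, storage)
print(state["creators"][0] is creator)    # True: the reference resolves to the same object
```

The composition's record stays small however large the referenced person is; the catch, as noted above, is that reading the creator's name still means loading the creator.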

* Catalog (ZCatalog) rids (currently DEAD IN THE WATER but tantalizingly
close to what I want)
ADVANTAGES:
Efficient use of space
Can return metadata without waking actual object (a design goal of the
ZCatalog, I believe)
Metadata is updated whenever the catalog is updated (hopefully in
approximate real time)--no new mechanism needed to keep metadata fresh
(we're using a wheel that has already been invented)
All of the needed methods except hasuid (which would serve as a getRID) are
part of what I assume is the public interface: getobject, getMetadataForRID,
getIndexDataForRID, getpath
DISADVANTAGES:
From: "Dieter Maurer" <dieter@handshake.de>
> "rid"s are not persistently associated with objects.
> If someone calls "manage_catalogReindex", then all your rids change.
[that's the killer]
It is also a hack, unless ZC blessed it at some point, because it relies
on the inner workings of the Catalog (the ZCatalog has no method to return
an object's rid, and the underlying Catalog only has hasuid, whose name does
not suggest it is a reliable way of getting an RID)
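The toy model below shows why a stored rid is fragile. `ToyCatalog` is hypothetical: its method names echo those discussed above, but the bodies are simplified and are not ZCatalog's code. In the toy, rids come from a counter that keeps advancing, so a clear-and-recatalog gives every object a new rid; a real catalog may assign rids differently, but, as Dieter points out, nothing guarantees they survive a reindex.

```python
import itertools


class ToyCatalog:
    """Toy model of a catalog's uid <-> rid bookkeeping (not ZCatalog's code)."""

    def __init__(self):
        self._next_rid = itertools.count(1)  # rids are handed out as objects are cataloged
        self.uids = {}    # uid (path) -> rid
        self.paths = {}   # rid -> uid
        self.data = {}    # rid -> metadata record

    def catalog_object(self, uid, metadata):
        rid = self.uids.get(uid)
        if rid is None:
            rid = next(self._next_rid)
            self.uids[uid] = rid
            self.paths[rid] = uid
        self.data[rid] = dict(metadata)

    def hasuid(self, uid):
        return self.uids.get(uid)

    def getMetadataForRID(self, rid):
        return self.data[rid]              # no need to wake the object itself

    def clear_and_reindex(self, fetch):
        """Like a full reindex: forget everything, then catalog each uid again."""
        uids = list(self.uids)
        self.uids, self.paths, self.data = {}, {}, {}
        for uid in uids:
            self.catalog_object(uid, fetch(uid))


objects = {"/people/smith": {"title": "Smith"}, "/topics/lieder": {"title": "Lieder"}}
cat = ToyCatalog()
for uid, md in objects.items():
    cat.catalog_object(uid, md)

saved_rid = cat.hasuid("/people/smith")    # 1, stored inside some other object
print(cat.getMetadataForRID(saved_rid))    # {'title': 'Smith'}
cat.clear_and_reindex(objects.__getitem__)
print(cat.hasuid("/people/smith"))         # 3: the rid has changed
print(saved_rid in cat.data)               # False: the saved rid now dangles
```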

* ZODB oid
[From: "Casey Duncan" <cduncan@kaivo.com>
> Another option to explore might be to store the oid]
ADVANTAGES:
efficient use of space
DISADVANTAGES:
From: "Toby Dickenson" <tdickenson@devmail.geminidataloggers.co.uk>
> OIDs are only unique within a single storage.
>
> * If some objects are exported and reimported, their OIDs
>  will change.
>
> * You will get duplicate OIDs in the same Zope if you are
>  using a mounted storage.
From: "Dieter Maurer" <dieter@handshake.de>
>   *  they are a very low level feature, difficult to
>      access from most Zope parts
>
>   *  they may not be unique
>      Think of "mountable storages".
>      Then each storage will have its own OID's, interpreted
>      in its own local context.
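A toy illustration of the mounted-storage problem: each storage hands out its own oids, so the same 8-byte oid can name two unrelated objects in one Zope. `ToyStorage` is hypothetical, not a ZODB class. Pairing the oid with a storage name, as at the end of the sketch, only removes the ambiguity; it does nothing for the export/import problem Toby mentions.

```python
import itertools


class ToyStorage:
    """Hypothetical storage that numbers its own objects independently."""

    def __init__(self, name):
        self.name = name
        self._next = itertools.count(1)
        self.objects = {}   # oid -> object

    def store(self, obj):
        oid = next(self._next).to_bytes(8, "big")   # 8-byte oids, ZODB style
        self.objects[oid] = obj
        return oid


main = ToyStorage("main")
mounted = ToyStorage("mounted")   # e.g. a separately mounted database

oid_a = main.store("composition: Song")
oid_b = mounted.store("person: Smith")
print(oid_a == oid_b)   # True: same oid, two unrelated objects

# A bare oid is ambiguous; it only identifies an object together with its storage.
ref = (mounted.name, oid_b)
storages = {s.name: s for s in (main, mounted)}
print(storages[ref[0]].objects[ref[1]])   # person: Smith
```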

----------
As the list above stands now, I will either use a standard tree-based URL
uid and try to build my own optimized caching mechanism (ouch) or go down
the treacherous and very dangerous path (for the future) of attempting to
make the ZCatalog do what I want no matter what.  I suppose my first step
there would be to see if I can find a way to allow reindexing of a catalog
without changing RIDs...  I can feel the frowns from here...
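If the Catalog route is pursued, the key change is that a full reindex must refresh records in place rather than throw away the uid-to-rid mapping. The sketch below is a hypothetical toy (`StableRidCatalog` and `reindex_all` are invented names, not ZCatalog methods); a real catalog would also have to rebuild its indexes, which are keyed by rid, and should never hand out a retired rid again.

```python
class StableRidCatalog:
    """Hypothetical catalog whose full reindex keeps each surviving uid's rid."""

    def __init__(self):
        self._next_rid = 1
        self.uids = {}   # uid -> rid
        self.data = {}   # rid -> metadata

    def catalog_object(self, uid, metadata):
        if uid not in self.uids:
            self.uids[uid] = self._next_rid
            self._next_rid += 1          # the counter only moves forward: no rid reuse
        self.data[self.uids[uid]] = dict(metadata)

    def reindex_all(self, fetch):
        """Rebuild every record but keep the uid -> rid mapping."""
        self.data = {}
        for uid, rid in list(self.uids.items()):
            metadata = fetch(uid)
            if metadata is None:         # object has disappeared: retire its rid
                del self.uids[uid]
            else:
                self.data[rid] = dict(metadata)


objects = {"/people/smith": {"title": "Smith"}, "/topics/lieder": {"title": "Lieder"}}
cat = StableRidCatalog()
for uid, md in objects.items():
    cat.catalog_object(uid, md)

saved = cat.uids["/people/smith"]
objects["/people/smith"]["title"] = "Smith, J."
cat.reindex_all(objects.get)
print(cat.uids["/people/smith"] == saved)   # True
print(cat.data[saved])                      # {'title': 'Smith, J.'}
```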

Thanks for your ideas.  For those of you still here, a few paragraphs
describing my project more fully follow.  Only read them if you feel they
will help your brainstorming. :-)

Thanks

Gary

----------------------------------------

CASUAL PROJECT DESCRIPTION from an earlier post

While I keep an eye to contributing back to the community by making my
solutions as flexible as possible, I'm putting a super-bibliography for
musicians, especially vocalists, into Zope.  It stores objects describing
compositions, books, texts, recordings, publications, people, topics, and
other items.

On a simple level, I need the kind of referencing I describe for connecting
people objects as creators to other objects; for connecting any object to
another (particularly topics) in a "describes" relationship; for connecting
same-class objects in a parent-child relationship; and other similar tasks.
(Obviously, I'm coming from something of an RDBMS background on this, but
I'm enjoying the better modeling possible with the ZODB, among other things.)

When a composition object displays, for instance, it needs to know the
names and addresses of all of its creators, ideally without waking up the
creator objects yet.  Similarly, a person needs to know back links: which
objects claim this person as a creator?  Rather than caching a page or an
object, I have decided it will be best to cache the relationships and
metadata somehow.
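One way to picture caching "the relationships and metadata" is a registry that records every link in both directions and keeps a small metadata copy per object, refreshed whenever that object changes. `LinkRegistry` and its methods are hypothetical, and path strings are used as keys only for readability; any of the reference schemes above could be substituted.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical relationship cache: forward links, back links, cached metadata."""

    def __init__(self):
        self.forward = defaultdict(set)    # (source, relation) -> set of target uids
        self.backward = defaultdict(set)   # (target, relation) -> set of source uids
        self.metadata = {}                 # uid -> small dict copied from the object

    def update_metadata(self, uid, **fields):
        # Meant to be called whenever the object changes, so the copy stays fresh.
        self.metadata[uid] = fields

    def link(self, source, relation, target):
        self.forward[(source, relation)].add(target)
        self.backward[(target, relation)].add(source)

    def targets(self, source, relation):
        """Cached metadata of each target, without touching the target objects."""
        return [self.metadata[t] for t in sorted(self.forward.get((source, relation), ()))]

    def sources(self, target, relation):
        """Back links: which objects point at this one under the given relation?"""
        return sorted(self.backward.get((target, relation), ()))


links = LinkRegistry()
links.update_metadata("/people/smith", name="J. Smith", address="Boston")
links.update_metadata("/people/jones", name="A. Jones", address="Chicago")
links.link("/works/song1", "creator", "/people/smith")
links.link("/works/song1", "creator", "/people/jones")
links.link("/works/song2", "creator", "/people/smith")

print(links.targets("/works/song1", "creator"))
# [{'name': 'A. Jones', 'address': 'Chicago'}, {'name': 'J. Smith', 'address': 'Boston'}]
print(links.sources("/people/smith", "creator"))   # ['/works/song1', '/works/song2']
```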

----------------- More details for the super-interested -------------------

The modelling for compositions is particularly complex, at least to me,
since I include instruments needed, if any, and voices needed, if any; the
voices themselves have high and low range extremes I am keeping track of,
and even multiple options for those.  If they are published, each song might
be transposed by a given number of half steps (producing a new set of high
and low extremes for the composition).  If the composition's parent is
published and transposed, that produces yet another set of high and low
extremes.
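The arithmetic behind each new set of extremes is simple if pitches are stored as numbers rather than names. A minimal sketch, assuming MIDI note numbers (middle C = C4 = 60): a transposition by n half steps adds n to both extremes, and transpositions along the chain (song, then each published parent) simply add up. The numbering scheme and the names `effective_range` and `note_name` are illustrative assumptions, not part of the design described above.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(midi):
    """MIDI note number -> name with octave, e.g. 60 -> 'C4' (middle C)."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def effective_range(low, high, shifts):
    """Apply every transposition along the chain; half steps simply add up."""
    total = sum(shifts)
    return low + total, high + total


# Hypothetical voice part written D4..G5, i.e. MIDI 62..79
low, high = 62, 79
# As written; published down a whole step; parent publication also down a minor third
for shifts in ([], [-2], [-2, -3]):
    lo, hi = effective_range(low, high, shifts)
    print(shifts, note_name(lo), note_name(hi))
# [] D4 G5
# [-2] C4 F5
# [-2, -3] A3 D5
```

Storing the untransposed extremes plus the list of shifts keeps each derived range a cheap computation, which is one way to decide how far down the chain to cache.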

Displaying and searching by range extremes thus becomes quite complex, and a
high, high candidate for caching.  Even so, expecting my code to keep the
cached information fresh when the relationships are so far-flung makes me
nervous: I think I'll only be able to cache so far down the chain, and rely
on live checks (or at least secondary cached metadata checks) for the rest.

I'm figuring I'm going to need a new pluggable index, based on the work in
PathIndex, for the complicated range searches and some other needs; an
interlinking class that manages inter-object back and forward links behind
the scenes for caching and getting the cached metadata I described; and some
simple subclasses that will represent each of the data types.  I have plans
from there as well, but those are first steps.
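The range search at the heart of the planned index can be prototyped outside Zope first. The sketch keeps (low, high) pairs sorted by low extreme and answers "which versions fit within this singer's range?" with a binary search followed by a short scan. It does not implement Zope's pluggable-index interface; `RangeIndex`, `fitting`, and the version keys are hypothetical.

```python
import bisect


class RangeIndex:
    """Hypothetical index of (low, high) pitch ranges, sorted by low extreme."""

    def __init__(self):
        self._entries = []   # sorted list of (low, high, key)

    def add(self, key, low, high):
        bisect.insort(self._entries, (low, high, key))

    def fitting(self, singer_low, singer_high):
        """Keys whose whole range lies within [singer_low, singer_high]."""
        start = bisect.bisect_left(self._entries, (singer_low,))  # first low >= singer_low
        result = []
        for low, high, key in self._entries[start:]:
            if low > singer_high:
                break                      # sorted by low: nothing later can fit
            if high <= singer_high:
                result.append(key)
        return result


index = RangeIndex()
index.add("song1@0", 62, 79)    # as written
index.add("song1@-2", 60, 77)   # published down a whole step
index.add("song1@-5", 57, 74)   # parent publication transposed as well
index.add("song2@0", 55, 72)
print(index.fitting(57, 77))    # ['song1@-5', 'song1@-2']
```

The scan after the binary search is linear in the worst case, which is fine for a prototype; a production index would want a structure better suited to two-sided range queries.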