This is mostly a question for AJ, but any input would be great. This bug bit me today and is documented here: http://collector.zope.org/Zope/449/ISSUE_TRANSCRIPT/view

I don't understand the brief argument against this one; it would make sense to me to be able to pull an object out of the catalog based on its path. For example, if I want /foo/bar/blammo, there is currently only one way of pulling that object out of the catalog given this path: send (path='/foo/bar', id='blammo') rather than (path='/foo/bar/blammo'). Why wouldn't we want it this way?

One thing I have done is store a whole bunch of references to objects as selected by the user. These are essentially random objects, and the quickest way is to pull them back out of the catalog. Of course, I can't do more than one object per query (unless I'm missing some other way). I'd love to do (path=['/foo/bar/blammo', '/foo/bar/blammoz']) and get these 2 objects... I think that would be neat.

It would seem data_record_id_ is not guaranteed to be permanent after a reindex_object (which CatalogAwareness uses), since this uncatalogs and then recatalogs the object. If this did work it would be cool, and I could undo all the changes to my app.

- The patch is already there, so I'm curious: why do we have what seems to be a more limited design?
- Would a halfway option such as path_match='final' be a choice that won't break any code but would confuse everyone and not make it into the documentation?
- Is it just a matter of fixing reindex_object, as was suggested on #zope, so that data_record_id_ is more permanent?

Cheers
--
Andy McKay
Agmweb Consulting
http://www.agmweb.ca
A PathIndex is designed to make it more efficient to aggregate objects at various levels of containment. Its primary use case, AFAIK, is to allow you to limit queries to particular places within a hierarchy. The idea is to eliminate recursive searching of leaf-level folders when you want all objects under a higher level and its child levels.

Also, by not indexing the nodes themselves, the index is an order of magnitude smaller, so searches are faster, it takes less room, and it is faster to update.

In fact, there is no need to index the entire path of an object in the catalog. Even with no indexes defined, ZCatalog already does this for you. The uid of every entry in the catalog is the full path to the object (as a string). Unfortunately, ZCatalog does not expose this to the surface, but you can write a trivial external method to do it. And I might entertain adding a ZCatalog API to do so if I had a good use case. Right now you can only access entries by RID.

Now that begs the question: if you already know the path to the object you are looking for, why are you using the Catalog in the first place? I highly doubt doing what you describe below is faster than just directly accessing the object. In fact, I'd be willing to bet it's slower, especially since you are searching two indexes to get it. Unless, of course, these are dynamically generated objects of some kind (not stored in Zope).

As for making RIDs more permanent, that would basically require a rewrite of the Catalog and make certain operations much more expensive. As it stands, your application should only assume that RIDs are valid within a single transaction. You should use the path to uniquely identify objects, or otherwise some application-defined uid that gets cataloged.

-Casey

----- Original Message -----
From: "Andy McKay" <andy@agmweb.ca>
To: <zope-dev@zope.org>
Sent: Saturday, August 17, 2002 6:22 PM
Subject: [Zope-dev] PathIndex doesn't index last part of path
_______________________________________________ Zope-Dev maillist - Zope-Dev@zope.org http://lists.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://lists.zope.org/mailman/listinfo/zope-announce http://lists.zope.org/mailman/listinfo/zope )
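The level-based design Casey describes can be illustrated with a toy model. This is a minimal sketch and not the actual PathIndex code: ToyPathIndex and its methods are hypothetical names, and the sketch simply assumes that every path component except the last is stored under its depth, which is the behaviour discussed in this thread.

```python
from collections import defaultdict


class ToyPathIndex:
    """Hypothetical stand-in for a path index; not the real PathIndex."""

    def __init__(self):
        # component -> level -> set of record ids
        self._index = defaultdict(lambda: defaultdict(set))

    def index_object(self, rid, path):
        comps = [c for c in path.split("/") if c]
        # Assumption: the final component (the object's own id) is skipped,
        # so only the containers are indexed.
        for level, comp in enumerate(comps[:-1]):
            self._index[comp][level].add(rid)

    def search(self, path):
        comps = [c for c in path.split("/") if c]
        result = None
        for level, comp in enumerate(comps):
            rids = self._index.get(comp, {}).get(level, set())
            # Intersect the matches for each (component, level) pair.
            result = set(rids) if result is None else result & rids
        return result or set()


index = ToyPathIndex()
index.index_object(1, "/foo/bar/blammo")
index.index_object(2, "/foo/bar/blammoz")
index.index_object(3, "/foo/other/thing")

print(sorted(index.search("/foo")))             # [1, 2, 3]
print(sorted(index.search("/foo/bar")))         # [1, 2]
print(sorted(index.search("/foo/bar/blammo")))  # []
```

Under these assumptions a query for a container returns everything beneath it without recursion, while a query for the full path of a single object returns nothing, which matches the behaviour reported in the original message.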
[snip] Hmm ok, I can see those reasons.
Unfortunately, ZCatalog does not expose this to the surface but you can write a trivial external method to do it. And I might entertain adding a ZCatalog API to do so if I had a good use case.
Ah... I think this might be the best idea. I'll add that in to mine and see if anyone else wants it.

Now that begs the question: if you already know the path to the object you are looking for, why are you using the Catalog in the first place? I highly doubt doing what you describe below is faster than just directly accessing the object. In fact, I'd be willing to bet it's slower, especially since you are searching two indexes to get it.

Okay, so let's assume there is only one index I need to search, the path index. Wouldn't it be faster to pull that out of the Catalog than to traverse over to the sub-object, waking up a bunch of objects along the way, to get the object? It would be interesting to test that... perhaps I'm just leaning on the old crutch that it's faster to get stuff from the catalog than to wake many objects up. Suppose I have 100 such objects. I would have thought one catalog query on one index (even though it's a big union) would be faster than 100 traversals.

Anyway, since I can't efficiently go and get an individual object from the catalog, this is what I'm doing now...

--
Andy McKay
Agmweb Consulting
http://www.agmweb.ca
Given a method that could return a catalog record given a path (which is an obj's ZCatalog uid), this would be faster than traversal, I think (probably a lot faster), so long as whatever you wanted could be gotten from metadata. If you call getObject, that actually does traversal anyway.

Here is the code for an external method you could put in a ZCatalog to do this:

    def getRecordFromPath(self, path):
        """Get a catalog record using its path"""
        rid = self._catalog.uids[path]
        return self._catalog[rid]

This same code could be put into a ZCatalog subclass or ZCatalog itself. It would be a very fast method, since it is just two BTree key lookups and instantiating the brain class.

hth,

-Casey

----- Original Message -----
From: "Andy McKay" <andy@agmweb.ca>
To: "Casey Duncan" <casey@zope.com>; <zope-dev@zope.org>
Sent: Saturday, August 17, 2002 8:40 PM
Subject: Re: [Zope-dev] PathIndex doesn't index last part of path
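The two-lookup idea in Casey's external method can be tried outside Zope with a stand-in. The sketch below is an illustration only: ToyCatalog, get_record_from_path and get_records_from_paths are hypothetical names, plain dicts replace the BTrees, and a metadata tuple replaces the brain object. The batch helper is an assumption about how the "100 objects" case from earlier in the thread might be served.

```python
class ToyCatalog:
    """Hypothetical stand-in for the internal catalog (dicts, not BTrees)."""

    def __init__(self):
        self.uids = {}   # uid (usually the path string) -> rid
        self.data = {}   # rid -> metadata tuple
        self._next_rid = 0

    def catalog_object(self, uid, metadata):
        rid = self.uids.get(uid)
        if rid is None:
            rid = self._next_rid
            self._next_rid += 1
            self.uids[uid] = rid
        self.data[rid] = metadata
        return rid

    def __getitem__(self, rid):
        return self.data[rid]


def get_record_from_path(catalog, path):
    """Two key lookups: path -> rid, then rid -> record."""
    rid = catalog.uids[path]  # raises KeyError for an uncataloged path
    return catalog[rid]


def get_records_from_paths(catalog, paths):
    """Look up many paths in one pass, skipping any that are not cataloged."""
    found = {}
    for path in paths:
        rid = catalog.uids.get(path)
        if rid is not None:
            found[path] = catalog[rid]
    return found


catalog = ToyCatalog()
catalog.catalog_object("/foo/bar/blammo", ("Blammo",))
catalog.catalog_object("/foo/bar/blammoz", ("Blammoz",))

print(get_record_from_path(catalog, "/foo/bar/blammo"))
print(get_records_from_paths(catalog, ["/foo/bar/blammo", "/missing"]))
```

Neither helper wakes up the cataloged objects themselves; as noted above, that advantage disappears once getObject is called on the result.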
If you call getObject, that actually does traversal anyway.
Right, in that situation it would be pointless...

You rock, Casey, thanks. I was thinking more about adding:

    def getMetadataFromPath(self, path):
        """ get metadata for an object using its path """
        rid = self._catalog.uids[path]
        return self._catalog.getMetadataForRID(rid)

    def getIndexFromPath(self, path):
        """ get index for an object using its path """
        rid = self._catalog.uids[path]
        return self._catalog.getIndexDataForRID(rid)

since this uses the same terminology and returns the same data as getIndexDataForRID and getMetadataForRID. Is there any reason why I couldn't check these in?

--
Andy McKay
Agmweb Consulting
http://www.agmweb.ca
Your code looks fine; I think it meshes better with the underlying catalog code too. I don't have a problem with this getting checked in, just make sure you update IZCatalog.py, help/Catalog.py and add unit tests (that pass ;^).

-Casey

----- Original Message -----
From: "Andy McKay" <andy@agmweb.ca>
To: "Casey Duncan" <casey@zope.com>; <zope-dev@zope.org>
Sent: Sunday, August 18, 2002 2:54 AM
Subject: Re: [Zope-dev] PathIndex doesn't index last part of path
Woohoo, my first check-in ;)

--
Andy McKay
Agmweb Consulting
http://www.agmweb.ca

----- Original Message -----
From: "Casey Duncan" <casey@zope.com>
To: "Andy McKay" <andy@agmweb.ca>; <zope-dev@zope.org>
Sent: Saturday, August 17, 2002 12:14 PM
Subject: Re: [Zope-dev] PathIndex doesn't index last part of path
Hi,

Sorry, maybe I'm too late ;) but I have a note here. The uid in a Catalog can be a path, but it can also be any arbitrary unique identifier. In the system I'm working with, it is an ID generated at the time of object creation; it persists and is the uid in the Catalog.

There is nothing wrong with the methods themselves, only with their names. As far as I understood, they are getRecordFromPath, getMetadataFromPath and getIndexFromPath. Generally the uid of a Catalog will be a path, but in other cases the naming will be confusing. Maybe it is just a question of documentation?

Regards,

m.
-- Myroslav Opyr zope.net.ua <http://zope.net.ua/> ° Ukrainian Zope Hosting e-mail: myroslav@zope.net.ua <mailto:myroslav@zope.net.ua>
Yes, I agree. I think it would be better if the APIs were getRecordForUid and getIndexForUid, since the uids can be something other than paths. Thanks for the input on that.

-Casey

----- Original Message -----
From: "Myroslav Opyr" <myroslav@zope.net.ua>
To: "Casey Duncan" <casey@zope.com>
Cc: "Andy McKay" <andy@agmweb.ca>; <zope-dev@zope.org>
Sent: Sunday, August 18, 2002 6:55 PM
Subject: Re: [Zope-dev] PathIndex doesn't index last part of path
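The naming point, together with the earlier remark that RIDs should only be trusted within a single transaction, can be sketched with a stand-in catalog keyed by an arbitrary uid. Everything here is hypothetical: ToyCatalog and get_metadata_for_uid are illustrative names, the uid "doc-0001" is made up, and getMetadataForRID only mimics the method name mentioned in this thread rather than the real implementation.

```python
class ToyCatalog:
    """Hypothetical stand-in; dicts replace the catalog's BTrees."""

    def __init__(self):
        self.uids = {}       # uid (a path or any unique string) -> rid
        self.metadata = {}   # rid -> metadata dict
        self._next_rid = 1

    def catalog_object(self, uid, metadata):
        rid = self.uids.get(uid)
        if rid is None:
            # A fresh rid is handed out every time a uid is newly cataloged.
            rid = self._next_rid
            self._next_rid += 1
            self.uids[uid] = rid
        self.metadata[rid] = dict(metadata)
        return rid

    def uncatalog_object(self, uid):
        rid = self.uids.pop(uid)
        del self.metadata[rid]

    def getMetadataForRID(self, rid):
        return dict(self.metadata[rid])


def get_metadata_for_uid(catalog, uid):
    """Uid-based lookup: works whether or not the uid is a path."""
    rid = catalog.uids[uid]
    return catalog.getMetadataForRID(rid)


catalog = ToyCatalog()
first_rid = catalog.catalog_object("doc-0001", {"title": "Blammo"})

# Simulate a reindex done as uncatalog followed by recatalog.
catalog.uncatalog_object("doc-0001")
second_rid = catalog.catalog_object("doc-0001", {"title": "Blammo"})

print(first_rid != second_rid)                    # True
print(get_metadata_for_uid(catalog, "doc-0001"))  # {'title': 'Blammo'}
```

In this model the rid changes across the uncatalog and recatalog cycle while the uid keeps working, which is why a uid-named accessor describes the lookup more accurately than a path-named one.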
Casey Duncan writes:
A PathIndex is designed to make it more efficient to aggregate objects at various levels of containment. Its primary use case, AFAIK, is to allow you to limit queries to particular places within a hierarchy. The idea is to eliminate recursive searching of leaf-level folders when you want all objects under a higher level and its child levels.
Also, by not indexing the nodes themselves, the index is an order of magnitude smaller, so searches are faster, it takes less room, and it is faster to update.

This property needs to be well documented. At the least, it is not intuitive.

Dieter
participants (4)
- Andy McKay
- Casey Duncan
- Dieter Maurer
- Myroslav Opyr