RE: [Zope] Preventing duplicates in ZCatalog
I've never understood why anyone would use CatalogAware given that CatalogPathAware works quite well. Is there even any reason to justify its existence in Zope at all other than backward-compatibility? Sean -----Original Message----- From: Wankyu Choi [mailto:wankyu@neoqst.com] Sent: Tuesday, April 22, 2003 1:25 PM To: 'Dieter Maurer' Cc: zope@zope.org Subject: RE: [Zope] Preventing duplicates in ZCatalog
When you see the same object catalogued under different catalog uids, you should either upgrade to Zope 2.6.1 or fix the code that does not use "getPhysicalPath". A likely candidate is "Products.ZCatalog.CatalogAware". Replace this by "Products.ZCatalog.CatalogPathAware".
I already use Zope 2.6.1. Hm.. CMFCore's PortalContent imports Products.ZCatalog.CatalogAware, not CatalogPathAware. Probably, this is the source of my problem, then? ( CMF 1.3.1, the latest. ) May I ask why CMF imports CatalogAware instead of CatalogPathAware? CMF people should know better than that. I just want to know if there's any particular reason why, since I'm basing my own applications on CMF. Thanks in advance. --------------------------------------------------------------- Wankyu Choi CEO/President NeoQuest Communications, Inc. http://www.zoper.net http://www.neoboard.net --------------------------------------------------------------- _______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Hi Sean, I'm from a very different background, and thought that if some people very close to the core (Zope or CMF, in this case) use something seemingly official, then it is official: CMF in this case. Even with 1.3.1, CMF's PortalContent.py imports " from CMFCatalogAware import CMFCatalogAware ". I thought it was a good thing to do since it's from, well, people very close to the core^^. I read an article from the CMF people that they're about to announce CMF 1.4... I'm really confused about who works on what when it comes to Zope. Why would anyone working on CMF insist on using CMFCatalogAware if you think that's unreasonable? ( I tried the HEAD of PortalContent.py and it's still importing CMFCatalogAware. ) Regards, Wankyu Choi
-----Original Message----- From: sean.upton@uniontrib.com [mailto:sean.upton@uniontrib.com] Sent: Wednesday, April 23, 2003 5:33 AM To: wankyu@neoqst.com; dieter@handshake.de Cc: zope@zope.org Subject: RE: [Zope] Preventing duplicates in ZCatalog I've never understood why anyone would use CatalogAware given that CatalogPathAware works quite well. Is there even any reason to justify its existence in Zope at all other than backward-compatibility? Sean
On Wed, Apr 23, 2003 at 05:56:41AM +0900, Wankyu Choi wrote: (snip)
Even with 1.3.1, CMF's PortalContent.py imports " from CMFCatalogAware import CMFCatalogAware ". (snip) I read an article from the CMF people that they're about to announce CMF 1.4... I'm really confused about who works on what when it comes to Zope. Why would anyone working on CMF insist on using CMFCatalogAware if you think that's unreasonable? ( I tried the HEAD of PortalContent.py and it's still importing CMFCatalogAware. )
CMFCatalogAware != CatalogAware. -- Paul Winkler http://www.slinkp.com Look! Up in the sky! It's DICTATOR RAINCOAT! (random hero from isometric.spaceninja.com)
Hi, Further investigation found that CMFCatalogAware was just the start of the inheritance hierarchy. But CatalogPathAware wasn't imported in any way while initializing CMF. And I found a couple of **very old** articles saying importing *CatalogAware* in CMF is deprecated... huh? Am I wrong in inheriting from PortalContent.py, for example? Wankyu Choi
I found that this problem only occurs with CMF instances + virtual host monster.
Normal Zope folders cause no such problem. Only CMF and Plone instances cause this problem when virtual host monster maps a domain to their paths.
It's not even a CatalogAwareness problem either, since a simple call to an object's getPhysicalPath() via URLs can demonstrate what's happening internally:
- Add a CMF site named 'CMF'.
- Add and set up a virtual host monster in such a way that a certain domain, www.example.com, for example, maps to the path of the CMF instance created above.
- Create a DTML method called 'test'.
A call to this DTML method's getPhysicalPath() with the URL "http://www.example.com/test/getPhysicalPath" returns: ('', 'CMF', 'test')
The URL "http://www.example.com/CMF/test/getPhysicalPath" returns: ('', 'CMF', 'CMF', 'test')
The URL "http://www.example.com/CMF/CMF/CMF/CMF/test/getPhysicalPath" returns: ('', 'CMF', 'CMF', 'CMF', 'CMF', 'CMF', 'test')
You get the idea. Without VHM, the problem disappears.
With VHM, one can create tons of duplicate entries in the portal_catalog as demonstrated above.
I suspect the getPhysicalPath() method overridden in the CMF package doesn't behave well with VHM.
I'm cc'ing this to the CMF mailing list since it's more of a CMF problem :-)
How can I fix it? (or is it fixed in the upcoming CMF 1.4 ?)
TIA. Wankyu Choi
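Why differing getPhysicalPath() results turn into duplicate catalog entries can be illustrated with a toy model. The sketch below is plain Python, not the ZCatalog API: ToyCatalog and uid_from_path are hypothetical names, and the only assumption taken from the thread is that the catalog uid is derived from the path tuple. The tuples are the ones reported in the message above.

```python
# Toy model only: a dict keyed by a path-derived uid. ToyCatalog and
# uid_from_path are hypothetical names, not part of Zope or ZCatalog.

def uid_from_path(path_tuple):
    """Join a getPhysicalPath()-style tuple into a '/'-separated uid."""
    return "/".join(path_tuple)


class ToyCatalog:
    def __init__(self):
        self.entries = {}  # uid -> object

    def catalog_object(self, obj, uid):
        # The same uid overwrites the existing entry; a different uid adds
        # a new entry, even if obj is already catalogued under another uid.
        self.entries[uid] = obj


test_method = object()  # stands in for the single DTML method 'test'

# Path tuples as reported in the thread for one and the same object.
reported_paths = [
    ('', 'CMF', 'test'),
    ('', 'CMF', 'CMF', 'test'),
    ('', 'CMF', 'CMF', 'CMF', 'CMF', 'CMF', 'test'),
]

catalog = ToyCatalog()
for path in reported_paths:
    catalog.catalog_object(test_method, uid_from_path(path))

for uid in sorted(catalog.entries):
    print(uid)
```

One object ends up with one entry per distinct path, which is the duplication described above.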
Wankyu Choi wrote:
I found that this problem only occurs with CMF instances + virtual host monster.
Normal Zope folders cause no such problem. Only CMF and Plone instances cause this problem when virtual host monster maps a domain to their paths.
It's not even a CatalogAwareness problem either since a simple call to an object's getPhysicalPath() via urls can demonstrate what's happening internally:
- Add a CMF site named 'CMF'.
- Add and set up a virtual host monster in such a way that a certain domain, www.example.com, for example, maps to the path of the CMF instance created above.
- create a DTML method called 'test'
A call to this DTML method's getPhysicalPath() with the URL "http://www.example.com/test/getPhysicalPath" returns: ('', 'CMF', 'test')
The URL "http://www.example.com/CMF/test/getPhysicalPath" returns: ('', 'CMF', 'CMF', 'test')
The URL "http://www.example.com/CMF/CMF/CMF/CMF/test/getPhysicalPath" returns: ('', 'CMF', 'CMF', 'CMF', 'CMF', 'CMF', 'test')
You get the idea. Without VHM, the problem disappears.
With VHM, one can create tons of duplicate entries in the portal_catalog as demonstrated above.
I suspect the getPhysicalPath() method overridden in the CMF package doesn't behave well with VHM.
I'm cc'ing this to the CMF mailing list since it's more of a CMF problem :-)
How can I fix it? (or is it fixed in the upcoming CMF 1.4 ?)
Worse yet, I can reproduce this without a VHM, though, for the record, I had a VHM on that server before. I added a fresh CMF site, "CMF", and went directly (no Apache in between) to http://myserver:8080/CMF/CMF/getPhysicalPath and got ('','CMF','CMF'). Duh!
But I can only reproduce this with the PortalSite object for now. I suspected SkinnableFolder would also exhibit this bug, but it didn't. Further, I deleted all subobjects (all *_tool objects etc.) from the PortalSite object and the same happened.
Ok, here's a workaround which should work: Add a normal folder (stock Zope) named 'CMF' (or whatever your Portal Site object is called) as a subobject to the Portal Site object. This will prevent it from acquiring itself and causing this havoc. HTH, oliver
But I can only reproduce this with the PortalSite object for now. I suspected SkinnableFolder would also exhibit this bug, but it didn't. Further, I deleted all subobjects (all *_tool objects etc.) from the PortalSite object and the same happened.
I guess so. It must be something in the skinning machinery. My message board application, NeoBoard, has the same skinning machinery as CMF's to give it a skinned look with or without CMF. And it exhibits the same symptoms with getPhysicalPath() calls. That's why I had to create a new method that returns the relative path of an article object. The built-in catalog of a NeoBoard instance saves uids like the following:
    /a_1 # first article
    /a_1/a_1 # first reply to a_1
    /a_1/a_2 # second reply to a_1
    /a_2 # second article
But visiting an article like the following still duplicates entries in the board's built-in catalog:
    /Board/a_1 -> /a_1
    /Board/Board/a_1 -> /Board/a_1
    /Board/Board/Board/a_1 -> /Board/Board/a_1
    ...
I think removing **all** instances of the container's id when cataloguing article objects should work. Currently, only the first instance is being removed. I just didn't think acquisition would wreak this much havoc when misused :-(
Ok, here's a workaround which should work: Add a normal folder (stock Zope) named 'CMF' (or whatever your Portal Site object is called) as a subobject to the Portal Site object. This will prevent it from acquiring itself and causing this havoc.
That works! Thanks for the tip. Best Regards, Wankyu Choi
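The uid mapping in the message above (only the first instance of the container's id removed, versus all instances removed) can be sketched in plain Python. This is an illustration under assumptions, not NeoBoard's code: both function names are hypothetical, and paths are treated as simple '/'-separated strings.

```python
# Hypothetical helpers illustrating the two behaviours discussed above.

def strip_first_container_id(path, container_id):
    """Remove only the first occurrence of container_id from the path."""
    segments = [s for s in path.split("/") if s]
    if container_id in segments:
        segments.remove(container_id)  # list.remove drops the first match only
    return "/" + "/".join(segments)


def strip_all_container_ids(path, container_id):
    """Remove every occurrence of container_id from the path."""
    segments = [s for s in path.split("/") if s and s != container_id]
    return "/" + "/".join(segments)


visited = ["/Board/a_1", "/Board/Board/a_1", "/Board/Board/Board/a_1"]

for path in visited:
    first_only = strip_first_container_id(path, "Board")
    all_removed = strip_all_container_ids(path, "Board")
    print(path, "first only:", first_only, "all removed:", all_removed)
```

Removing every occurrence yields the same uid for all three URLs. Note the trade-off: an article whose own id happens to equal the container's id would be stripped as well.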
Wankyu Choi wrote:
Ok, here's a workaround which should work: Add a normal folder (stock Zope) named 'CMF' (or whatever your Portal Site object is called) as a subobject to the Portal Site object. This will prevent it from acquiring itself and causing this havoc.
That works!
Thanks for the tip.
If this bug isn't known, could you add something to the collector, so that it gets fixed? cheers, oliver
If this bug isn't known, could you add something to the collector, so that it gets fixed?
I just did ;-) all the best, Wankyu Choi
Wankyu Choi wrote:
[snip] But visiting an article like the following still duplicates entries in the board's built-in catalog:
/Board/a_1 -> /a_1 /Board/Board/a_1 -> /Board/a_1 /Board/Board/Board/a_1 -> /Board/Board/a_1 ...
Another remark: there really shouldn't be links which insert _any_ unnecessary acquisition, like /Board/Board/Board, because it could lead to infinite recursion.
I think removing **all** instances of the container's id when cataloguing article objects should work. Currently, only the first instance is being removed. I just didn't think acquisition would wreak this much havoc when misused :-(
You should see what happens if you have made the above-mentioned mistake and a spider/crawler hits your site. Google's 16,000 machines vs. your server, guess who loses ;) - ok, it's not that bad, but shit can hit the fan. I recently had some "expert" on a big pipe trying to bulk download a website we host, causing more traffic in one hour than we normally get in a whole day because of this recursion. I think it just stopped when the request URI got too long for his client, or it crashed. Zope stood like a wall ;). Everyone, look out for things like "Fetch API REQUEST" in your logs. cheers, oliver
/Board/a_1 -> /a_1 /Board/Board/a_1 -> /Board/a_1 /Board/Board/Board/a_1 -> /Board/Board/a_1 ...
Another remark: there really shouldn't be links which insert _any_ unnecessary acquisition, like /Board/Board/Board, because it could lead to infinite recursion.
Yes, no one would want this crazy acquisition test being performed on his server. But people do. Some visitors think this is fun ;-) Well, Deep Throat was right. Trust no one. Plus, VHM sometimes redirects visitors to a mapped folder prepending the folder's id ( I don't know why, but it does happen from time to time ): www.example.com/CMF, for example, where the url should have been just "www.example.com". And that's where this madness starts. Another situation with VHM: you log in as manager; manage your CMF sites; while you're at it, try to add/edit some content; you put yourself into this acquisition blackhole again.
I think removing **all** instances of the container's id when cataloguing article objects should work. Currently, only the first instance is being removed. I just didn't think acquisition would wreak this much havoc when misused :-(
You should see what happens if you have made the above-mentioned mistake and a spider/crawler hits your site. Google's 16,000 machines vs. your server, guess who loses ;) - ok, it's not that bad, but shit can hit the fan.
What I'm worried about is not myself making mistakes with the URLs in the code. What about visitors? Luckily, I run Squid in front of the ZEO clients and can rewrite funny URLs, removing redundant path elements. Without this redirect_program script, I can't prevent users from having fun with this acquisition thing... or can I? cheers, Wankyu Choi
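A rewrite of the kind mentioned above (removing redundant path elements before the request reaches Zope) might look like the following sketch. It shows only the URL transformation, written in present-day Python; the function name is hypothetical, the actual redirect_program script is not shown in the thread, and wiring this into Squid is left out. It collapses repeats of one known container id only, because repeated segments can be legitimate elsewhere (for example /a_1/a_1, the first reply to a_1).

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_repeated_container(url, container_id):
    """Collapse consecutive repeats of container_id in the URL path.

    Hypothetical helper: other repeated segments are left untouched.
    """
    parts = urlsplit(url)
    kept = []
    for segment in parts.path.split("/"):
        # Skip this segment if it repeats the container id just kept.
        if segment == container_id and kept and kept[-1] == container_id:
            continue
        kept.append(segment)
    new_path = "/".join(kept)
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


print(collapse_repeated_container(
    "http://www.example.com/CMF/CMF/CMF/CMF/test/getPhysicalPath", "CMF"))
print(collapse_repeated_container(
    "http://www.example.com/Board/Board/a_1/a_1", "Board"))
```

One occurrence of the container id is kept; whether that single occurrence should also be dropped depends on how the virtual host mapping is set up.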
Wankyu Choi wrote:
/Board/a_1 -> /a_1 /Board/Board/a_1 -> /Board/a_1 /Board/Board/Board/a_1 -> /Board/Board/a_1 ...
Another remark: there really shouldn't be links which insert _any_ unnecessary acquisition, like /Board/Board/Board, because it could lead to infinite recursion.
Yes, no one would want this crazy acquisition test being performed on his server. But people do. Some visitors think this is fun ;-) Well, Deep Throat was right. Trust no one.
Plus, VHM sometimes redirects visitors to a mapped folder prepending the folder's id ( I don't know why, but it does happen from time to time ): www.example.com/CMF, for example, where the url should have been just "www.example.com". And that's where this madness starts.
I have never seen that. Are you sure your rewrite rules are right? Maybe a trailing slash too much or missing?
Another situation with VHM: you log in as manager; manage your CMF sites; while you're at it, try to add/edit some content; you put yourself into this acquisition blackhole again.
But this only happens due to the bug you found, doesn't it? How else could this cause a problem?
I think removing **all** instances of the container's id when cataloguing article objects should work. Currently, only the first instance is being removed. I just didn't think acquisition would wreak this much havoc when misused :-(
You should see what happens if you have made the above-mentioned mistake and a spider/crawler hits your site. Google's 16,000 machines vs. your server, guess who loses ;) - ok, it's not that bad, but shit can hit the fan.
What I'm worried about is not myself making mistakes with the URLs in the code. What about visitors? Luckily, I run Squid in front of the ZEO clients and can rewrite funny URLs, removing redundant path elements.
Without this redirect_program script, I can't prevent users from having fun with this acquisition thing... or can I?
Well, I got this idea in another thread: somewhere in your product you could compare URL0 (or URL1, or whatever, I don't remember ATM) with self.absolute_url() and just return a redirect to self.absolute_url() if they don't match. cheers, oliver
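The comparison suggested above can be sketched as a small pure function. This is only an outline under assumptions: the function name is hypothetical, the two arguments stand in for the request URL and the result of self.absolute_url(), and issuing the actual redirect inside Zope is not shown. It also assumes that absolute_url() itself returns the canonical URL, which the thread does not verify.

```python
def canonical_redirect_target(requested_url, canonical_url):
    """Return the URL to redirect to, or None if no redirect is needed.

    Hypothetical helper: trailing slashes are ignored in the comparison.
    """
    if requested_url.rstrip("/") != canonical_url.rstrip("/"):
        return canonical_url
    return None


# The request went through redundant acquisition, so a redirect is suggested.
print(canonical_redirect_target(
    "http://www.example.com/Board/Board/a_1",
    "http://www.example.com/Board/a_1"))

# The request already matches the canonical URL, so nothing is returned.
print(canonical_redirect_target(
    "http://www.example.com/Board/a_1/",
    "http://www.example.com/Board/a_1"))
```

Returning None for the matching case lets the caller carry on rendering normally and redirect only when the two URLs differ.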
Plus, VHM sometimes redirects visitors to a mapped folder prepending the folder's id ( I don't know why, but it does happen from time to time ): www.example.com/CMF, for example, where the url should have been just "www.example.com". And that's where this madness starts.
I have never seen that. Are you sure your rewrite rules are right? Maybe a trailing slash too much or missing?
I didn't even run Apache. I used VHM mappings directly: *.example.com/Example, for example. And it **DOES** happen with this simple setting. I don't know exactly when.
Another situation with VHM: you log in as manager; manage your CMF sites; while you're at it, try to add/edit some content; you put yourself into this acquisition blackhole again.
But this only happens due to the bug you found, doesn't it? How else could this cause a problem?
Yes, I meant exactly that ;-)
Without this redirect_program script, I can't prevent users from having fun with this acquisition thing... or can I?
Well, I got this idea in another thread: somewhere in your product you could compare URL0 (or URL1, or whatever, I don't remember ATM) with self.absolute_url() and just return a redirect to self.absolute_url() if they don't match.
I did:
    security.declarePrivate( '_getObjectURL' )
    def _getObjectURL( self, ob ):
        """ Return Object URL """
        # id of the container whose occurrences are stripped from the path
        container_id = self.getNeoPortalElementContainerCatalog().getId()
        path = list( ob.getPhysicalPath() )
        # keep only what follows the first occurrence of the container id
        path = path[ path.index( container_id )+1: ]
        # make paths acquisition-safe
        while ( container_id in path ):
            path.remove( container_id )
        return '/'+ '/'.join( path )
Just removing every instance of the container's id in the path looked simpler :-) I'll try your idea. Thanks. All the best, Wankyu Choi
| I suspect the getPhysicalPath() method overridden in the CMF package doesn't
| behave well with VHM.
Humm... where is getPhysicalPath being overridden? I can't find it. []'s -- Sidnei da Silva (dreamcatcher) <sidnei@x3ng.com.br> X3ng Web Technology <http://www.x3ng.com.br> GNU/Linux user 257852 Debian GNU/Linux 3.0 (Sid) 2.4.18 ppc You have junk mail.
Humm... where is getPhysicalPath being overridden? I can't find it.
I was wrong in that assumption ;-) cheers, Wankyu Choi
participants (5)
- Oliver Bleutgen
- Paul Winkler
- sean.upton@uniontrib.com
- Sidnei da Silva
- Wankyu Choi