Inheritance and the Search Engine : A Tragic Tale of Romance
Greetings Zopers,

I've got a quandary that I would like to solicit wisdom on.

We are running Zope 2.6 as part of our website. Our central search engine that spans all our websites is running on ht-dig. Inheritance on zope is sending our search engine on infinite traversals of the Zope tree. This is caused, as you can probably guess, by relative linking (eg '../../blah.htm'), and one ill-formed url somewhere that gets the recursion started.

This is not good.

Now, I'm trying to create an environment where people can port their websites from the old non-templated world to the new Zope-templated one. However this necessitates a tolerance for bad coding, at the very least a tolerance so that one idiot with a (trivially) badly formatted url does not cause our search engine database to become somewhat close to useless as the same page is listed at 1000 distinct urls. Of course that just utterly destroys word search heuristics.

It being my opinion that I'm not doing something completely batty and unusual, I thought there might be someone out there with greater understanding with which to enlighten me.

Thanks for your time,

---
Edward J. Pollard
University of Lethbridge Web Development
On Tue, Sep 02, 2003 at 11:30:53AM -0600, Edward Pollard wrote:
Greetings Zopers,
I've got a quandary that I would like to solicit wisdom on.
We are running Zope 2.6 as part of our website. Our central search engine that spans all our websites is running on ht-dig. Inheritance on zope
what you're talking about is actually called Acquisition.
is sending our search engine on infinite traversals of the Zope tree. This is caused, as you can probably guess, by relative linking (eg '../../blah.htm'), and one ill-formed url somewhere that gets the recursion started.
This is not good.
indeed...
Now, I'm trying to create an environment where people can port their websites from the old non-templated world to the new Zope-templated one. However this necessitates a tolerance for bad coding, at the very least a tolerance so that one idiot with a (trivially) badly formatted url does not cause our search engine database to become somewhat close to useless as the same page is listed at 1000 distinct urls. Of course that just utterly destroys word search heuristics.
It being my opinion that I'm not doing something completely batty and unusual, I thought there might be someone out there with greater understanding with which to enlighten me.
Generally, these "badly-formed" urls are only a problem if they are used in templates (i.e. used on many pages, as opposed to content, which is a single page). The best solution really is to hunt down relative urls in templates, and get rid of them.

URLs of the form "/foo/bar" are OK.
URLs of the form "http://server/foo/bar" are also OK, though obviously less portable.
URLs of the form "foo/bar" are very likely to cause problems.

That's just the way things are here in the wacky world of zope 2. You might be interested to know that this kind of implicit acquisition havoc does not occur in zope 3, because it has proven to be problematic in so many situations... but that probably doesn't help you much today.

On the off chance that you have some kind of workflow system for your templates, you could probably cook up a way to run all edits through a URL-checking script and at least warn the user if they've done something problematic, or maybe even attempt to automagically fix it. If you're using stock zope types such as Folders, Page Templates, and DTML, which don't provide workflow out of the box, you might even consider replacing them with subclasses that do this step, or maybe use a monkeypatch to add the checking to the existing types. Implementation is left as an exercise to the reader :-)

--
Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's THE FUZZY MARAUDER! (random hero from isometric.spaceninja.com)
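As a rough illustration of the URL-checking idea above, here is a minimal standalone sketch that flags the "foo/bar" style of link in a chunk of template source. It is not tied to any Zope API: the function name find_relative_urls and the regular expression are assumptions made for this example, it only looks at literal href/src attributes, and it is written in current Python rather than the Python that ships with Zope 2.6.

```python
import re
from urllib.parse import urlsplit

# Matches href="..." or src="..." attribute values (single or double quoted).
# Hypothetical pattern for this sketch; it does not understand TAL or DTML.
_LINK_ATTR = re.compile(r'''\b(?:href|src)\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)


def find_relative_urls(template_source):
    """Return the path-relative URLs found in href/src attributes."""
    risky = []
    for match in _LINK_ATTR.finditer(template_source):
        url = match.group(2).strip()
        if not url or url.startswith(("#", "?")):
            continue  # same-page anchors and query-only links
        parts = urlsplit(url)
        if parts.scheme or parts.netloc:
            continue  # e.g. http://server/foo/bar or mailto: links
        if parts.path.startswith("/"):
            continue  # site-absolute, e.g. /foo/bar
        risky.append(url)  # e.g. foo/bar or ../../blah.htm
    return risky
```

A workflow hook could call this on every edit and warn the user when the returned list is not empty; how that hook is wired into Folders, Page Templates, or DTML is left open here.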
On Tuesday, September 2, 2003, at 12:06 PM, Paul Winkler wrote:
We are running Zope 2.6 as part of our website. Our central search engine that spans all our websites is running on ht-dig. Inheritance on zope
what you're talking about is actually called Acquisition.
I knew that. No really....;-)
On the off chance that you have some kind of workflow system for your templates, you could probably cook up a way to run all edits through a URL-checking script and at least warn the user if they've done something problematic, or maybe even attempt to automagically fix it.
We're using Dreamweaver integration at the client to make a non-workflow workflow. We're very happy with it. Dreamweaver writes fairly clean code. However, it's the "porting" of the old pages - involving copying and pasting - that has caused this problem for us on the two occasions it's occurred. Considering how castrated it makes the search engine, two times is more than enough.

I've got some really silly ideas about incorporating some sort of script into the templates used at the document level to do some sort of checking against REQUEST.URL and here.absolute_url, but that looks like it would fark up the places where we *are* using acquisition intentionally (e.g. we only have one index_html in the department tree; each directory has a property of the department code and acquires the index from the root).

Thoughts on such a radical idea?
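For what it's worth, the comparison described above can be sketched as a plain function on two URL strings. Everything here is an assumption for illustration: the name is_canonical_request is made up, and it presumes the caller can get the requested URL (something like REQUEST.URL) and a containment-based URL for the context (something like here.absolute_url()) as strings. To leave room for the intentional case of one index_html acquired from the root, the sketch tolerates exactly one extra trailing name.

```python
from urllib.parse import urlsplit


def _segments(url):
    """Split the path part of a URL into its non-empty segments."""
    return [s for s in urlsplit(url).path.split("/") if s]


def is_canonical_request(request_url, context_url, allow_acquired_leaf=True):
    """True if request_url points at context_url without a detour.

    With allow_acquired_leaf, one extra trailing name is accepted so that
    an acquired method such as index_html does not trip the check.
    """
    requested = _segments(request_url)
    canonical = _segments(context_url)
    if requested == canonical:
        return True
    # Tolerate exactly one trailing name beyond the context's own path.
    return allow_acquired_leaf and requested[:-1] == canonical
```

Under these assumptions a request for http://server/dept/sub/index_html passes when the context is http://server/dept/sub, while http://server/dept/dept/sub/index_html does not. Other intentional uses of acquisition would still need their own exceptions.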
Edward Pollard wrote at 2003-9-2 11:30 -0600:
.... Inheritance on zope is sending our search engine on infinite traversals of the Zope tree. This is caused, as you can probably guess, by relative linking (eg '../../blah.htm'), and one ill-formed url somewhere that gets the recursion started.
It is very difficult to get a completely safe solution. Some approaches could be:

You put a (SiteAccess) AccessRule in your Root Folder. It examines the URL and rejects URLs which do not look nice. "Not nice" could be:

* it is too long
* it contains a recurring pattern

Of course, when I know your "not nice" definition, I could build a site for which your "not nice" would kill a valid URL. But you know your site and can adapt the "not nice" condition accordingly...

Dieter
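The two "not nice" conditions above could be approximated along these lines. This is only a sketch of the test itself, not of an AccessRule: the function name looks_not_nice and the max_length default of 200 are assumptions picked for the example, and "recurring pattern" is read here as a run of path segments that is immediately repeated.

```python
from urllib.parse import urlsplit


def looks_not_nice(url, max_length=200):
    """Heuristic: True if the URL is too long or its path repeats itself."""
    if len(url) > max_length:
        return True
    segments = [s for s in urlsplit(url).path.split("/") if s]
    n = len(segments)
    # Look for any block of segments that is immediately followed by
    # an identical block, e.g. /dept/dept/ or /a/b/a/b/.
    for size in range(1, n // 2 + 1):
        for start in range(0, n - 2 * size + 1):
            first = segments[start:start + size]
            second = segments[start + size:start + 2 * size]
            if first == second:
                return True
    return False
```

As the caveat above says, a site with a legitimate path such as /images/images/ would be rejected by this definition, so both the length limit and the repetition test would need adapting to the site in question.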
participants (3)
- Dieter Maurer
- Edward Pollard
- Paul Winkler