[Zope] Inheritance and the Search Engine : A Tragic Tale of Romance

Paul Winkler pw_lists at slinkp.com
Tue Sep 2 15:06:23 EDT 2003


On Tue, Sep 02, 2003 at 11:30:53AM -0600, Edward Pollard wrote:
> Greetings Zopers,
> 
> I've got a quandary that I would like to solicit wisdom on.
> 
> We are running Zope 2.6 as part of our website. Our central search 
> engine that spans all our websites is running on ht-dig. Inheritance on 
> zope

What you're talking about is actually called Acquisition.

> is sending our search engine on infinite traversals of the Zope 
> tree. This is caused, as you can probably guess, by relative linking 
> (eg '../../blah.htm'), and one ill-formed url somewhere that gets the 
> recursion started.
> 
> This is not good.

indeed...

> Now, I'm trying to create an environment where people can port their 
> websites from the old non-templated world to the new Zope-templated 
> one. However this necessitates a tolerance for bad coding, at the very 
> least a tolerance so that one idiot with a (trivially) badly formatted 
> url does not cause our search engine database to become somewhat close 
> to useless as the same page is listed at 1000 distinct urls. Of course 
> that just utterly destroys word search heuristics.
> 
> It being my opinion that I'm not doing something completely batty and 
> unusual, I thought there might be someone out there with greater 
> understanding with which to enlighten me.

Generally, these "badly-formed" URLs are only a problem if they are
used in templates (i.e. they appear on many pages, as opposed to content
which is a single page). The best solution really is
to hunt down relative URLs in templates and get rid of them.

URLs of the form "/foo/bar" are OK. URLs of the form "http://server/foo/bar"
are also OK, though obviously less portable. URLs of the form "foo/bar"
are very likely to cause problems.
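
To see why the third form is the troublemaker, here is a rough sketch of
what a crawler does with one such link. It assumes a modern Python 3
(urllib.parse); the function name crawl_chain, the server name and the
paths are all made up for illustration. Because of acquisition, Zope will
typically still serve each of the longer URLs, so the crawler never sees
an error that would stop it.

```python
from urllib.parse import urljoin


def crawl_chain(start_url, href, steps):
    """Return the URLs a crawler visits when every page served
    contains the same link `href` (hypothetical helper)."""
    urls = [start_url]
    for _ in range(steps):
        # A crawler resolves each link against the URL of the page
        # it was found on, exactly like a browser does.
        urls.append(urljoin(urls[-1], href))
    return urls


if __name__ == "__main__":
    # Relative link in a shared template: the path grows on every hop.
    for url in crawl_chain("http://server/a/page.htm", "a/page.htm", 3):
        print(url)
    # Server-absolute link: every hop resolves to the same URL.
    for url in crawl_chain("http://server/a/page.htm", "/a/page.htm", 3):
        print(url)
```

The first loop prints a new, longer URL each time (/a/a/page.htm,
/a/a/a/page.htm, ...), while the second keeps printing the same one.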

That's just the way things are here in the wacky world of Zope 2.
You might be interested to know that this kind of implicit acquisition
havoc does not occur in Zope 3, because implicit acquisition has proven
to be problematic in so many situations... but that probably doesn't
help you much today.

On the off chance that you have some kind of workflow system for 
your templates, you could probably cook up a way to run all edits 
through a URL-checking script and at least warn the user if they've 
done something problematic, or maybe even attempt to automagically fix it.
If you're using stock Zope types such as Folders, Page Templates, and
DTML, which don't provide workflow out of the box, you might even consider 
replacing them with subclasses that do this step, or maybe use a
monkeypatch to add the checking to the existing types.
Implementation is left as an exercise to the reader :-)
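
For what it's worth, the checking part might look something like the
sketch below. This is only a starting point under some assumptions: it
uses the Python 3 standard library rather than anything Zope provides,
the names LinkCollector, is_problematic and find_relative_urls are made
up, and it only looks at literal href and src attributes. URLs that are
computed by TAL expressions or DTML at render time would not be caught.
How you hook it into saving a template is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the literal href/src attribute values in some markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def is_problematic(url):
    """True for document-relative URLs such as 'foo/bar' or '../x.htm'."""
    parts = urlsplit(url)
    if parts.scheme or parts.netloc:
        return False  # http://server/foo/bar, mailto:, //server/foo
    if not parts.path:
        return False  # fragment-only or query-only, e.g. '#top'
    return not parts.path.startswith("/")  # '/foo/bar' is fine


def find_relative_urls(template_text):
    """Return the relative URLs found in a template's source text."""
    collector = LinkCollector()
    collector.feed(template_text)
    collector.close()
    return [url for url in collector.links if is_problematic(url)]


if __name__ == "__main__":
    sample = (
        '<a href="/foo/bar">ok</a>'
        '<a href="http://server/foo/bar">ok</a>'
        '<a href="foo/bar">bad</a>'
        '<img src="../../blah.htm">'
    )
    for url in find_relative_urls(sample):
        print("warning: relative URL in template:", url)
```

On the sample above this warns about foo/bar and ../../blah.htm and
leaves the other two alone. Automatically rewriting the offenders is a
harder problem, since the right absolute path depends on where the
template is used.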

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's THE FUZZY MARAUDER!
(random hero from isometric.spaceninja.com)


