[Zope] Prevent recursive and multiple URLs in Zope

Fri, 9 Aug 2002 20:20:04 +0200

Hi Zope

I am quite new to Zope and it is fun ;-)

There are however 2 things about it I do not like very much:

1) URL trailing slash handling:

http://example.com/some_doc
http://example.com/some_doc/
are both valid URLs to access the method or document some_doc in the 
given root folder. In file-based publishing (as with Apache) the second 
URL would be invalid, because some_doc is not a folder.

http://example.com/some_folder
http://example.com/some_folder/
are both valid URLs to access the folder some_folder in the root folder. 
Apache would allow the first URL, but would redirect to the second, 
because some_folder is not a document, it is a folder.

2) recursive acquisition:

http://example.com/some_folder/some_folder/some_folder/some_folder/
is a valid URL to access the folder some_folder in the root folder.

---

WHY do I dislike these two things?

a) Philosophically: a URL ("Uniform Resource Locator") should ideally 
identify a resource uniquely; it is generally not good to have the same 
content available via different locators.
b) Technically: Working with relative links becomes unreliable and 
dangerous. Problem #1 causes a relative URL to sometimes work and 
sometimes not, depending on whether the visitor accesses "foo/bar/" 
or "foo/bar". Problem #2 turns relative links into a door to infinite 
recursion. A simple link like <a href="foo/">clickme</a> becomes a 
trap in which dumb spiders lose themselves in an infinite loop (this 
was discussed briefly on this list under the subject "htdig indexing 
problem").
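
To make both effects concrete, here is a minimal sketch using the URL 
resolution from Python's standard library (modern Python 3, not 
anything Zope-specific). The link targets "baz" and "foo/" and the 
helper follow_link are only illustrative assumptions; the point is how 
a browser or spider resolves the same relative link against different 
base URLs.

```python
from urllib.parse import urljoin

# Problem 1: the same relative link "baz" resolves differently
# depending on whether the page was requested with a trailing slash.
without_slash = urljoin("http://example.com/foo/bar", "baz")
with_slash = urljoin("http://example.com/foo/bar/", "baz")

print(without_slash)  # http://example.com/foo/baz
print(with_slash)     # http://example.com/foo/bar/baz


def follow_link(start_url, href, hops):
    """Follow the same relative link `hops` times, as a naive spider would.

    Hypothetical helper: returns the list of URLs visited.
    """
    urls = [start_url]
    for _ in range(hops):
        urls.append(urljoin(urls[-1], href))
    return urls


# Problem 2: a page at /foo/ containing <a href="foo/"> leads to
# /foo/foo/, /foo/foo/foo/, ... and acquisition serves every one of them.
trap = follow_link("http://example.com/foo/", "foo/", 3)
for url in trap:
    print(url)
```

Since every URL in the second list is answered by the same folder, a 
spider that does not compare page content has no reason to stop.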

---

Experiences?

Since there are lots of Zope sites out there and I have not found much 
discussion of this matter so far, am I maybe putting too much weight 
on it?

---

Workarounds

I still hope to find a relatively simple way to change that 
behaviour. So far, however, I have only found some workarounds:

- avoid relative URLs
- work with absolute_url(), URL0, URL1 etc. instead
- work with <base href=...>

If my editors were all technical guys, this would be a solution (but 
there is still Murphy's law...). But as I know my editors, they just type 
something in as the link and test whether it works, and because it DOES 
work, they do not notice that they have just opened the door to infinite 
recursion ;-).

Other workarounds I was told about:
- (for problem 2): put an access-restricted subfolder with the same name 
into any folder
- (for problem 2): disallow access to any some_folder/some_folder 
combinations in a robots.txt

But these seem very tiresome and can only be automated with lots of work 
(or has somebody tried this?).
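
For the robots.txt variant, the generation step itself could be 
sketched roughly as below. This assumes the list of folder paths is 
already known (collecting it from the ZODB is the tiresome part and is 
not shown), and it only covers the simplest case where a folder's own 
name is repeated directly after it. The function name robots_rules and 
the example paths are hypothetical.

```python
def robots_rules(folder_paths):
    """Build robots.txt text that disallows direct self-repetition.

    For a folder /a/b/ this emits "Disallow: /a/b/b/". Other acquisition
    paths (e.g. /a/b/a/) are NOT covered by this sketch.
    """
    lines = ["User-agent: *"]
    for path in sorted(folder_paths):
        segments = [s for s in path.split("/") if s]
        if not segments:
            continue  # skip the root folder itself
        repeated = segments + [segments[-1]]
        lines.append("Disallow: /" + "/".join(repeated) + "/")
    return "\n".join(lines) + "\n"


# Example with made-up folder paths.
print(robots_rules(["/some_folder/", "/docs/img"]))
```

This only keeps well-behaved spiders out; the recursive URLs remain 
valid for everybody else.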

---

Solution?

Probably someone who knows the Zope internals well could easily 
create a plugin (product) which defines the following rules:

- if the request URL has a trailing slash, and the invoked object is not 
a folder: respond with 404 (even if generic Zope would serve an object then)
- if the request URL has no trailing slash, and the invoked object IS a 
folder: redirect to URL + '/'
- if the acquisition path invoked by the request URL contains the same 
object more than once: respond with 404
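
Written down as plain Python, the three rules could look like the 
sketch below. It deliberately uses no Zope API: the function name 
check_request and its parameters are hypothetical, the caller is 
assumed to know already whether the final object is a folder, and the 
ids of the traversed objects stand in for real object identity. The 
301 status for the redirect is also my assumption; the rule above 
only says "redirect".

```python
def check_request(path, is_folder, traversed_ids):
    """Apply the three proposed rules to one request.

    path          -- request path, e.g. "/some_folder/some_doc"
    is_folder     -- True if the finally published object is a folder
    traversed_ids -- ids of the objects visited during traversal, in order
                     (assumption: ids stand in for object identity)

    Returns (status, location); location is set only for redirects.
    """
    has_slash = path.endswith("/")

    # Rule 1: trailing slash on a non-folder object is an error.
    if has_slash and not is_folder:
        return (404, None)

    # Rule 3: the same object appearing twice in the path is an error.
    if len(set(traversed_ids)) != len(traversed_ids):
        return (404, None)

    # Rule 2: folder requested without trailing slash gets redirected.
    if not has_slash and is_folder:
        return (301, path + "/")

    return (200, None)


# Example calls with made-up paths.
print(check_request("/some_doc/", False, ["some_doc"]))
print(check_request("/some_folder", True, ["some_folder"]))
print(check_request("/some_folder/some_folder/", True,
                    ["some_folder", "some_folder"]))
print(check_request("/some_folder/", True, ["some_folder"]))
```

The open question remains where to hook such a check into Zope, since 
all three inputs are only known once traversal has finished.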

Does this make sense?

I tried to do it using an Access Rule with SiteAccess2, but this doesn't 
seem to lead to a sensible solution, because an Access Rule is invoked 
when a folder is FIRST traversed, and at that moment it is not yet known 
which type of object the URL will finally resolve to. So there would 
have to be something like an Access Rule that is called _at the very 
end_ of the traversal/acquisition process.

I would be very thankful for any hint on this matter, because it is 
really something that makes me a bit uneasy about starting to use Zope as 
my platform of choice for more complex and extended sites (which for all 
other reasons I would do, of course ;-) ).

Kind regards, 
Urs

-------------------------
Urs van Binsbergen
van.binsbergen@taktik.ch

bureau taktik GmbH
Zentralstrasse 76b
8003 Zürich
Telefon 01 450 34 05
-------------------------