[Zope] Prevent recursive and multiple URLs in Zope
Urs van Binsbergen
van.binsbergen@taktik.ch
Fri, 9 Aug 2002 20:20:04 +0200
Hi Zope
I am quite new to Zope and it is fun ;-)
There are however 2 things about it I do not like very much:
1) URL trailing slash handling:
http://example.com/some_doc
http://example.com/some_doc/
are both valid URLs to access the method or document some_doc in the
given root folder. In file-based publishing (as with Apache) the second
URL would be invalid, because some_doc is not a folder.
http://example.com/some_folder
http://example.com/some_folder/
are both valid URLs to access the folder some_folder in the root folder.
Apache would allow the first URL, but would redirect to the second,
because some_folder is not a document, it is a folder.
2) recursive acquisition:
http://example.com/some_folder/some_folder/some_folder/some_folder/
is a valid URL to access the folder some_folder in the root folder.
---
WHY do I dislike these two things?
a) Philosophically: A URL ("Uniform Resource Locator") should ideally
identify a resource uniquely: it is generally not good to have the same
content available via different locators.
b) Technically: Working with relative links becomes unreliable and
dangerous. Problem #1 causes a relative URL to sometimes work and
sometimes not work, depending on whether the visitor accesses "foo/bar/"
or "foo/bar". Problem #2 turns relative links into a door to infinite
recursion. A simple link like <a href="foo/">clickme</a> becomes a trap
in which dumb spiders lose themselves in an infinite loop (this was
discussed briefly on this list under the subject "htdig indexing
problem").
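Both effects can be reproduced outside Zope with plain URL resolution.
The following is only a small sketch using Python's standard
urllib.parse.urljoin; the helper name crawl() and the example.com URLs
are made up for illustration. It assumes a page that contains the
relative link "foo/" and that the server answers every resulting URL with
that same page, which is what acquisition does in problem #2.

```python
from urllib.parse import urljoin

LINK = "foo/"  # the relative href an editor might type


def crawl(start_url, link, hops):
    """Follow the same relative link `hops` times, as a naive spider would."""
    urls = []
    url = start_url
    for _ in range(hops):
        url = urljoin(url, link)  # standard relative-reference resolution
        urls.append(url)
    return urls


# Problem 1: the same link resolves differently with and without the slash.
print(urljoin("http://example.com/foo/bar", LINK))   # http://example.com/foo/foo/
print(urljoin("http://example.com/foo/bar/", LINK))  # http://example.com/foo/bar/foo/

# Problem 2: each hop yields a new, longer URL for the same content.
for url in crawl("http://example.com/foo/", LINK, 3):
    print(url)
```

The loop never reaches a URL it has seen before, so a spider that only
compares URLs has no reason to stop.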
---
Experiences?
Since there are lots of Zope sites out there and I have not found much
discussion of this matter so far, am I maybe putting too much weight
on it?
---
Workarounds
I still hope to find a relatively simple solution to change that
behaviour. So far, however, I have only found some workarounds:
- avoid relative URLs
- work with absolute_url(), URL0, URL1 etc. instead
- work with <base href=...>
If my editors were all technical guys, this would be a solution (but
there is still Murphy's law...). But as I know my editors, they just type
something in as the link and test whether it works - and because it DOES
work, they do not notice that they have just opened the door to infinite
recursion ;-).
Other workarounds I have been told about:
- (for problem 2): put an access-restricted subfolder with the same name
into any folder
- (for problem 2): disallow access to any some_folder/some_folder
combinations in a robots.txt
But these seem very tiresome and can only be automated with a lot of work
(or has somebody tried this?).
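To give an idea of what automating the robots.txt workaround might look
like, here is a minimal sketch in plain Python. The function name
robots_disallow() and the way the folder paths are obtained are
assumptions; on a real site the list would have to come from walking the
Zope folder tree. It only covers a folder repeated directly inside itself
(some_folder/some_folder), not every combination acquisition allows.

```python
def robots_disallow(folder_paths):
    """Build robots.txt text that blocks each folder repeated inside itself.

    folder_paths -- hypothetical list of folder paths such as "/a/b"
    """
    lines = ["User-agent: *"]
    for path in folder_paths:
        path = path.strip("/")
        if not path:
            continue  # the root folder has no name to repeat
        name = path.split("/")[-1]
        # Disallow works as a path prefix, so deeper repeats are covered too.
        lines.append("Disallow: /%s/%s" % (path, name))
    return "\n".join(lines) + "\n"


print(robots_disallow(["/some_folder", "/some_folder/sub"]))
```

Well-behaved spiders would then skip these paths, but the URLs themselves
stay valid, so this treats the symptom only.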
---
Solution?
Probably someone who knows the Zope internals well could easily create a
plugin (product) which defines the following rules:
- if the request URL has a trailing slash, and the invoked object is not
a folder: respond with 404 (even if generic Zope would serve an object
then)
- if the request URL has no trailing slash, and the invoked object IS a
folder: redirect to URL + '/'
- if the acquisition path invoked by the request URL contains the same
object more than once: respond with 404
Does this make sense?
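Written down as code, the three rules might look like the sketch below.
This is plain Python and deliberately not tied to any Zope API: the
function check_request() and its arguments are hypothetical, and it
assumes that the list of traversed objects and the type of the final
object are already known, i.e. that it runs after traversal has finished.
The choice of 301 for the redirect is also an assumption.

```python
def check_request(path, traversed_ids, is_folder):
    """Return (status, location) for a request under the three rules.

    path          -- request path, e.g. "/some_folder/"
    traversed_ids -- hypothetical list identifying each traversed object
    is_folder     -- True if the finally published object is a folder
    """
    # Rule 3: the same object appears more than once on the path.
    if len(set(traversed_ids)) != len(traversed_ids):
        return 404, None

    has_slash = path.endswith("/")

    # Rule 1: trailing slash, but the object is not a folder.
    if has_slash and not is_folder:
        return 404, None

    # Rule 2: folder requested without trailing slash.
    if not has_slash and is_folder:
        return 301, path + "/"

    return 200, None


print(check_request("/some_doc/", ["root", "some_doc"], False))
print(check_request("/some_folder", ["root", "some_folder"], True))
print(check_request("/some_folder/some_folder/",
                    ["root", "some_folder", "some_folder"], True))
```

The logic itself is trivial; the open question is where to hook it in so
that it sees the final object.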
I tried to do it using an Access Rule with SiteAccess2, but this doesn't
seem to lead to a sensible solution, because an Access Rule is invoked
when a folder is FIRST traversed, and at that moment it is not yet known
which type of object the URL will finally resolve to. So there would have
to be something like an Access Rule that is called _at the very end_ of
the traversal/acquisition process.
I would be very thankful for any hint regarding this matter, because it
is really something that makes me a bit uneasy about starting to use Zope
as my platform of choice for more complex and extended sites (which for
all other reasons I would do, of course ;-) ).
Kind regards,
Urs
-------------------------
Urs van Binsbergen
van.binsbergen@taktik.ch
bureau taktik GmbH
Zentralstrasse 76b
8003 Zürich
Telefon 01 450 34 05
-------------------------