Urs van Binsbergen writes:
> ... There are, however, two things about it I do not like very much:
>
> 1) URL trailing slash handling:
> http://example.com/some_doc and http://example.com/some_doc/ are both valid URLs to access the method or document some_doc in the given root folder. In file-based publishing (as with Apache) the second URL would be invalid, because some_doc is not a folder.
> http://example.com/some_folder and http://example.com/some_folder/ are both valid URLs to access the folder some_folder in the root folder. Apache would allow the first URL but redirect to the second, because some_folder is not a document, it is a folder.
> 2) recursive acquisition:
> http://example.com/some_folder/some_folder/some_folder/some_folder/ is a valid URL to access the folder some_folder in the root folder.
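The multiplicity described under 2) can be illustrated with a small sketch. This is not Zope's traversal code; it merely mimics the acquisition fallback (a name not found on the current object is searched for along the chain of its containers) using nested dicts, with the names taken from the examples above:

```python
def traverse(root, path):
    """Resolve URL path segments against nested dicts, acquisition-style.

    Each node is a dict of children.  A segment that is not a child of
    the current node is looked up on the ancestors (the acquisition
    chain), innermost first -- which is exactly why a folder can be
    reached "through itself" any number of times.
    """
    chain = [root]                      # containment chain, innermost last
    node = root
    for segment in path.strip("/").split("/"):
        for candidate in reversed(chain):
            if isinstance(candidate, dict) and segment in candidate:
                node = candidate[segment]
                chain.append(node)
                break
        else:
            raise KeyError(segment)     # a real server would answer 404
    return node

site = {"some_folder": {"some_doc": "a document"}}
a = traverse(site, "/some_folder")
b = traverse(site, "/some_folder/some_folder/some_folder/some_folder")
print(a is b)   # True: both URLs reach the very same folder
```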
---
> WHY do I dislike these two things?
> a) Philosophically: As the name "UNIQUE resource locator" already says: it is generally not good to have the same content available via different locators.

Maybe your philosophical argument is weakened when you learn that URL stands for "*Uniform* Resource Locator".
It is a uniform syntax (!) to locate a resource accessible through a wide variety of protocols. It is quite common to have the same resource accessed through different URLs: often the same resource can be accessed both via HTTP and FTP; often the same (local) resource can be accessed with the "file", the "ftp", and the "http" protocol; often the same resource can be accessed via both "ftp" and "webdav" (which is HTTP based).
> b) Technically: Working with relative links becomes unreliable and dangerous. Problem #1 causes a relative URL to sometimes work and sometimes not, depending on whether the visitor accesses "foo/bar/" or "foo/bar".

Only when you do strange things. Usually, Zope sets the HTML "base" tag, so that it does not matter whether the user uses "foo/bar/" or "foo/bar".
> Problem #2 makes relative links the door to infinite recursion. A simple link like <a href="foo/">clickme</a> becomes the trap where dumb spiders lose themselves in an infinite loop (this was discussed recently on this list under the subject "htdig indexing problem").

When you use relative links in the same way you are forced to use them in a file-system-based publishing environment, there will be no infinite recursion: simply avoid relative links containing a "/" not preceded by "..", and use an absolute URL otherwise.
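The rule in that last sentence can be checked mechanically. A minimal sketch (the function name is my own, not part of any Zope or standard API) that accepts a relative link only when every "/" in it is preceded by "..":

```python
def is_fs_safe_relative(href):
    """True if href cannot trigger recursive acquisition.

    Rule: a relative link may contain a "/" only when the path part
    standing before it is ".."; absolute URLs and root-relative paths
    are left alone, since they restart from a fixed point.
    """
    if "://" in href or href.startswith("/"):
        return True                 # absolute: not affected by the trap
    segments = href.split("/")
    # every segment that stands before a "/" must be ".."
    return all(segment == ".." for segment in segments[:-1])

print(is_fs_safe_relative("bar"))         # True
print(is_fs_safe_relative("../bar"))      # True
print(is_fs_safe_relative("foo/"))        # False: the spider trap above
print(is_fs_safe_relative("foo/bar"))     # False
```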
> Experiences?
> Since there are lots of Zope sites out there and I have not found much discussion on this matter so far, am I perhaps putting too much weight on it?

I feel you do.
> Workarounds ...
> - work with <base href=...>

This is done automatically, unless your pages are strange.
> ... Other workarounds I was told:
> - (for problem 2): put an access-restricted subfolder with the same name into every folder
> - (for problem 2): disallow access to any some_folder/some_folder combination in a robots.txt

You may also learn about SiteAccess AccessRules (see the documentation on Zope.org).
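The second workaround quoted above, written out as a robots.txt (the folder names are placeholders for whatever doubled paths your site can produce):

```
User-agent: *
# keep spiders out of acquisition loops
Disallow: /some_folder/some_folder
```

Disallow matches by prefix, so this single line also covers arbitrarily deep repetitions such as /some_folder/some_folder/some_folder/.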
> Solution? ...
> - if the request URL has a trailing slash and the invoked object is not a folder: respond 404 (even if generic Zope would serve an object then)

While a file system folder is a very narrow concept, there are many folder variants in Zope. In fact, most objects in Zope can act like a folder (in the sense that they support a default presentation called "index_html").
Forget about the trailing "/" problem. Give your pages an HTML "head" element (as you should anyway) and do not include a "base" tag; Zope will then put such a tag in when it has modified the URL.
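Concretely, the recommendation amounts to markup like this (page and link names are placeholders):

```html
<!-- What you author: a head element, but no base tag of your own -->
<html>
  <head>
    <title>some_doc</title>
  </head>
  <body>
    <a href="other_doc">a safe relative link</a>
  </body>
</html>
```

When Zope serves such a page under a URL it had to adjust (e.g. ".../some_folder" instead of ".../some_folder/"), it inserts an appropriate "base" tag into the head, so the relative link above keeps resolving correctly.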
> - if the acquisition path invoked by the request URL contains the same object multiple times: respond 404

Use a SiteAccess AccessRule in your root folder.
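The test such an AccessRule would perform can be sketched without the SiteAccess API itself (which I will not reproduce here). This version rejects any path in which the same name occurs twice; that is slightly stricter than "the identical object multiple times", but it is close, and sites rarely have legitimate URLs of that shape:

```python
def has_repeated_segment(path):
    """True if the same name occurs more than once in the URL path."""
    segments = [s for s in path.strip("/").split("/") if s]
    return len(segments) != len(set(segments))

# An access rule would answer 404 when this returns True:
print(has_repeated_segment("/some_folder/some_doc"))              # False
print(has_repeated_segment("/some_folder/some_folder/some_doc"))  # True
```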
> Does this make sense?

Maybe for you. I would not go this way.
Dieter