I wonder if anyone has experienced this:

a) Google displays links to a Zope site with a space in them

For example, the search result has the right title (and clicking on it works), but underneath the page summary the URL (some, not all) may be displayed like this:

http://domain.com/dir/ subdir/index_html

b) Spiders download non-existent directories

For example, the site has two directories, site.com/dir1 and site.com/dir2. The weird thing is that dir1 can be accessed in these (and more) ways:

http://site.com/dir2/dir1
http://site.com/dir1/dir2/dir1

Needless to say, the spider keeps downloading all the time. Worst of all, the directories really can be accessed that way. Why is this possible?

I noticed some spiders (Altavista, for example) are more likely to do that, some less. I blocked the ones I can; the others are too important to block.

Thanks
Zoper 2.5.x
Sean Lee wrote at 2003-12-3 22:03 +0800:
I wonder if anyone has experienced this:
a) Google displays links to Zope site with a space in them For example, the search result has the right Title (and click on it works), but underneath the page summary the URL (some, not all) may be displayed like this http://domain.com/dir/ subdir/index_html
I do not know this...
b) Spiders download non-existing directories For example, the site has two directories, site.com/dir1, site.com/dir2 The weird thing is that dir1 can be accessed in these (and more) ways: http://site.com/dir2/dir1 http://site.com/dir1/dir2/dir1 Unnecessary to mention, the spider keeps downloading all the time. The worst of all is, the directories really can be accessed that way. Why is this possible?
This is acquisition at work. It is caused by non-trivial relative URL references (relative URL references, i.e. those starting with neither a protocol nor a '/', that contain at least one "/"). Do not use non-trivial relative URL references: use absolute URLs instead, or explicitly use sufficiently many "../" in your relative URLs.

-- Dieter
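To illustrate Dieter's point, here is a minimal Python sketch, not Zope's actual traversal code. It assumes every page carries the relative links "dir1/index_html" and "dir2/index_html" (an assumption; the thread does not show the real links), and it models acquisition as "a name missing from the current folder is looked up in the enclosing folders". The names SITE and traverse are hypothetical and exist only for this example.

```python
from urllib.parse import urljoin, urlparse

# Toy site tree (hypothetical): two sibling folders, as in the question.
SITE = {
    "dir1": {"index_html": "page one"},
    "dir2": {"index_html": "page two"},
}


def traverse(path):
    """Resolve a URL path against SITE, acquiring missing names from parents.

    This only imitates the idea of acquisition; it is not Zope's algorithm.
    """
    chain = [SITE]  # objects visited so far, root first
    for name in path.strip("/").split("/"):
        # Look in the current object first, then in each enclosing one.
        for container in reversed(chain):
            if isinstance(container, dict) and name in container:
                chain.append(container[name])
                break
        else:
            raise KeyError(name)
    return chain[-1]


# A spider resolves each relative link against the URL of the page it is on.
start = "http://site.com/dir1/index_html"
step1 = urljoin(start, "dir2/index_html")
step2 = urljoin(step1, "dir1/index_html")

print(step1)  # http://site.com/dir1/dir2/index_html
print(step2)  # http://site.com/dir1/dir2/dir1/index_html

# Because of acquisition, the ever-longer paths still resolve to a page.
print(traverse(urlparse(step2).path))  # page one

# The two fixes from the reply: an absolute path, or enough "../".
print(urljoin(step2, "/dir2/index_html"))    # http://site.com/dir2/index_html
print(urljoin(start, "../dir2/index_html"))  # http://site.com/dir2/index_html
```

Each hop adds another path segment, and since every such URL returns a valid page, a spider has no reason to stop. With absolute links or correct "../" prefixes, every hop lands on one of the two canonical URLs.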
Hello Sean,

you ran into some acquisition problems. Do yourself a favour and kill all relative links from your pages. I usually create links programmatically, as in

<a href="<dtml-var "somepage.absolute_url()">"> <dtml-var "somepage.title()"> </a>

Could you post a link to a Google search showing the problem? Are you sure the underscore is missing? Often it can't be seen because the link is underlined.

Ulrich

-- World Wide Web Publisher, Ulrich Wisser, Vallatorpsv.158, S-18752 Täby http://www.publisher.de Tel: +46-8-53460905 Fax: +46-8-534 609 06
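As a rough aid for tracking down such links, the following Python sketch scans rendered HTML for hrefs that match Dieter's description of non-trivial relative references. The helper names are hypothetical, and the rule used here (no protocol, no leading "/", contains a "/", does not start with "../") is one interpretation of the advice in this thread, not an official Zope check.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_risky(href):
    """True for relative references that contain a '/' (assumed rule)."""
    parts = urlparse(href)
    if parts.scheme or parts.netloc:
        return False  # absolute URL with protocol or host
    if parts.path.startswith("/"):
        return False  # absolute path on the same host
    if parts.path.startswith("../"):
        return False  # explicitly climbs out of the current folder
    return "/" in parts.path


def find_risky_hrefs(html):
    """Return the hrefs in html that may trigger acquisition loops."""
    collector = HrefCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if is_risky(href)]


sample = (
    '<a href="dir2/index_html">relative with slash</a>'
    '<a href="/dir2/index_html">absolute path</a>'
    '<a href="http://site.com/dir1">absolute URL</a>'
    '<a href="../dir2/index_html">explicit parent</a>'
    '<a href="page.html">same folder</a>'
)
print(find_risky_hrefs(sample))  # ['dir2/index_html']
```

Running this over the pages a spider keeps re-fetching should point at the templates whose links need to be replaced with absolute URLs.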
Hello,

Thanks to those who helped with the first (acquisition) problem.
Could you post a link to a Google search showing the problem? Are you sure the underscore is missing? Often it can't be seen because the link is underlined.
Sure - pls try

http://www.google.com/search?&q=oracle+cluster+site%3A%2Ecom%2Etw

The first result is a PDF file, the link to it is correct (no space). Under the page summary, the link is given with a space after 2nd slash.

www.erexi.com.tw/solutions/ Oracle9i_RAC_Solution%20Brief.pdf

(incidentally, there is a space in the link, rendered correctly as "%20")

Thanks
Sean
Hello Sean,
http://www.google.com/search?&q=oracle+cluster+site%3A%2Ecom%2Etw The first result is a PDF file, the link to it is correct (no space). Under the page summary, the link is given with a space after 2nd slash. www.erexi.com.tw/solutions/ Oracle9i_RAC_Solution%20Brief.pdf (incidentally, there is a space in the link, rendered correctly as "%20")
yes, on the page the link is displayed with a space, but the title links to the correct destination (without space). So it is merely a Google display problem, people still get to the site. :) Ulrich -- World Wide Web Publisher, Ulrich Wisser, Vallatorpsv.158, S-18752 Täby http://www.publisher.de Tel: +46-8-53460905 Fax: +46-8-534 609 06
participants (3): Dieter Maurer, Sean Lee, Ulrich Wisser