Hi,

We are running a site for a local government providing local information. Over the past few months the site has been spidered daily, and the information appears to be used for unwanted marketing campaigns.

We have been asked whether we could somehow block this access. We tried blocking the originating IP on the server (using route to blackhole it), but the spider rotates IPs within a day of us putting in a block.

Is there some mechanism in Zope, or one that could be added to a Zope instance, that would limit the number of requests served per IP address within a certain period of time? We are aware that this would also affect Google and other search engine spiders, but as they usually come from known IP ranges, it would be nice if we could exempt these from the "throttling".

Any ideas or suggestions?

DR
--On 21. August 2006 13:17:03 +0100 David <davidr@talamh.org.uk> wrote:
Hi
We are running a site for a local government providing local information. Over the past few months, the site has been spidered daily and the information appears to be used for unwanted marketing campaigns.
We have been asked if we could somehow block this access. We tried blocking the originating IP on the server (using route to blackhole), but the spider is rotating IPs within a day of us putting in a block.
Is there some mechanism in Zope, or one that could be added to a Zope instance, that would limit the number of requests served per IP address within a certain period of time? We are aware that this would also affect Google and other search engine spiders, but as they usually come from known IP ranges, it would be nice if we could exempt these from the "throttling".
You should solve this inside your front-end Apache (hope you have one!). You might look at mod_throttle for Apache, and also check your robots.txt file. -aj
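For well-behaved crawlers, a robots.txt at the site root is the cheapest first step. A minimal sketch (the Crawl-delay value and the /private/ path are only illustrative; note that Crawl-delay is a non-standard extension honoured by some crawlers but not all, and a hostile spider will simply ignore the whole file):

```
# Served as http://your.site/robots.txt
User-agent: *
# Ask polite crawlers to wait N seconds between requests (non-standard)
Crawl-delay: 10
# Keep a section out of indexes entirely (path is illustrative)
Disallow: /private/
```

This only helps against crawlers that choose to obey it; the Apache-level throttling is what actually enforces anything.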
David wrote at 2006-8-21 13:17 +0100:
... We have been asked if we could somehow block this access. We tried blocking the originating IP on the server (using route to blackhole), but the spider is rotating IPs within a day of us putting in a block.
Is there some mechanism in Zope, or one that could be added to a Zope instance, that would limit the number of requests served per IP address within a certain period of time? We are aware that this would also affect Google and other search engine spiders, but as they usually come from known IP ranges, it would be nice if we could exempt these from the "throttling".
We have extended "VHM" (Virtual Host Monster) to give it additional functionality (such as recognizing and handling spiders). We do not have your specific functionality, though... -- Dieter
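The kind of per-IP, per-time-window limit the original question asks about can be sketched in plain Python. This is not part of VHM or Zope; the class name, limits, and exempt-IP handling below are illustrative assumptions for something one might bolt in front of request handling:

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per IP within a sliding window of
    window_seconds. Exempt IPs (e.g. known search-engine addresses)
    bypass the check entirely. Illustrative sketch only."""

    def __init__(self, max_requests=60, window_seconds=60, exempt_ips=()):
        self.max_requests = max_requests
        self.window = window_seconds
        self.exempt = set(exempt_ips)
        # ip -> deque of timestamps of recent requests
        self.hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True if this request should be served."""
        if ip in self.exempt:
            return True
        now = time.time() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Usage would be a single `allow()` call per request, returning a 503 (or dropping the connection) when it comes back False. For high traffic you would want to prune idle IPs periodically so the dictionary does not grow without bound.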
David wrote:
Is there some mechanism in Zope, or one that could be added to a Zope instance, that would limit the number of requests served per IP address within a certain period of time? We are aware that this would also affect Google and other search engine spiders, but as they usually come from known IP ranges, it would be nice if we could exempt these from the "throttling".
Sorry, only just saw this. Not in Zope, but you may want to look at newer versions of the fail2ban package... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
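As a sketch of the fail2ban approach: it can watch the front-end web server's access log and ban IPs that exceed a request threshold within a time window. The jail name, paths, and numbers below are assumptions to adapt, not a tested configuration:

```
# /etc/fail2ban/jail.local (illustrative values)
[http-crawler]
enabled  = true
filter   = http-crawler
logpath  = /var/log/apache2/access.log
# findtime: window in seconds; maxretry: requests allowed in the window
findtime = 60
maxretry = 120
# bantime: how long the IP stays banned, in seconds
bantime  = 3600

# /etc/fail2ban/filter.d/http-crawler.conf
[Definition]
# Count every request line from a host; <HOST> is fail2ban's IP placeholder
failregex = ^<HOST> -.*"(GET|POST|HEAD)
ignoreregex =
```

Known search-engine addresses could be listed in fail2ban's `ignoreip` setting so they are never banned, which covers the "exempt Google" requirement from the original question.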
participants (4)
- Andreas Jung
- Chris Withers
- David
- Dieter Maurer