[ZWeb] Zope.org currently unusable
Andrew Sawyers
andrew at zope.com
Thu Mar 10 09:27:55 EST 2005
I need to read up on the robots.txt spec. Excellent, Mark, thanks.
Andrew
--
Zope Managed Hosting
Software Engineer
Zope Corporation
(540) 361-1700
> -----Original Message-----
> From: zope-web-bounces at zope.org [mailto:zope-web-bounces at zope.org] On
> Behalf Of Mark Pratt
> Sent: Thursday, March 10, 2005 6:16 AM
> To: Jens Vagelpohl
> Cc: zope-web at zope.org
> Subject: Re: [ZWeb] Zope.org currently unusable
>
> Hi,
>
> I recommend adding crawl delays for all bots except Google, something like:
>
> User-agent: Slurp
> Crawl-delay: 120
>
> This is for the Yahoo bot, but the same rule should also be applied to
> msnbot.
>
> It's crazy how some of these bots love to hit your site at the same
> time. A 120-second delay should be more than enough time between
> hits, even if they all come at the same time.
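
As a rough sketch of how a well-behaved crawler would read such a rule, Python's standard urllib.robotparser can parse the snippet and report the delay per agent. Grouping msnbot into the same record and the bot count of 3 in the arithmetic are assumptions for illustration, not what is on zope.org. Crawl-delay is a non-standard extension, so each crawler decides on its own whether to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment: the Slurp rule from the message above,
# with msnbot grouped into the same record (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: Slurp
User-agent: msnbot
Crawl-delay: 120
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay in seconds per agent; None means no Crawl-delay applies to that agent.
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ("Slurp", "msnbot", "Googlebot")
}


def worst_case_hits_per_minute(bot_count, delay_seconds):
    """Upper bound on requests per minute if every bot honours the delay."""
    return bot_count * 60 / delay_seconds


print(delays)
print(worst_case_hits_per_minute(3, 120))
```

With three bots all honouring a 120-second delay, the combined load stays at or below 1.5 requests per minute, which is the point of the suggestion above. Bots that ignore robots.txt are of course unaffected.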
>
> Cheers,
>
> Mark
>
>
> On Mar 10, 2005, at 10:33 AM, Jens Vagelpohl wrote:
>
> >
> > On Mar 10, 2005, at 2:18, Andrew Sawyers wrote:
> >
> >> It's a little of both; there's a group of people working on this, and
> >> we hope to have a fix real soon now :). Jens, do you have the time to
> >> check the zope.org robots.txt? A lot of the problems I've seen recently
> >> were due to several robots spidering zope.org at the same time. I'm
> >> working on additional hardware, and we should see more traction on the
> >> project sooner than later.
> >
> > I don't believe all that much in robots.txt. The nasty bots completely
> > ignore it, anyway. The only way to deal with them is to block them
> > with e.g. iptables.
> >
> > What's currently there looks odd:
> >
> > """
> > User-agent: wget
> > Disallow: /
> >
> > User-agent: Wget
> > Disallow: /
> >
> > # Ask Google to skip search queries and the like.
> > User-agent: Googlebot
> > Disallow: /*?
> > """
> >
> > Looking at the spec, case-insensitive matching of the User-agent value
> > is only "recommended", so listing both spellings may be needed, but you
> > could shorten that into the following, because multiple User-agent
> > values are allowed per rule set:
> >
> > """
> > User-agent: wget
> > User-agent: Wget
> > Disallow: /
> > """
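
One way to sanity-check the grouped form is to feed it to Python's urllib.robotparser. This is only a sketch of how one parser treats it; other crawlers may match agent names differently. The agent string "Wget/1.9" and the path "/Products" are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Grouped record: two User-agent lines sharing one rule set.
grouped = build_parser("User-agent: wget\nUser-agent: Wget\nDisallow: /\n")

# Single lowercase spelling, to see how this parser treats case.
single = build_parser("User-agent: wget\nDisallow: /\n")

for agent in ("wget", "Wget/1.9", "Googlebot"):
    print(
        agent,
        grouped.can_fetch(agent, "/Products"),
        single.can_fetch(agent, "/Products"),
    )
```

This particular parser lowercases both sides before comparing, so the single spelling already covers Wget here. Listing both spellings only matters for robots that compare case-sensitively.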
> >
> > Otherwise there really isn't much in there, and from seeing googlebots
> > myself often enough I have my doubts whether the line "Disallow: /*?"
> > works at all.
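
The original spec only defines prefix matching for Disallow values, so "/*?" means something only to crawlers that implement a wildcard extension. The following is a minimal sketch of the difference, assuming a crawler treats "*" as "any run of characters". The helper names and sample paths are hypothetical, and this is not any crawler's actual code.

```python
import re


def prefix_blocks(rule, path):
    """Original-spec behaviour: the rule is a literal path prefix."""
    return path.startswith(rule)


def wildcard_blocks(rule, path):
    """Extension behaviour (assumed): '*' matches any run of characters."""
    # Escape the literal pieces and join them with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


RULE = "/*?"
for path in ("/search?q=zope", "/Products", "/Members/index_html?b_start=20"):
    print(path, prefix_blocks(RULE, path), wildcard_blocks(RULE, path))
```

Under plain prefix matching the rule blocks none of these paths, since no real URL starts with the literal characters "/*?". Only the wildcard reading blocks the two paths that carry a query string.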
> >
> > jens
> >
> > _______________________________________________
> > Zope-web maillist - Zope-web at zope.org
> > http://mail.zope.org/mailman/listinfo/zope-web
> >
> >
>