Banning Bots And Spiders


I do not want any type of bot or spider to have access to my site; I do not want to be listed on any search engine, anywhere, ever. To accomplish this, I'm using both a robots.txt file that denies everything and a "noindex, nofollow" robots meta tag in the head of my main page.
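
For reference, a deny-all robots.txt is normally just the two lines below. This is a sketch of the standard form rather than a copy of my exact file:

```text
# Applies to every crawler that honours robots.txt
User-agent: *
# Ask crawlers to stay out of the whole site
Disallow: /
```

The matching tag in the page head is typically <meta name="robots" content="noindex, nofollow">. Both are only requests; a crawler has to choose to honour them.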


My stats pages list the crawlers, bots and spiders that attempt to index my site, followed by some numbers. It looks like this:


Unknown robot (identified by 'crawl') 85+64

Unknown robot (identified by hit on 'robots.txt') 0+147

Inktomi Slurp 0+64

Unknown robot (identified by 'spider') 10+35

Googlebot (Google) 1+30

Jeeves 0+4

Unknown robot (identified by 'robot') 0+2



It says that "Numbers after + are successful hits on 'robots.txt' files". What are the numbers before the +? I'm guessing that "85+64" means that 64 unknown bots requested the robots.txt file and the other 85 didn't even bother. Is that right?


What I'd like is for all unknown robots to be denied access to my site. Is there any way to do that?



Unfortunately, not all of them respect robots.txt. The major ones do, but ...


You could ban them from your site entirely using .htaccess, though I suppose you would need to add each one as it showed up.
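
A minimal sketch of what that could look like, assuming the site runs on Apache with mod_rewrite enabled. The bot names BadBot and EvilScraper are made-up placeholders; you would swap in the user-agent strings that actually show up in your logs:

```apache
# Assumption: Apache with mod_rewrite available to .htaccess files.
RewriteEngine On

# Match the User-Agent header, case-insensitively, against unwanted bots.
# "BadBot" and "EvilScraper" are hypothetical names used for illustration.
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]

# Answer any matching request with 403 Forbidden and stop processing rules.
RewriteRule .* - [F,L]
```

Keep in mind that a bot can send any user-agent string it likes, so this only stops the ones that identify themselves honestly.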


Other than that - if you never link to it, they won't know it's there... =)


That said, I'm not 100% sure how to read that particular portion of AWStats, so I'll leave it to someone else to confirm what those +'s mean. Searching Google for the AWStats FAQ would probably turn up an answer as well. =)

Yeah... the major search engines are pretty polite about it. The spammers running the bots are the ones I'd like to ban. I guess there's just no way to tell whether it's a human or a bot until after the fact.


I'll do that Google search - thanks!



