stapel Posted February 4, 2006

During a conversation with Support, the issue of bots arose. My site has developed a chronic bad-bot, scraper problem. (Note: By "scraping", I mean wholesale copying, frequently with automated software such as WebCapture or FrontPage.) Many spammer-scammer sites tweak their search-engine listings by including masked copies of utterly irrelevant snippets of text collected from my site. And no few others have scraped my site for re-sale.

In my .htaccess file, I have banned many known scraper bots, so if one shows up admitting what it is (in the "user-agent" field), it is automatically given the "403 Forbidden" response, and gets no further.

To catch stealth bots (the ones that hide or lie about who they are) and unknown bots, I have set up a "honey-pot" link that leads to a spam-bot trapping script; when a bad-bot follows the honey-pot link, the script is activated and the violating IP address is (temporarily) added to my .htaccess file. The bad-bot gets maybe a page or two, and then is shunted to the 403 page.

To protect "good" bots (that is to say, polite, well-behaved bots such as the googlebot) from landing in the honey-pot, my robots.txt file warns against following that link.

So I have a known problem, and have followed the standard advice with my .htaccess file, my robots.txt file, and my honey-pot script. I explained this to Support, and was told that I should come here, to the Forums, to learn how to prevent the bad bots from ever coming to my site in the first place. I had heard that this wasn't even possible, and I would be very interested to learn how this is done.

Thank you.

Eliz.
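For readers who want to see the moving parts, here is a minimal sketch of the honey-pot arrangement described above. It is not the poster's actual script. The trap path `/trap/`, the function names, and the sample IP address are all hypothetical, and the `Deny from` line assumes the older Apache access-control syntax (`Order`/`Allow`/`Deny`); newer Apache versions use different directives. The sketch shows two things: a robots.txt rule that keeps polite bots away from the trap, and a helper that adds an offending IP address to the .htaccess text. It does not handle the "temporary" part (expiring old bans).

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical honey-pot path; the original post does not name one.
TRAP_PATH = "/trap/"

# robots.txt content telling every well-behaved bot to stay out of the trap.
ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""


def polite_bot_may_fetch(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if a bot that obeys robots.txt is allowed to request `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


def add_ban(htaccess_text: str, ip: str) -> str:
    """Return the .htaccess text with a 'Deny from <ip>' line appended.

    Assumes the old-style Apache 'Deny from' directive. The text is
    returned unchanged if the address is already banned.
    """
    ip = str(ipaddress.ip_address(ip))  # raises ValueError on malformed input
    rule = f"Deny from {ip}"
    lines = htaccess_text.splitlines()
    if rule in lines:
        return htaccess_text
    lines.append(rule)
    return "\n".join(lines) + "\n"
```

A trap script built along these lines would call `add_ban` with the visitor's address whenever the honey-pot URL is requested, then write the result back to the .htaccess file. Checking the address with `ipaddress` first keeps arbitrary text from being written into the server configuration.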
Deverill Posted February 4, 2006

If any other site that a bad-bot hits has a link to your site, the bot will follow that link and come to your site. It sounds like you have a very nice trap in place to catch them, though.