Posted

During a conversation with Support, the issue of bots arose.

 

My site has developed a chronic bad-bot and scraper problem. (Note: By "scraping", I mean wholesale copying, frequently with automated software such as WebCapture or FrontPage.) Many spammer-scammer sites tweak their search-engine listings by including masked copies of utterly irrelevant snippets of text collected from my site. Quite a few others have scraped my site for resale. :angry:

 

In my .htaccess file, I have banned many known scraper bots, so if one shows up admitting what it is (in the "User-Agent" field), it is automatically given a 403 "Forbidden" response and gets no further. To catch stealth bots (the ones that hide or lie about who they are) and unknown bots, I have set up a "honey-pot" link that leads to a spam-bot trapping script. When a bad bot follows the honey-pot link, the script is activated and the offending IP address is (temporarily) added to my .htaccess file. The bad bot gets maybe a page or two, and is then shunted to the 403 page. :whip:
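
For anyone who wants to see the idea in code, here is a minimal Python sketch of the two checks described above. It is not my actual trap script. The names `BANNED_AGENT_SUBSTRINGS`, `is_banned_agent` and `add_ban` are made up for illustration, the agent list only contains the two tools named in this post, and the `Deny from` line assumes the older Apache access-control syntax (newer Apache versions use `Require` directives instead). It also leaves out the "temporary" part, so expiring old bans would need extra bookkeeping.

```python
import ipaddress

# Assumption: substrings for the scraping tools named in the post.
# A real ban list would be much longer.
BANNED_AGENT_SUBSTRINGS = ("WebCapture", "FrontPage")


def is_banned_agent(user_agent: str) -> bool:
    """Return True if the User-Agent string admits to being a known scraper."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in BANNED_AGENT_SUBSTRINGS)


def add_ban(htaccess_text: str, ip: str) -> str:
    """Return the .htaccess text with a 'Deny from <ip>' line appended.

    The text is returned unchanged if the IP is already banned.
    Raises ValueError if `ip` is not a valid IP address.
    """
    ip = str(ipaddress.ip_address(ip))  # validate before writing anything
    rule = f"Deny from {ip}"
    if rule in (line.strip() for line in htaccess_text.splitlines()):
        return htaccess_text
    if htaccess_text and not htaccess_text.endswith("\n"):
        htaccess_text += "\n"
    return htaccess_text + rule + "\n"
```

A honey-pot script along these lines would call `add_ban` with the visitor's address and write the result back to the .htaccess file. Validating the address first matters, because anything written into .htaccess is interpreted by the server.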

 

To protect "good" bots (that is to say, polite, well-behaved bots such as the googlebot) from landing in the honey-pot, my robots.txt file warns against following that link. :)
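
As a rough illustration of how a polite bot reads that warning, here is a small Python sketch using the standard library's `urllib.robotparser`. The `/trap/` path and the `allowed` helper are hypothetical and only stand in for the real honey-pot link, which I am not posting here.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "/trap/" is a placeholder for the honey-pot path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if robots_txt permits `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A well-behaved bot runs a check like this before following any link.
print(allowed(ROBOTS_TXT, "Googlebot", "/trap/index.html"))
print(allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
```

A bot that honors robots.txt never requests the trap path, so only the ones that ignore the file end up banned.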

 

So I have a known problem, and have followed the standard advice with my .htaccess file, my robots.txt file, and my honey-pot script. ;)

 

I explained this to Support, and was told that I should come here, to the Forums, to learn how to prevent the bad bots from ever coming to my site in the first place. I had heard that this wasn't even possible, and I would be very interested to learn how this is done. :blink:

 

Thank you.

 

Eliz.

Posted

If a bad bot crawls any other site that links to yours, it will follow that link and reach your site. It sounds like you have a very nice trap in place to catch them, though.
