During a conversation with Support, the issue of bots arose.
My site has developed a chronic bad-bot, scraper problem. (By "scraping", I mean wholesale copying, frequently done with automated software such as WebCapture or FrontPage.) Many spammer-scammer sites tweak their search-engine listings by including masked copies of utterly irrelevant snippets of text collected from my site, and more than a few others have scraped my site for resale.
In my .htaccess file, I have banned many known scraper bots, so if one shows up admitting what it is (in the User-Agent header), it is automatically served a "403 Forbidden" response and gets no further. To catch stealth bots (the ones that hide or lie about who they are) and unknown bots, I have set up a "honey-pot" link that leads to a bot-trapping script: when a bad bot follows the honey-pot link, the script is triggered and the offending IP address is (temporarily) added to my .htaccess file. The bad bot gets maybe a page or two, and is then shunted to the 403 page.
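For anyone curious, the relevant .htaccess rules look roughly like this. The bot names and the banned IP address here are placeholders for illustration, not my actual entries:

```apache
# Ban bots that admit who they are (names here are examples only)
SetEnvIfNoCase User-Agent "WebCapture" bad_bot
SetEnvIfNoCase User-Agent "FrontPage"  bad_bot

# Deny flagged user-agents, plus IPs appended (temporarily)
# by the honey-pot script -- 192.0.2.1 is an example address
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Deny from 192.0.2.1
```

Anything denied here gets the 403 page automatically; everything else passes through.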
To keep "good" bots (that is to say, polite, well-behaved bots such as Googlebot) from landing in the honey-pot, my robots.txt file warns them against following that link.
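The robots.txt warning is just a Disallow rule on the trap's path (the path shown here is a made-up example, not my real trap URL). Polite bots obey it and never touch the trap; stealth bots ignore robots.txt and walk right in:

```
User-agent: *
Disallow: /trap/
```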
So I have a known problem, and have followed the standard advice with my .htaccess file, my robots.txt file, and my honey-pot script.
I explained this to Support, and was told that I should come here, to the Forums, to learn how to prevent the bad bots from ever coming to my site in the first place. I had heard that this wasn't even possible, and I would be very interested to learn how this is done.
Thank you.
Eliz.