Faliol Posted November 26, 2003
Hello, how do you prevent search engines from crawling certain parts of your site? For example, we are a web design company. If we set up a test server on a subdomain called something like Acorn (the address would be acorn.******), how would you keep outside people from viewing the test site? If I recall, you have to use a robots.txt file. Thanks
TCH-Rob Posted November 26, 2003
You can do it via the robots.txt file, or just have no links to that portion of your site. You can look here to see how to make the robots.txt file: www.robotstxt.org/wc/robots.html
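As a rough illustration of what such a file does, here is a minimal sketch that uses Python's standard `urllib.robotparser` to check how a well-behaved crawler would read a robots.txt that asks every robot to stay out. The host name `acorn.example.com`, the agent name `ExampleBot`, and the helper `is_allowed` are placeholders made up for this example, not anything from the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test subdomain: ask all crawlers to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # "ExampleBot" and the URL are illustrative assumptions.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://acorn.example.com/index.html"))
```

Note that this check runs inside the crawler itself. The file is a request, not access control, so it only affects robots that choose to consult it.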
Lianna Posted November 26, 2003
You can also Webprotect it through cPanel. I have one portion of my site webprotected and even have links to it, but those pages are no longer listed in search engines.
Deverill Posted November 27, 2003
Remember, though, that the robots.txt file only works on well-behaved crawlers. If it's something you really don't want to get out, then make sure you password protect it.
amansker Posted March 14, 2004
The robotstxt.org link first says: "when a Robot visits a Web site, say http://www.foobar.com/, it first checks for http://www.foobar.com/robots.txt."
But here: http://www.robotstxt.org/wc/exclusion-user.html it says: "If you rent space for your HTML files on the server of your Internet Service Provider, or another third party, you are usually not allowed to install or modify files in the top-level of the server's document space. This means that to use the Robots Exclusion Protocol, you have to liaise with the server administrator, and get him/her to add the rules to the "/robots.txt", using the Web Server Administrator's Guide to the Robots Exclusion Protocol."
These seem to conflict. Can I just put the robots.txt in the root of my sites?
MikeJ Posted March 14, 2004
The section on renting space doesn't apply here. You have full ability to create and edit all files in the root of your webspace, which is where all crawlers will look for robots.txt.
Wilexa Posted March 14, 2004
Your second quote doesn't apply to the web space here at TCH. Just put your robots.txt file in your public_html directory and you'll be fine.
...dave
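To make the "root of your webspace" point concrete, here is a small sketch of how a crawler might derive the robots.txt address from any page URL: it keeps the scheme and host and replaces the path with `/robots.txt`. The function name `robots_url` is made up for this example, and the subdomain URL is a placeholder. Because the file is looked up per host, a test subdomain would need its own robots.txt at its own root.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the original path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

if __name__ == "__main__":
    print(robots_url("http://www.foobar.com/some/page.html"))
    # A subdomain is a different host, so it gets a separate robots.txt.
    print(robots_url("http://acorn.example.com/clients/demo/"))
```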
Wilexa Posted March 14, 2004
Rats! Mike beat me to the "add reply" button!
...dave