Posted

Hello,

How do you prevent search engines from crawling certain parts of your site? For example, we are a web design company. If we set up a test server on a subdomain called something like Acorn (the address would be acorn.******), how would we keep outside people from viewing the test site? If I recall correctly, you have to use a robots.txt file.

Thanks

Posted

You can do it with a robots.txt file, or just don't link to that portion of your site. You can see how to write a robots.txt file here: www.robotstxt.org/wc/robots.html
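For illustration, here is a minimal Python sketch of how a crawler that honours robots.txt reads the rules, using the standard library's urllib.robotparser. The robots.txt contents, the example.com hosts, the "ExampleBot" user agent and the `is_allowed` helper are all placeholders for this sketch. Keep in mind that robots.txt applies per host, so a test subdomain needs its own file at the root of that subdomain: `Disallow: /` asks compliant crawlers to skip the whole host, while a narrower path such as `/private/` only covers that part of the site.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt for a test subdomain: "Disallow: /" asks every
# compliant crawler ("User-agent: *") to stay out of the entire host.
TEST_SITE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Placeholder robots.txt that only blocks one directory on a live site.
LIVE_SITE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    # Parse the text directly instead of downloading it; no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        (TEST_SITE_ROBOTS_TXT, "https://acorn.example.com/index.html"),
        (LIVE_SITE_ROBOTS_TXT, "https://www.example.com/private/plans.html"),
        (LIVE_SITE_ROBOTS_TXT, "https://www.example.com/index.html"),
    ]
    for robots_txt, url in checks:
        verdict = "allowed" if is_allowed(robots_txt, "ExampleBot", url) else "blocked"
        print(url, verdict)
```

Run as a script, this prints "blocked" for the first two URLs and "allowed" for the third. It only models what a polite crawler does; nothing here stops a client that never asks for robots.txt.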

Posted

You can also Webprotect it through cPanel. I have one portion of my site webprotected, and even though there are links to it, those pages are no longer listed in search engines.

Posted

Remember, though, that robots.txt only works on well-behaved crawlers. If it's something you really don't want getting out, make sure you password-protect it.
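Unlike robots.txt, password protection is enforced by the server: a request without valid credentials gets a 401 response no matter who sends it. Assuming the protection uses HTTP Basic authentication (what Apache .htaccess/.htpasswd protection commonly uses; whether cPanel's Webprotect does exactly this is an assumption worth checking with your host), the server-side check looks roughly like the Python sketch below. The user name, password, realm and the `check_basic_auth`/`respond` helpers are hypothetical.

```python
import base64
import hmac
from typing import Dict, Optional, Tuple

# Hypothetical credentials for illustration. Real setups (e.g. .htpasswd)
# store a password hash rather than the plain-text password.
EXPECTED_USER = b"client"
EXPECTED_PASSWORD = b"s3cret"


def check_basic_auth(header_value: Optional[str]) -> bool:
    """Return True if an Authorization header carries the expected Basic credentials."""
    prefix = "Basic "
    if not header_value or not header_value.startswith(prefix):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(header_value[len(prefix):], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    user, sep, password = decoded.partition(b":")
    if not sep:
        return False
    # compare_digest avoids short-circuiting on the first mismatching byte,
    # which limits what response timing reveals about the secret.
    user_ok = hmac.compare_digest(user, EXPECTED_USER)
    password_ok = hmac.compare_digest(password, EXPECTED_PASSWORD)
    return user_ok and password_ok


def respond(header_value: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, extra headers) the way a protected area would answer."""
    if check_basic_auth(header_value):
        return 200, {}
    # 401 plus WWW-Authenticate makes the browser show a login prompt.
    return 401, {"WWW-Authenticate": 'Basic realm="Test site"'}
```

Because Basic credentials are only base64-encoded, not encrypted, the protected area should also be served over HTTPS.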

  • 3 months later...
Posted

The robotstxt.org link first says:

When a robot visits a Web site, say http://www.foobar.com/, it first checks for http://www.foobar.com/robots.txt.

But here: http://www.robotstxt.org/wc/exclusion-user.html

it says:

If you rent space for your HTML files on the server of your Internet Service Provider, or another third party, you are usually not allowed to install or modify files in the top-level of the server's document space.

This means that to use the Robots Exclusion Protocol, you have to liaise with the server administrator and get him/her to add the rules to the "/robots.txt", using the Web Server Administrator's Guide to the Robots Exclusion Protocol.

These seem to conflict. Can I just put the robots.txt in the root of my sites?

Posted

The section on renting space doesn't apply here.

You can create and edit any file in the root of your web space, which is where crawlers look for robots.txt.
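To make "root" concrete: a crawler reduces the page URL to its scheme, host and port and requests /robots.txt there, which is also why a subdomain such as acorn.example.com (a placeholder) is checked separately from the main site. A small Python sketch using urllib.parse; the `robots_url` helper is hypothetical. On hosting where public_html is the main site's document root, public_html/robots.txt is the file served at that URL; a subdomain may have a different document root, so check where your host maps it.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check for page_url's host."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus any port); replace path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in (
        "http://www.foobar.com/some/page.html",
        "https://acorn.example.com/test/index.html?x=1#top",
    ):
        print(url, "->", robots_url(url))
```

The first URL maps to http://www.foobar.com/robots.txt and the second to https://acorn.example.com/robots.txt, so the subdomain's rules never come from the main site's file.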

Posted

Your second quote doesn't apply to the web space here at TCH.

Just put your robots.txt file in your public_html directory and you'll be fine.

...dave
