
Posted

Recently asked:

 

I noticed one of your meta tags that was new to me. It said something about robots... can you tell me more about its function?

 

><meta name="robots" content="ALL,FOLLOW">

The robot tag is specifically directed at (some) spiders. Content values include:

 

ALL - index this page and follow its links (equivalent to INDEX,FOLLOW; this is the default)

INDEX - index this page (it says nothing about following links)

FOLLOW - follow the links on this page and index the linked pages

NOINDEX - do not index this page, though its links may still be followed

NOFOLLOW - do not follow the links on this page
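The values above boil down to two independent flags: may the page be indexed, and may its links be followed. A minimal sketch of how a spider might interpret the content string (the token handling here is illustrative, not any particular engine's exact rules):

```python
def parse_robots_content(content):
    """Return (index, follow) booleans for a robots meta content string."""
    tokens = {t.strip().lower() for t in content.split(",")}
    index = True   # spiders default to indexing the page
    follow = True  # and to following its links
    if "none" in tokens or "noindex" in tokens:
        index = False
    if "none" in tokens or "nofollow" in tokens:
        follow = False
    # "all", "index", "follow" merely confirm the defaults
    return index, follow

print(parse_robots_content("ALL,FOLLOW"))      # (True, True)
print(parse_robots_content("NOINDEX,FOLLOW"))  # (False, True)
```

Note that ALL,FOLLOW is therefore redundant with the defaults; a spider behaves the same way with no robots meta tag at all.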

Posted

Dsdemmin, what is your feeling on having a robots.txt file? (I think I got that right.)

 

Second question, does your post mean that you suggest having this Meta tag on pages?

Posted

Dsdemmin, what is your feeling on having a robots.txt file?

 

It does not hurt. The file below allows all robots to visit all files:

>User-agent: *
Disallow:

This is primarily used, though, to disallow robots from visiting certain files, directories, etc. The file below would ban all robots from the cgi-bin directory:

>User-agent: *
Disallow: /cgi-bin/

Remember: this must be a plain text file named robots.txt (the name is case sensitive), and it must sit in your root directory.
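If you want to sanity-check a robots.txt before uploading it, Python's standard library includes a parser for exactly this format. A quick sketch (example.com is just a stand-in domain):

```python
from urllib.robotparser import RobotFileParser

# Parse the rules from the post above instead of fetching a live file.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
])

# Any robot may fetch the home page...
print(rp.can_fetch("*", "http://example.com/index.html"))    # True
# ...but nothing under /cgi-bin/
print(rp.can_fetch("*", "http://example.com/cgi-bin/x.pl"))  # False
```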

 

 

Second question, does your post mean that you suggest having this Meta tag on pages?

 

Does not hurt. :D

Posted

If you disallow a robot to visit a certain directory, how is that different from cloaking a folder?

 

And if it's not different, I thought cloaking was frowned at by the spiders?

 

Thanks!

Posted

Disallowing a spider from visiting is very different from cloaking.

 

Cloaking is serving different content to spiders than the content that you show to human visitors. The idea is that you can design the site that humans see, and then show a different, optimized version to spiders so your rank is higher than it normally would be.

 

Search engines don't like being fooled. From what I have read, most tricks should be avoided. I say 'most' because some would argue over what is considered a trick.

 

I guess if you are trying to 'fool' the search engines, then you have to be aware that there is a good chance that if you are caught, then your site could be banned.

Posted

natimage:

 

Cloaking and disallow (or noindex) are indeed very different... as Jack explained.

 

Actually, the search engines like disallow and noindex in the sense that they help them avoid pages that do not need to be spidered.

 

Why would anyone ever want a page not spidered? There are many reasons: private images, private content. I never have spiders index form pages; it is not necessary.

 

In addition, I can have a directory full of scripts with HTML files kept for documentation purposes... I do not want those indexed.

Posted

Thanks for the clarification. I definitely don't have and won't have any aspirations to "fool" the search engines. That's why I was asking in the first place... and just because I didn't understand the difference.

 

Just a little uneducated newbie here trying to learn her way!

 

Thanks again for the clarification. :)

Posted

I just loaded my robots.txt file. The code is below, but I wanted to ask: is this the correct way to disallow the robots from multiple directories?

 

>User-agent: *
Disallow: /cg-bin/
Disallow: /Images/
Disallow: /Misc/
Disallow: /MyJunk/

 

And I also just put the robots meta tag in place. I noticed in my raw log files that Googlebot had looked for robots.txt on my site. Does it always look for that file?

 

Thanks,

Tracy

Posted

You missed the 'i' in cgi-bin. Otherwise that looks correct.
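The corrected multi-directory file can be checked the same way with Python's standard-library parser (example.com and the test paths are hypothetical stand-ins):

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, with cgi-bin spelled correctly.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /Images/",
    "Disallow: /Misc/",
    "Disallow: /MyJunk/",
])

for path in ("/index.html", "/Images/logo.gif", "/MyJunk/old.html"):
    print(path, rp.can_fetch("*", "http://example.com" + path))
# /index.html True
# /Images/logo.gif False
# /MyJunk/old.html False
```

One Disallow line per directory, all under the same User-agent record, is exactly right; there is no comma-separated form.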

 

I believe Google does look for and respect the robots tag as well as robots.txt file. Not all do.

 

Evil little buggers that they are........

 

ty

Posted

I confirm that it looks good :blink:

 

Googlebot checks robots.txt first when entering a site, and then checks for the robots meta tag on each page... why?

 

The big reason is to conserve resources. Right now, the big issue for Google is resources (time), so the fewer pages (or directories, or whole sites) they need to spider, the better.

 

Thus, what they are looking for are disallow and noindex, since both are used to avoid spidering; they will ignore 'cute' things like "spider after 10 days", etc.
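The two-step order described above can be sketched as follows. This is an illustration only, not how Googlebot is actually implemented: `may_index` and the meta-tag regex are hypothetical helpers, and a real crawler parses HTML properly rather than with a regex.

```python
import re
from urllib.robotparser import RobotFileParser

def may_index(robots_txt_lines, url, html):
    """Apply the two checks in order: robots.txt first, then the meta tag."""
    # Step 1: robots.txt decides whether the page is fetched at all.
    rp = RobotFileParser()
    rp.parse(robots_txt_lines)
    if not rp.can_fetch("*", url):
        return False  # never fetched, so never indexed
    # Step 2: the page's own robots meta tag decides whether it is indexed.
    m = re.search(r'<meta\s+name="robots"\s+content="([^"]*)"', html, re.I)
    if m and "noindex" in m.group(1).lower():
        return False
    return True

page = '<head><meta name="robots" content="NOINDEX,FOLLOW"></head>'
print(may_index(["User-agent: *", "Disallow:"],
                "http://example.com/a.html", page))
# False: robots.txt allows the fetch, but the meta tag blocks indexing
```

This also shows why both mechanisms exist: robots.txt saves the spider a fetch entirely, while the meta tag is only seen after the page has been downloaded.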
