Samrc Posted August 19, 2005

I just ran across the BETA version of Yahoo's My Web: "Move beyond bookmarks - create your own personal, searchable web." It touts several things, including one statement that really just TEES ME OFF:

"Save all the pages you like (exact copies, not just links!)"

Anyone know the name of the BOT this thing will use, so I can add it to my nasty list in the robots.txt file? I don't like the idea that YAHOO will be CACHING my webpages ("exact" copies) for people. Yes, I know that Google and Yahoo have caching set up for their search results, but somehow this seems so... WRONG. I did not give anyone permission to capture my webpages in total. It's one thing to link to my pages (so you always get the latest content). It's quite another to purposely capture and store them in bulk!

Anyone else think this way, or am I totally out of line?

-Samantha
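For what it's worth: Yahoo's main crawler identifies itself as "Slurp" in robots.txt, but whether My Web uses a separate user-agent (and what it's called) doesn't seem to be documented, so the second name below is purely a guess. Assuming the bot honors robots.txt at all, a blocking entry would look something like this:

    # "Slurp" is Yahoo's documented crawler name.
    # NOTE: disallowing Slurp entirely also drops your site
    # from regular Yahoo Search results.
    User-agent: Slurp
    Disallow: /

    # Hypothetical entry -- watch your access logs for the real
    # user-agent name once My Web traffic shows up.
    User-agent: Yahoo-MyWeb
    Disallow: /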
thehemi Posted August 19, 2005

Kinda weird when you consider most interesting content is dynamic in nature.
TCH-Rob Posted August 19, 2005

Something I have found:

1) When a user caches a copy, it is available just to them.
2) When a user shares a folder with their saved stuff, Yahoo provides a link to the live site, not the cached copy.

So there is no republishing to other users going on. I have no clue whether there will be a "special" bot they use for the site.
Striver Posted August 19, 2005

None of the major search engines cache any of my pages. I have this on every page:

    <meta name="robots" content="noarchive">

Lee
Samrc Posted August 19, 2005 (Author)

I like the noarchive meta tag idea. Wonder if I could build it into my robots.txt file so I don't have to put it on all pages.

"Kinda weird when you consider most interesting content is dynamic in nature."

I agree.

-Samantha
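One wrinkle with that idea: robots.txt only understands crawl directives (User-agent, Disallow); there is no noarchive directive, so the tag really does have to end up in each page's HTML. If the host supports Apache server-side includes, one way to avoid editing every page by hand is a shared fragment. A sketch, where robots-meta.html is a hypothetical file name and the pages are served as .shtml:

    <!-- robots-meta.html: one-line shared fragment -->
    <meta name="robots" content="noarchive">

    <!-- inside each page's <head>: -->
    <!--#include virtual="/robots-meta.html" -->

Any include mechanism the site already uses (PHP includes, a templating system) works the same way.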
Deverill Posted August 19, 2005

Samantha,

It's not just the search engines; many ISPs cache your pages as well. If 15 AOL users go to your site at close to the same time, the odds are good that most will get cached pages, not fresh content. I have heard of some of these ISPs causing problems on particularly dynamic sites by doing this and keeping the cached version for days. Hopefully they are not doing it too much, but you can't guarantee everything is real-time.

The noarchive tag is a great tool for search engines that cache pages, but as I understand it from the Google site, if they can fetch the current page they serve that; the cached copy is only used if you click on "Cached" or if they can't reach your site.
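On the ISP/proxy side, the usual lever is HTTP cache headers rather than robots directives. A sketch, assuming Apache with the mod_headers module enabled (the lines go in .htaccess or the server config):

    # Ask browsers and intermediary proxies to revalidate
    # rather than reuse stored copies.
    Header set Cache-Control "no-cache, must-revalidate"

Well-behaved proxies honor this, but nothing technically forces a misbehaving ISP cache to.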