
Posted

Hope I'm in the right place for this.

 

I was just looking over AWStats in the 'Browsers' section and noticed one called Teleport Pro, which is apparently an offline browser. Well, it looks like it downloaded my entire site - all 18,000+ pages of it!! 16.5% of my traffic and bandwidth. I do not like this idea at all, not one bit.

 

Is there any way to keep it out?? :)

 

Thanks, any help on this will be greatly appreciated. I probably do know how to do this but I'm a bit too freaked to think about it clearly.

 

- Ty

Posted

If you look at Latest Visitors in AWStats you may see that one IP was responsible. If so, you could try adding an IP block on that address to keep him out. Send me a PM if you want me to look into it.
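For anyone following along, a single-IP block in .htaccess looks roughly like this (203.0.113.45 is a placeholder address, not the actual offender):

```apache
# Block one offending IP address (203.0.113.45 is a placeholder)
Order Allow,Deny
Allow from all
Deny from 203.0.113.45
```

Hosts running cPanel usually expose the same thing through an IP deny tool, if editing .htaccess by hand feels risky.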

Posted

Teleport Pro is nearly impossible to block because it literally acts like a browser and imitates one rather than a robot.

 

The best way to handle it is to use an IP block if it occurs again

 

Jim

Posted

Thanks, I hadn't seen the PHP one before. I've never worked with either MySQL or PHP but I'll try. I did block the IP but it's like closing the barn door after the horse has bolted. B*****rds got all 18,000 pages of my archives. :)

 

Ty

Posted

Thank you Jim for taking the extra time to look into this. I will certainly give it a try, I don't suppose it could hurt. :D

 

- Ty

Posted

I tried adding it to robots.txt

 

User-agent: teleport

Disallow: /

 

and then I downloaded the free version of Teleport. It got right into my site. I tried it with the full name and version number as seen in my raw logs, and that didn't work either

 

User-agent: Teleport Pro/1.29.1590

Disallow: /

 

 

(the freebie has a different version number, so I changed it).....SO....... I'm still looking for suggestions if anybody has any, please?

 

Thanks,

Ty

Posted

But with that nice safe shell to hide in, not a problem :)

 

I'm having a look at how to protect my site too. I'll let you know how I get on with my attempts, if I ever understand it all :)

 

Andy

Posted

I downloaded Teleport Pro to have a play, and the more I play, the less I like it.

 

By default, it impersonates Microsoft Internet Explorer - you can set an option to make it identify itself as Teleport, but who will? :)

 

You even have an option to ignore the robots exclusion standard!!!!!

 

I don't see a way that the htaccess can get round this without a little extra programming.

 

My first thought is to use PHP to recognise when anyone is hitting my pages faster than, say, 10 pages a minute. It should let all the friendly bots through, but deny anyone trying to suck your pages dry.

 

Anyone have any other suggestions?

 

Andy

Posted

I noticed that about the "disguises" as well. I still went ahead and put a list in my .htaccess file but you're right, there's got to be a more efficient way.

(snippet)

 

 

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Vacuum [NC]

RewriteRule .* - [F]

 

stuckintheshell

ty

Posted

For the benefit of others I'll pass on what I've learnt about reading the .htaccess file

 

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Vacuum [NC]

RewriteRule .* - [F]

 

^ means the match is anchored at the beginning of the string - i.e. in the first case the agent identifier (HTTP_USER_AGENT) starts with the letters tAkeOut

 

[NC] means No Case - i.e. it is not case sensitive

[OR] is a logical 'OR' - i.e. if the agent is takeout or teleport or vacuum then the rule applies

[F] means Forbidden - i.e. it returns a 403 Forbidden code to the client.
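Putting those pieces together, a minimal annotated block (agent names as in the snippet above). One classic gotcha worth spelling out: a stray [OR] on the *last* condition can make the rule fire for every request, so only the final condition omits it:

```apache
RewriteEngine On
# ^ anchors at the start of the string; NC = case-insensitive;
# OR chains this condition to the next one.
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut  [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]
# The last condition carries no [OR] - conditions otherwise AND together.
RewriteCond %{HTTP_USER_AGENT} ^Vacuum   [NC]
# F = return 403 Forbidden for any URL (.*) when a condition matched.
RewriteRule .* - [F]
```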

 

boxturt, if we tickle you under the chin do you come out of your shell? It used to work for my tortoise :)

 

Andy

Posted

I think Andy's suggestion would work, but it would require either using sessions or referencing a DB on each hit to see how many hits have come from a specific user in a given amount of time. There's some performance hit there, but it would allow you to boot someone off via PHP - just serve them up a big fat nothing after the first ten pages they request in less than 15 seconds. If the problem "browser" doesn't honor cookies, I guess you'd have to use a DB as a work-around.

 

I'll post sample code if anyone's interested.
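The session-based variant described above could be sketched as a small pure function, so the counting logic is visible (the function name and thresholds are made up; in a real page you would call session_start() and pass $_SESSION as $state):

```php
<?php
// Sketch of a session-based throttle (hypothetical names and values).
// $state stands in for $_SESSION; a client that refuses cookies gets a
// fresh $state on every hit, which is why a DB fallback would be needed.
function too_many_hits(array &$state, int $now, int $max_hits, int $window): bool
{
    // Open a fresh counting window if none exists or the old one expired.
    if (!isset($state['window_start']) || $now - $state['window_start'] > $window) {
        $state['window_start'] = $now;
        $state['hits'] = 0;
    }
    $state['hits']++;
    return $state['hits'] > $max_hits;   // true => serve them a big fat nothing
}
```

On each page: if too_many_hits($_SESSION, time(), 10, 15) returns true, send a 403 (or a junk page) and exit.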

Posted
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Vacuum [NC]

RewriteRule .* - [F]

 

I put this code in my .htaccess, right beneath my Options -Indexes line. Is that where it should go?

 

Thanks,

Tracy

Posted (edited)

Hi,

 

Here is a rather long list of the ones that I now block. You will note the Google image bot is there along with a few other bots - delete the lines you don't want! Please also note it does block some Mozilla copies, so you may wish to delete those too.

 

RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackDoorBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Black.Hole [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlowFish [OR]
RewriteCond %{HTTP_USER_AGENT} ^BotALot [OR]
RewriteCond %{HTTP_USER_AGENT} ^BuiltBotTough [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye [OR]
RewriteCond %{HTTP_USER_AGENT} ^BunnySlippers [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cegbfeieh [OR]
RewriteCond %{HTTP_USER_AGENT} ^CheeseBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CopyRightCheck [OR]
RewriteCond %{HTTP_USER_AGENT} ^cosmos [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EroCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Foobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^Harvest [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^httplib [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^humanlinks [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InfoNaviRobot [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JennyBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kenjin.Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Keyword.Density [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^libWeb/clsHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkextractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkScan/8.1a.Unix [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mata.Hari [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIIxpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister.PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^moget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline.Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProPowerBot/2.14 [OR]
RewriteCond %{HTTP_USER_AGENT} ^ProWebWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^QueryN.Metasearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RepoMonkey [OR]
RewriteCond %{HTTP_USER_AGENT} ^RMA [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpankBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^spanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^suzuran [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz/1.4 [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^The.Intraformant [OR]
RewriteCond %{HTTP_USER_AGENT} ^TheNomad [OR]
RewriteCond %{HTTP_USER_AGENT} ^TightTwatBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^toCrawl/UrlDispatcher [OR]
RewriteCond %{HTTP_USER_AGENT} ^True_Robot [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot/1.5 [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLy.Warning [OR]
RewriteCond %{HTTP_USER_AGENT} ^VCI [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEnhancer [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.Image.Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebmasterWorldForumBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website.Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster.Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Collector-E [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu's [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
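One note on the list as quoted: RewriteCond lines do nothing on their own - they need the rewrite engine switched on before them and a RewriteRule after them, roughly:

```apache
RewriteEngine On
# ... the long RewriteCond list above goes here ...
RewriteRule .* - [F]
```

The final condition (^Zeus) correctly omits [OR], so the rule closes the chain cleanly.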

 

Jim

Edited by Jimuni
Posted

Thanks Jim. Many I do have (I've got a pretty long list) but there are a few here I don't recognize and will certainly check out.

 

If I can just figure out a way to recognize the incognito leeches after a certain amount of downloads and boot them out I'd be all set!

 

Actually, there shouldn't be any "downloads" at all - should there? (search engine cache?)

 

Thanks

-Ty

  • 2 weeks later...
Posted

I've thrown together a bit of PHP code which checks for too many hits from the same IP address, and then blocks it for a given amount of time. It seems to work, but I would welcome someone trying to grab http://www.mycoted.com/test/norobot/index.php

 

If it works I'll happily tidy up the code a bit and publish it here, then the experts, rather than beginners like me, can sort it out properly :lol:

 

Would anyone like to guess what sensible figures for the times are? At the moment I will block (for 30s) anyone hitting the test site more than 6 times in 2 seconds. I want to make sure I don't stop Google or any other friendly robot :lol:

 

Andy

Posted

Doesn't sound good :lol:

 

Have you had a look at the content in the files? Hopefully I may be saved, and only the first 7 or 8 actually have something in them - other than an error message.

 

Andy

Posted

Yes, I have looked - after reading your post (wasn't sure I was supposed to).

 

The first 7 pages are all template pages.

 

Pages 8 - 30 are Access Denied content with my IP# in them, but it did still deliver all 33 items.

 

Anything specific I should look for?

Posted

Thanks,

 

At least you didn't manage to steal the content :lol:

 

I had set it so that it just gives them a dummy file with their IP address - having worked out the code, I could fairly easily get it to edit the .htaccess file and deny their IP. My problem then was deciding if I bother to allow them access again at some later date...

 

My overall view was that if they end up with a hard disk full of rubbish after trying to steal my pages then I don't mind :lol:

 

I've had 3 folk take a full copy of my web site in the last 2 weeks, so was starting to get frustrated....

 

Andy

Posted

Seemed to work then! So what you're saying is I would have kept getting (perpetually) the same junk files over and over?

 

Wish I'd had that yesterday. Got nailed by Acrobat last night, I didn't even know they (Adobe) had one of those evil things but they do.

 

:lol:

 

-Ty

Posted

That's the plan - just the same junk file forever :lol:

 

I'll do a couple of mods, then let you have a copy of the script if you like.

 

I've been nobbled by Adobe Acrobat once myself

 

Andy

Posted

There are still a number of things I want to do to tidy up the code, but for anybody who wants it, please feel free.

 

<?php
// Usage: require_once("security.php"); at the start of website scripts.

$SQL_USER = "xxx";    // your sql user name
$SQL_PWD  = "xxx";    // your sql password
$SQL_DB   = "xxx";    // your sql database name

$fasthits  = 2;       // time in seconds below which hits are fast
$blockfast = 6;       // number of fast hits before you block them
$blocktime = 120;     // time in seconds for which you block an offending IP
$delold    = 1000;    // delete log entries older than 1000s

//
//  This requires a sql database set up with 1 table (SQL_SECURITY)
//  which has 4 fields:
//  SC_ID,   an ID generated by SQL, although this isn't actually used
//  SC_IP,   a text field to contain the IP address of anyone hitting your site
//  SC_TIME, an integer to hold the time (in seconds) of the last visit
//  SC_FA,   an integer that holds the number of fast hits by the IP address
//

$SQL_SECURITY = "SQL_SECURITY";

$ip = $_SERVER["REMOTE_ADDR"];

$link = mysql_connect("localhost", $SQL_USER, $SQL_PWD) or die(mysql_error());
mysql_select_db($SQL_DB) or die(mysql_error());

$time  = time();
$nfast = 0;
$deloldtime = $time - $delold;

// Delete all old records from the table.
$query  = "DELETE FROM $SQL_SECURITY WHERE SC_TIME <= $deloldtime";
$result = mysql_query($query) or die(mysql_error());

$query  = "SELECT * FROM $SQL_SECURITY WHERE (SC_IP LIKE \"$ip\")";
$result = mysql_query($query) or die(mysql_error());
$sql_numrows = @mysql_num_rows($result);

// If there are no hits from this IP yet, just add a record to the log.
if ($sql_numrows == 0)
{
    $nfast  = 0;  // set fast access = 0
    $query  = "INSERT INTO $SQL_SECURITY (SC_IP, SC_TIME, SC_FA) VALUES (\"$ip\", \"$time\", \"$nfast\")";
    $result = mysql_query($query) or die(mysql_error());
}

// If there was a previous hit from this IP...
if ($sql_numrows != 0)
{
    $sqlrow   = @mysql_fetch_array($result);
    $lasttime = $sqlrow["SC_TIME"];
    $nfast    = $sqlrow["SC_FA"];
    $block    = 0;
    if ($nfast > $blockfast) $block = $lasttime - $time + $blocktime;

    // If the number of fast hits exceeds the block level, send the blocking message.
    if ($block > 0)
    {
        echo "<html>";
        echo "<head>";
        echo "<title>Access Denied</title>";
        echo "</head>";
        echo "<body>";
        echo "<b><h1>Access Denied</h1></b>";
        echo "<p><b>There have been too many rapid requests from this IP address ($ip).</b></p>";
        echo "<p><b>You must now wait a full ($block) seconds before accessing this site again.</b></p>";
        echo "</body>";
        echo "</html>";
        $query  = "UPDATE $SQL_SECURITY SET SC_TIME=$time WHERE (SC_IP LIKE \"$ip\") ";
        $result = mysql_query($query) or die(mysql_error());
        mysql_close($link);
        exit();
    }
    else  // number of fast hits below the block level, so update the time
    {
        if (($time - $lasttime) < $fasthits)
        {
            $nfast++;
        }
        else
        {
            $nfast = 0;
        }
        $query  = "UPDATE $SQL_SECURITY SET SC_TIME=$time, SC_FA=$nfast WHERE (SC_IP LIKE \"$ip\") ";
        $result = mysql_query($query) or die(mysql_error());
    }
}

mysql_close($link);

if (connection_aborted()) exit();
?>

 

You will need to set up an SQL database (which you can do through cPanel) and define the 4 fields.

 

The values at the top which you can set:

$fasthits  = 2;      // time in seconds below which hits are fast
$blockfast = 6;      // number of fast hits before you block them
$blocktime = 120;    // time in seconds for which you block an offending IP

mean that (in this case) if 6 hits in a row are each within 2 seconds of the previous hit (i.e. 6 hits within 12 seconds, if you like) then the user will just get a junk message until there are no hits for a 120s period.
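That consecutive-fast-hit rule can be distilled into a small pure function (the function name is hypothetical; the script above keeps this state in the MySQL table rather than in variables):

```php
<?php
// Distillation of the fast-hit counting rule (a sketch, not the full script).
// $nfast: current consecutive fast-hit count; $lasttime: time of previous hit;
// $now: time of this hit; $fasthits: gap (s) under which a hit counts as "fast";
// $blockfast: fast hits tolerated before blocking.
// Returns [updated $nfast, whether this hit is blocked].
function fast_hit_update(int $nfast, int $lasttime, int $now,
                         int $fasthits, int $blockfast): array
{
    if ($nfast > $blockfast) {
        return [$nfast, true];   // already over the limit: keep blocking
    }
    $nfast = ($now - $lasttime < $fasthits) ? $nfast + 1 : 0;
    return [$nfast, false];
}
```

With $fasthits = 2 and $blockfast = 6, seven hits in a row each under 2 seconds apart push the counter past the limit and the next hit gets the junk page; a single slow hit resets the counter to zero.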

 

Then just include it at the start of every page with

require_once("security.php");

 

I make no promises with this code, but it seems to work for me :lol:

 

I will be updating it, and if anyone would like an updated version (or if you have ideas as to how to improve it) please email or PM me.

 

Andy

Posted

Very nice indeed. Of course I have a little problem - 17,000 of them actually.

 

Then just include it at the start of every page with

 

I have 17,000+ pages.

 

I do however have a script that can do it for me, if I can remember where I put it :)

 

Thanks Andy

  • 7 months later...
Posted

Hi Andy,

 

I tried to copy the site using HTTrack. I copied all the pages, but many of them just contain the error message, Access Denied... etc.

 

Cool solution

 

Thanks a lot Andy

Posted
but for anybody who wants it, please feel free.

 

Andy,

 

thanks so much! I didn't even want to think about having to check which robots I want to block or not. This solution is sooOOOooo much easier. Thanks for the instructions too.

 

Now my question is: since I use SSI for my meta tags, can I add the require_once code in there? Can the code be as follows:

 

require_once("http://www.mysite.com/path/to/file/security.php");

 

Since the different files are in different folders, I'd like to use the entire URL.

 

thanks a mil,

!!blue

Posted
I have 17,000+ pages.

 

I do however have a script that can do it for me, if I can remember where I put it :)

Hey Ty,

If you have to update every file you have, why not insert a line to PHP-include a file, and then from now on all you have to do is add things to that file that are not position-dependent, like this script.

 

Here's a page about PHP Includes with a lot of good info to get started if you've not done it before.

Posted

Thanks, I may do that. The include line is pretty much there but it's still throwing errors like crazy - a subject I've addressed in another area of the forums.

 

I've also pretty much let it ride as there appears to be no solution. :)

  • 8 months later...
Posted

Hi

 

Regarding that script: has it been checked to see whether it blocks normal spiders, i.e. my good mate GoogleBot?

 

I don't want him blocked ;) he's my friend.

 


Posted

Hi,

 

If you are referring to my script - I've never had a problem with good bots. Google has never once triggered it for me, so it should be fine ;)

  • 1 year later...
Posted
I updated it slightly on my machine, but nothing that affects the operation - I'll try and put together a formal update at some stage :tchrocks:

Hi, Andy:

 

I tried implementing this on my site, then I ran HTTrack on it. In the files HTTrack downloaded (all of 'em!), I can see your Access Denied messages at the top of each page, but the rest of the HTML comes in fine... can you help me figure out what I'm doing wrong?

 

Thanks.

  • 3 years later...
Posted

If I have a Wordpress blog, how do I add this php code please ?

 

Hi Siverz, welcome to the forums.

 

This is an old thread, but the code still works effectively. The easiest place to add it would be at the beginning of your template, then it's automatically added to all pages.

Posted

Good to be here :)

 

Since I'm a novice, I was wondering what are the steps (bullet points is great) on how to get this working with Wordpress? Here's what I'm assuming I am supposed to do:

- copy the PHP code in a notepad and save it as Something.php

- upload to the root directory

- add require_once("Something.php"); between the <head> </head> tags

 

Is that anywhere near correct?

Posted

Good to be here :)

 

Since I'm a novice, I was wondering what are the steps (bullet points is great) on how to get this working with Wordpress? Here's what I'm assuming I am supposed to do:

- copy the PHP code in a notepad and save it as Something.php

- upload to the root directory

- add require_once("Something.php"); between the <head> </head> tags

 

Is that anywhere near correct?

 

Yes, that's pretty much correct :)

 

I would add the path into the require_once so that it becomes require_once("/home/cpuser/Something.php"); where cpuser is your cPanel username (and assuming you have uploaded Something.php into your root directory).
