
What's Your Bandwidth Usage Like?


This is awful. I've noticed significant jumps in bandwidth consumption in the last month or two. This is what Awstats shows now:

Traffic viewed = 2.37 GB
Traffic not viewed = 14.29 GB :tchrocks:

As Awstats tells us, "not viewed" includes "traffic generated by robots, worms, or replies with special HTTP status codes."

Sadly, as of this writing, I've exceeded my allotted bandwidth :tchrocks: I run a "modest" personal site, both in terms of scale and popularity.
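If you want to see the same split in your own raw access logs, here is a rough Python sketch. It makes three assumptions. First, the log uses Apache's "combined" format. Second, robots are detected with a short, hypothetical list of user-agent markers (Awstats keeps its own, much longer robot list). Third, any reply other than 2xx or 304 counts as a "special" status code. Awstats's real rules are more detailed, so treat the results as a ballpark only.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of robot markers.
ROBOT_MARKERS = ("bot", "spider", "crawler", "slurp")


def split_traffic(lines):
    """Return (viewed_bytes, not_viewed_bytes) for an iterable of log lines."""
    viewed = not_viewed = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        size = 0 if m["bytes"] == "-" else int(m["bytes"])
        agent = m["agent"].lower()
        status = int(m["status"])
        is_robot = any(marker in agent for marker in ROBOT_MARKERS)
        # Assumption: anything other than 2xx or 304 is a "special" status.
        special = not (200 <= status < 300 or status == 304)
        if is_robot or special:
            not_viewed += size
        else:
            viewed += size
    return viewed, not_viewed


# Made-up sample lines using documentation IP addresses.
SAMPLE = [
    '203.0.113.5 - - [10/Mar/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 5.1)"',
    '198.51.100.7 - - [10/Mar/2006:10:00:01 +0000] "GET /feeds/ HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.5 - - [10/Mar/2006:10:00:02 +0000] "GET /missing HTTP/1.1" 404 300 "-" "Mozilla/5.0 (Windows NT 5.1)"',
]

if __name__ == "__main__":
    print(split_traffic(SAMPLE))  # prints (5120, 20780)
```

In the sample, the Googlebot page and the 404 both count as "not viewed." On a crawler-heavy site, that figure can easily dwarf the "viewed" one.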

Some possible reasons:

* It looks like most of the traffic can be blamed on busy/nosy search engine bots (some consume several gigabytes per visit -- that ain't right).

* I host a few blogs. These get spammed a lot (though the vast majority of the spam is caught by filters, not that this is relevant).

* I run a web feed aggregator with some 100 blog RSS feeds, etc. Until recently the feed cache had not been configured correctly, so this could account for some of the usage (I'm not sure how much).

* I used to host a calendar that ran into infinity, but quite a while ago I marked it as disallowed in my robots.txt file, so that should not be an issue (rogue bots notwithstanding).
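You can sanity-check a disallow rule like that offline, before the file goes live, using Python's standard urllib.robotparser module. In this minimal sketch the /calendar/ path is a hypothetical stand-in for the calendar's real URL.

```python
from urllib import robotparser

# Hypothetical robots.txt: "/calendar/" stands in for the endless calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string, no network access

if __name__ == "__main__":
    for path in ("/calendar/2099/12/", "/blog/"):
        # can_fetch() answers the question a polite crawler asks before a request
        print(path, parser.can_fetch("Googlebot", path))
    # prints:
    # /calendar/2099/12/ False
    # /blog/ True
```

Of course, this only restrains crawlers that actually fetch and honor robots.txt. Rogue bots simply ignore it.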

:) Is your bandwidth anywhere near this ridiculous proportion (mostly non-human traffic)? What do you do about it? Any suggestions? Thanks.

Edited by a__kc

Andy, thanks a lot for the offer. I'll do that :) (Yeah, TCH rulez!) The fact is, I've pored over the Awstats summaries quite a few times, and I still have a lot to learn about reading logs.

Btw, thanks to Bruce for moving the post here, where it belongs.


OK, I thought I'd give a brief update on the situation, especially regarding search engine bots. The top robot entries in Awstats for March were (hits, where the number after "+" counts successful robots.txt requests, then bandwidth):

Unknown robot (identified by 'spider'): 6.66 GB (mostly Baidu)
Googlebot: 124063+182 hits, 6.02 GB (Google)
Inktomi Slurp: 114994+6645 hits, 1.73 GB (Yahoo)
MSNBot: 4433+679 hits, 155.22 MB (MSN)
Clearly the first two entries sucked up more than half of my allotted bandwidth for the month. I discovered that the first entry can be attributed mostly to the Chinese search engine Baidu. Given the nature of my content, I was reluctant to ban both Baidu and Google outright despite their ridiculous appetites. So I disallowed a couple more sections of my site, notably an aggregator with lots of interconnected and dynamically generated links.
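You can also rebuild a per-robot breakdown like the one above from raw access logs. The Python sketch below assumes Apache's "combined" log format and a hypothetical, hand-picked mapping of user-agent substrings to crawler names. Crawlers not in the mapping are simply skipped. Awstats, by contrast, falls back to generic markers such as 'spider', which is where its "Unknown robot" row comes from.

```python
import re
from collections import Counter

# Status, byte count, referer and user-agent at the end of an Apache
# "combined" log line.
TAIL_RE = re.compile(r'" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$')

# Hypothetical mapping of user-agent substrings to crawler names.
CRAWLERS = {
    "baiduspider": "Baidu",
    "googlebot": "Google",
    "slurp": "Yahoo (Slurp)",
    "msnbot": "MSN",
}


def bytes_per_crawler(lines):
    """Sum response bytes per known crawler; other agents are ignored."""
    totals = Counter()
    for line in lines:
        m = TAIL_RE.search(line.rstrip("\n"))
        if not m or m["bytes"] == "-":
            continue  # unparsable line, or no body was sent (e.g. a 304)
        agent = m["agent"].lower()
        for marker, name in CRAWLERS.items():
            if marker in agent:
                totals[name] += int(m["bytes"])
                break  # count each request once
    return totals


# Made-up sample lines using documentation IP addresses.
SAMPLE = [
    '192.0.2.10 - - [01/Mar/2006:00:00:00 +0000] "GET /aggregator/ HTTP/1.1" 200 40000 "-" "Baiduspider"',
    '192.0.2.20 - - [01/Mar/2006:00:00:05 +0000] "GET / HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.30 - - [01/Mar/2006:00:00:09 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]

if __name__ == "__main__":
    for name, size in bytes_per_crawler(SAMPLE).most_common():
        print(f"{name}: {size} bytes")
```

Using a Counter keeps the tally simple, and most_common() sorts the result from the hungriest crawler down.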

The funny thing is, all the major bots obeyed the new rules within hours. Except Baidu. Okay, so maybe Baidu needed more time to re-analyze my robots file. So I gave it a few more days. And it kept coming and coming. So I disallowed it from the root of the site. And it still kept on coming (like the damn Energizer -- or is it Duracell? -- rabbit). By this time I'd had enough, so I banned Baidu's IP range. You'd think it would have given up. Nope, it still visited! The good news is that it no longer grabs 6 GB per month, and I think this should keep the bandwidth in check for the immediate future (we'll see).
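In practice an IP ban lives in the web server's or control panel's configuration. The underlying check, though, is just whether the client address falls inside a blocked CIDR range. Here is a minimal Python sketch of that check. The ranges are placeholders, not Baidu's real addresses.

The check also explains why a banned crawler still shows up in the logs. Its requests still arrive; they just get a short refusal (typically a 403 Forbidden) instead of full pages.

```python
import ipaddress

# Hypothetical ban list. These are documentation-only networks (RFC 5737),
# standing in for whatever range a misbehaving crawler actually uses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Membership is simply False (not an error) when IPv4 and IPv6 are mixed.
    return any(ip in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for addr in ("192.0.2.77", "203.0.113.9", "2001:db8::1"):
        print(addr, is_blocked(addr))
```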

The take-home lesson: ban Baidu! :) I don't care if it's China's largest and best search engine. It's evil.


You can also add a Google Sitemap to your website to cut down on Google crawler traffic. Basically, YOU tell Google which pages or sections of your site are new, and Google (mostly) pulls only those. It still does a complete crawl about once a month, though.

It took about a week before I started to see a difference on my site, but by the next month Google traffic had dropped by half (below both MSN and Slurp). Plus, I now have a much better idea of how users get to my site through Google, AND I can see how the Google crawler views my site. Of course, your results may vary. :)

Check it out.

http:://www.google.com/webmasters or

http://www.google.com/webmasters/sitemaps
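For the curious: a sitemap is just an XML file listing URLs. Each URL can optionally carry a last-modified date and a change-frequency hint. Below is a minimal Python sketch that builds one in the format later standardized at sitemaps.org. The URLs, dates, and frequencies are made up. Note that changefreq is only a hint, which crawlers are free to ignore.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod        # W3C date, e.g. 2006-03-01
        ET.SubElement(url, "changefreq").text = changefreq  # a hint, not a command
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages on a hypothetical site.
    print(build_sitemap([
        ("http://www.example.com/", "2006-03-01", "daily"),
        ("http://www.example.com/archive/", "2005-12-15", "monthly"),
    ]))
```

Rarely changing sections (archives, say) can be marked accordingly. The idea is to steer the crawler toward new pages instead of having it re-fetch everything.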


The link does not work; you need to add the .com after google.


I must be getting off easy:

Traffic viewed*: 196 unique visitors, 386 visits (1.96 visits/visitor), 286 pages (7.41 pages/visit), 11636 hits (30.14 hits/visit), 128.95 MB (342.08 KB/visit)
Traffic not viewed*: 2343 pages, 8322 hits, 26.28 MB


Kevan, thanks for the tip. I'll take a look at Google Sitemaps -- it looks promising, and I must say I'm happy some search engines are trying to refine the old-fashioned "brute force" approach.

----

timhodge, I don't want to get too political here, but I'm pretty sure Baidu censors some of the info it finds (e.g. Tibet, Falun Gong, and Taiwan independence stuff -- and p0rn :goof: ), both by hiding it from view AND by ranking pro-government stuff higher. The law demands it.

PS: Okay, please don't ask me what constitutes "pro-government p0rn" :lol:

Edited by a__kc
