
a__kc

Members · 30 posts


  1. I didn't know, either. Excellent information.
  2. Er, I wish I had reviewed the following MySQL reference page first (http://dev.mysql.com/doc/refman/4.1/en/cha...conversion.html). I think (not 100% sure) I corrupted some data just by using phpMyAdmin to change the collation. This was unheard of with MySQL 4.0!
  3. Same problem here, too. I guess TCH has upgraded MySQL to a version that supports collation? Unfortunately, all of my previously collation-less data have now been (mistakenly) labeled "latin1_swedish_ci" (rather arbitrarily, it seems, since MySQL is a Swedish company). This would all be academic except that I can no longer view or properly manipulate the data via phpMyAdmin. It's all scrambled, and I hope this is only a presentation issue and not indicative of data corruption. The problems with "latin1_swedish_ci": 1. My data are definitely not Latin1-encoded (they're UTF-8, to support English, Japanese, Chinese, and all sorts of multilingual data); 2. They're not Swedish (though collation really doesn't matter to my apps); 3. Manually setting each field to, say, "utf8_bin" or "utf8_general_ci" has not made the problem go away. Worse, the part of the site that pulls out the data now displays scrambled text. Apparently the data really are being treated as Latin1 now! If I come up with anything, I'll let you know. Sigh....
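     For the record, the workaround most often suggested for this kind of mislabeling (I haven't verified it myself; the table, column, and credentials below are made up, and back up the table first) is to round-trip each affected column through a binary type, so MySQL relabels the existing bytes as UTF-8 instead of transcoding them from Latin1:

         <?php
         // Hypothetical sketch: relabel a mislabeled UTF-8 column without transcoding.
         $db = mysql_connect('localhost', 'user', 'pass');  // placeholder credentials
         mysql_select_db('mydb', $db);                      // hypothetical database
         // Step 1: convert to BLOB -- the bytes are kept verbatim and the bogus
         // collation label is dropped.
         mysql_query("ALTER TABLE entries MODIFY body BLOB", $db);
         // Step 2: convert back to TEXT declared as utf8 -- the same bytes are now
         // simply labeled UTF-8 rather than converted from latin1.
         mysql_query("ALTER TABLE entries MODIFY body TEXT CHARACTER SET utf8 COLLATE utf8_general_ci", $db);
         ?>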
  4. Kevan, thanks for the tip. I'll take a look at Google Sitemaps -- it looks promising, and I must say I'm happy some search engines are trying to refine the old-fashioned "brute force" approach. ---- timhodge, I don't want to get too political here, but I'm pretty sure Baidu censors some of the info it finds (e.g. Tibet, Falun Gong, Taiwan-independence material -- and p0rn) in terms of hiding it from view AND ranking pro-government content higher. The law demands it. PS: Okay, please don't ask me what constitutes "pro-government p0rn".
  5. OK, I thought I'd give a brief update on the situation, especially regarding search engine bots. The top hits I received for March were:

         Unknown robot (identified by 'spider')      6.66 GB     (mostly Baidu)
         Googlebot          124063+182               6.02 GB     (Google)
         Inktomi Slurp      114994+6645              1.73 GB     (Yahoo)
         MSNBot             4433+679                 155.22 MB   (MSN)

     Clearly the first two entries sucked up more than half of my allotted bandwidth for the month. The first entry, I discovered, can be attributed mostly to the Chinese Baidu SE. Given the nature of my contents, I was reluctant to ban both Baidu and Google outright in spite of their ridiculous appetite. So I added a couple more disallowed sections to my site, notably an aggregator with lots of interconnected and dynamically generated links. The funny thing is, all the major bots obeyed the new rules within hours. Except Baidu. Okay, so maybe Baidu needed more time to re-analyze my robots file. So I gave it a few more days. And it kept coming and coming. So I disallowed it from visiting root. And it still kept on coming (like the damn Energizer -- or is it Duracell? -- rabbit). By this time I'd had enough. I banned Baidu's IP range. You'd think it'd have given up. Nope, it still visited! The good news is it would no longer grab 6 GB per month. And I think this should keep the bandwidth in check for the immediate future (we'll see). The take-home lesson: ban Baidu! I don't care if it's China's largest and best SE. It's evil.
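     For anyone who wants to replicate the escalation, the two steps look roughly like this (the IP prefix below is illustrative only -- check your own logs for the spider's actual addresses):

         # robots.txt -- politely ask Baiduspider to stay away entirely
         User-agent: Baiduspider
         Disallow: /

         # .htaccess -- when that is ignored, deny the IP range outright (Apache 1.3)
         Order Allow,Deny
         Allow from all
         Deny from 220.181.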
  6. Andy, thanks a lot for the offer. I'll do that. (Yeah, TCH rulez!) Fact is, I've pored over the AWStats summaries quite a few times, and I still have a lot to learn about figuring out how to read logs. BTW, thanks to Bruce for moving the post here, where it belongs.
  7. This is awful. I've noticed significant jumps in bandwidth consumption in the last month or two. This is what AWStats shows now:

         Traffic viewed     =  2.37 GB
         Traffic not viewed = 14.29 GB

     As AWStats tells us, "not viewed" includes "traffic generated by robots, worms, or replies with special HTTP status codes." Sadly, as of this writing, I've exceeded my allotted bandwidth. I run a "modest" personal site, in terms of both scale and popularity. Some possible reasons:

     * It looks like most of the traffic can be blamed on busy/nosy search engine bots (some consuming several gigabytes per visit -- that ain't right).
     * I host a few blogs. These get spammed a lot (the vast majority of the ads are caught by filters, though, not that this is relevant).
     * I run a web feed aggregator with some 100 blog RSS feeds, etc. Until recently the feed cache had not been configured correctly, so maybe this could account for some usage (not sure how much).
     * I used to host a calendar that ran on into infinity, but quite a while ago I marked it "disallowed" in my robots file (see the example below), so that should not be an issue (rogue bots notwithstanding).

     Is your bandwidth anywhere near this ridiculous proportion (mostly non-human traffic)? What do you do about it? Any suggestions? Thanks.
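     For reference, the disallow rules I mean are just robots.txt entries of this form (the paths here are illustrative):

         User-agent: *
         Disallow: /calendar/
         Disallow: /aggregator/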
  8. I renamed the entry- and comment-related scripts. That worked for a while, until the spammers caught on. Later on I upgraded to Lazarus (AGB's cousin), which uses a user-defined question to test against bots (roughly as sketched below). Alas, the spammers took pains to answer the question, so I still got spam. In other words, against human-delivered spam I'm still at a loss as to how best to deflect it. I checked to see whether these spammers are using blacklisted open proxies (which would make it possible to block posts coming from them); apparently not. I may end up having to approve all posts manually, at least until the human spammers give up.
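     For anyone unfamiliar with Lazarus, the question test boils down to something like this (a simplified sketch, not Lazarus's actual code; the field and variable names are invented):

         <?php
         // Compare the visitor's answer to the challenge question against the
         // configured expected answer before accepting the entry.
         $expected = 'orange';                             // configured answer (example)
         $answer = strtolower(trim($_POST['challenge']));  // hypothetical form field
         if ($answer !== $expected) {
             die('Anti-spam check failed.');  // bots that can't answer stop here
         }
         // ...otherwise fall through and save the entry as usual.
         ?>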
  9. Just want to add that

         wget -q -t 2 --delete-after http://mysite.net/script.php

     does not appear to work (at least on my server). You get a

         /bin/sh: line 1: /usr/bin/wget: Permission denied

     Same thing when using Lynx, the text-based browser (this used to work):

         /bin/sh: line 1: /usr/bin/lynx: Permission denied
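     One possible workaround (untested here; the paths and filenames are examples) is to skip wget and lynx entirely and let PHP do the fetch from cron:

         <?php
         // fetch.php -- hit a URL from a cron job without shelling out to wget.
         // Assumes allow_url_fopen is enabled, as it typically is on PHP 4 hosts.
         $body = file_get_contents('http://mysite.net/script.php');
         if ($body === false) {
             error_log('cron fetch of script.php failed');
         }
         ?>

     with a crontab entry along the lines of:

         0 * * * * /usr/local/bin/php /home/user/fetch.php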
  10. Today I saw 406 errors for the first time -- quite surprising. The messages claim that one or another "resource" (the name of the script is indicated) could not be found on the server. This seems to occur only when posting forms, and only with some contents and not others. I've been able to use a set of anti-spam tools quite effectively, including one I coded myself. I also upgraded to MT 3.15 yesterday, though this sounds like a server-side thing. I might re-install, but I doubt that would help. Edited: I've opened a ticket. Please regard this post as "experience-sharing".
  11. Maybe it's a server-level configuration issue? Frankly I know little about Apache directives (just enough RewriteRule to get by). It would be useful, though, to be able to make some HTML files behave like .shtml; otherwise I'd need to change filenames and set up redirects. Update: "XBitHack on" works fine for me (hurray!!). This is a lot better than forcing all .html files to be parsed.
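     In case anyone else wants to try it, the whole recipe (assuming your account allows .htaccess overrides and mod_include is available) is one directive:

         # .htaccess
         XBitHack on

     and then setting the user-execute bit on each .html file that should be parsed for server-side includes:

         chmod u+x page.html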
  12. I doubt it. Unless your script is using the Encode module, that should have no effect at all. BTW, to answer your last question, TCH (at least the server I'm on) is running Perl 5.8.0. While it lacks the latest bug fixes (and the newest bugs), it should be more than adequate. As for "use utf8", which Perl's official docs describe: I no longer remember whether I tried that, though I bet I have. As I indicated before, the UTF-8-encoded scripts ran fine on XP (or on my particular XP Perl), so... that's that. Maybe I'll try again another time. Thanks for all the suggestions.
  13. Thanks, guys, for the extra info. That injection statement looks weird, but then I don't go around cracking people's sites. I did upgrade AG a few weeks ago to close off that hideous loophole, so that's good. I hope everyone else here has done the upgrade as well.
  14. Hi, recently I've received two emails, apparently from my installation of Advanced Guestbook (2.3.1), telling me about SQL errors. My guestbook is apparently unaffected (so far?), but I'm concerned about some kind of script attack. I hope this is just a failed attempt at running a spam script rather than a security compromise. I looked up the IPs; one's from Germany, the other from Italy. My guess is they've been grabbed from anonymous proxies -- I don't really know. What do you think? Anyone with similar experience? Should I ignore this or...? Thanks.

         -------- Original Message --------
         Subject: Guestbook - Error
         Date: Sun, 27 Jun 2004 17:19:43 -0700
         From: mydb_agbook1@localhost

         MySQL Error : Query Error
         Error Number: 1064
         You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'http://217.59.104.226/, 10' at line 1
         Date : Sun, June 27, 2004 17:19:43
         IP : 81.74.252.73
         Browser : curl/7.9.5 (i586-pc-linux-gnu) libcurl 7.9.5 (ipv6 enabled)
         Referer :
         PHP Version : 4.3.7
         OS : Linux
         Server : Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.7 FrontPage/5.0.2.2634a mod_ssl/2.8.18 OpenSSL/0.9.6b
         Server Name : www.site.com
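     For context, this class of hole comes from user input being spliced into a query unescaped; the fix in patched guestbook versions amounts to something like the sketch below (simplified and hypothetical, not AG's actual code):

         <?php
         // Simplified, hypothetical sketch of the fix class -- not AG's actual code.
         $db = mysql_connect('localhost', 'user', 'pass');  // placeholder credentials
         mysql_select_db('mydb', $db);
         // Escape user-supplied values before building the SQL statement.
         // mysql_real_escape_string() is available from PHP 4.3.0 on.
         $entry = mysql_real_escape_string($_POST['entry'], $db);
         $ip    = mysql_real_escape_string($_SERVER['REMOTE_ADDR'], $db);
         mysql_query("INSERT INTO book (entry, ip) VALUES ('$entry', '$ip')", $db);
         ?>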
  15. Hmm... I think my description of the problem with the unforwarded mail was possibly inaccurate: it seems that Mailman 2.1.2 (the currently installed version) refuses to distribute mail whose subject line uses certain non-US-ASCII encodings, for example Traditional Chinese (aka Big5). No warning is given, and the guilty mail is apparently discarded. So possibly SpamAssassin is not responsible for the problem. Still, I'd appreciate any info as to whether our favorite assassin targets mail headed for list distribution -- thanks!