
Tutorial On Training Spamassassin?


waveflux


SpamAssassin is an awesome utility, and it's made checking my email much less of a chore. I'd like to train SA so that less spam gets through the filter, but I haven't a clue how to begin. Everything that I've seen on the web about training SA using the sa-learn program has left me as confused and frightened as the grandfather on The Simpsons. I've read this forum's existing intro to SA, but it's not really the step-by-step guide to training SA that I think would be helpful.

 

So...any takers? It would be a great boon to me and many others.

Link to comment
Share on other sites

Spam Assassin will learn as it continues to catch emails. You can change the level from 5 to something lower if you want it to catch more messages but it could also start flagging good email as spam.

 

I know there is a way to feed it messages but I don't know how to do it. Maybe someone else here can give you some ideas.

Link to comment
Share on other sites

  • 1 month later...

I second this. :clapping: I'm very confused about this as well, even after doing some research. Lately I've had a lot more spam getting through, even with a setting of around 4.3. Looking at the subject lines, I don't know why SA wouldn't have flagged them as spam, but they were only being scored around 1.5.

Link to comment
Share on other sites

I found more information about training SA.

 

It looks like you have to use Webmail, Outlook or any other IMAP client so you can store the emails on the server to be used to train SA. So if you are using a POP client you cannot do it.

 

I've copied this tutorial from somewhere else and have not used it myself so if it doesn't work, don't shoot the messenger, I'm only trying to help. :)

 

Assumptions

- that you know how to log in to CPanel

- that you know how to use Outlook, or know how to configure your Email client based on my descriptions of using Outlook, to add an IMAP account

- for these examples, the TotalChoice Hosting account name is tchaccount and password is tchpassword ; your domain name will be myTCHdomain.com

 

Terminology

SPAM: unsolicited Emails that you’ve received that want you to buy something or contain adult-themed references that you’d rather not get anymore.

HAM: non-spam, legitimate Emails

SA: short for SpamAssassin

 

Getting Started

Generally speaking, there are a handful of steps to follow to get this working:

1. set up IMAP folders to hold spam and ham messages

2. set up an IMAP account in Outlook if you so choose

3. set up your SA user_prefs file

4. build the training Perl script

5. learn how to train SA

 

1. Set up IMAP folders to hold spam and ham messages

- log in to cpanel

- click on the mail icon

- click on the ‘spam assassin’ link

- click on the button to ‘enable spam assassin’

- click on the button to ‘enable spam box’

- click the ‘home’ link at very top of the screen

- click on the ‘webmail’ icon

- click on the ‘squirrelmail’ link

- click on the ‘folders’ link

- also in the ‘folders’ screen, towards the top, you should be able to create new folders under a heading called ‘create a folder’

- create a folder called “myham” under the subfolder of ‘none’

- create a folder called “myspam” under the subfolder of ‘none’

- click ‘refresh folder list’ on left frame again, and you should see ‘myham’ and ‘myspam’ in the list

 

2. Set up IMAP in Outlook to manage these mailboxes

This entire step is optional, and only if you want to be able to 'copy' ham messages into 'myham' to train SA as to what is legitimate Email to your accounts. SA does well if you show it a message and say "this is spam", but it keeps SA balanced and MUCH more effective/accurate if it also has a list of 'ham' messages to learn from as well. That way, if you get spam about ****, and an Email from your friend who really does take the little blue pill, SA won't think your friend is spamming you.

 

- click on Tools from the menu in Outlook

- select 'Accounts...' from the menu

- click on the "add..." button, and select "mail"

- enter the name for the account, this can be anything you want since you generally won't be sending mail from this account, click 'next'

- enter the Email account associated with this IMAP setup, again this can be anything you want, click 'next'

- select that the server type is IMAP

- incoming and outgoing servers will be myTCHdomain.com

- account login is tchaccount

- account password is tchpassword

- click next a few times and finish the setup process

- at this point, you should see a new entry above or below the "Outlook Today - Personal Folders" header, if you have the "Folder List" selected under the View menu option

- if you expand the new IMAP folder setup and you do NOT see the new 'myham' and 'myspam' folders listed, right click on the entry that says 'Inbox' and select 'create a new folder'. This should connect to LP and refresh the IMAP folders. Click cancel to go back to Outlook.

 

3. Set up SpamAssassin’s preferences file: user_prefs

SA has a configuration file that you should have accessible to you in your CPanel file manager.

 

In CPanel:

- click on the File Manager icon

- you should see a folder called /.spamassassin/

- click the folder icon beside it to move into that folder

- you should have 3 files in there:

 

- bayes_toks holds data about various elements it has seen from messages from previous scans; this information includes where the message came from, the route it took to get to you, when the message was sent, who it was from, who it was to, the subject line, other headers, and elements within the body of the message itself

- bayes_seen holds data about which messages it has looked at in the past

- user_prefs is the configuration file we’re going to edit

 

- click on user_prefs link to change the menu on the upper-right side of the screen, and click on ‘edit file’ from that menu

 

- here is a sample configuration file, you will need to modify a few elements in this file:

 

required_hits 5

rewrite_subject 1

subject_tag {SPAM}

bayes_path /home/tchaccount/.spamassassin/bayes

bayes_file_mode 0600

bayes_ignore_header X-MailScanner

bayes_ignore_header X-MailScanner-SpamCheck

bayes_ignore_header X-MailScanner-SpamScore

bayes_ignore_header X-MailScanner-Information

- save the file and close the window that CPanel opened for you to edit that file

 

4. Building the script

- go up one level in the file manager

- go into public_html

- go into cgi-bin

- click on the link to ‘create a new file’

- call it “sa-learn.cgi” (no quotes)

- here are the contents of the file:

 

#!/usr/bin/perl

my $salearn = "/usr/bin/sa-learn";
$| = 1;  # turn on autoflush so output is sent to the browser as it is produced

print "Content-type: text/plain\n\n";

print "Learning SPAM:\n";
print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --spam --showdots /home/tchaccount/mail/myspam 2>&1`;
print "\n\n";

print "Learning HAM:\n";
print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --ham --showdots /home/tchaccount/mail/myham 2>&1`;
print "\n\n";

exit;

 

5. Learning how to train SpamAssassin

Since you created this CGI script in your cgi-bin folder, you can run it from a web browser:

http://www.myTCHdomain.com/cgi-bin/sa-learn.cgi

Edited by TCH-David: Corrected URL in step 5, and replaced curly double quotes in script code with regular quotes in step 4.

Edit 2 by TCH-David: Replaced en dashes in script code with regular dashes, added redirect of stderr ('2>&1')

Link to comment
Share on other sites

I logged in tonight to see if anyone is discussing an increase in SPAM that is getting past Spam Assassin.

 

I am getting pharmaceutical ads, adult ads, and mortgage ads that had been caught by Spam Assassin earlier.

 

They all seem to include a paragraph of nonsense. Maybe that paragraph somehow keeps the score low. Many of the spams show a score below 2.

 

Hopefully, the Spam Assassin team will study these newer forms of spam and find a way to combat them.

Link to comment
Share on other sites

  • 1 month later...

Just tagging along here, as I too am getting more spam that SA used to catch. I didn't know I needed to use IMAP to have SA learn. Bummer, I'm on POP.

 

Any tips are much appreciated. I'm a reseller, and I have clients saying that spam is getting through that didn't use to.

 

Thanks,

 

- Bradley

Link to comment
Share on other sites

  • 3 weeks later...
Step 4: call it “sa-learn.cgi” (no quotes)
Step 5: http://www.myTCHdomain.com/cgi-bin/sa-train.cgi

You may want to edit your post.

 

>print `$salearn -p /home/tchaccount/.spamassassin/user_prefs –mbox –spam –showdots /home/tchaccount/mail/myspam`;

How does printing output to the screen process any data?

 

I have tried editing the script manually too, but I keep getting errors. I also get error 500.

Learning SPAM:

<h1>Software error:</h1>

<pre>Illegal division by zero at sa-learn.cgi line 10.

</pre>

Link to comment
Share on other sites

You may want to edit your post.

I've fixed it - thanks! :blink:

 

>print `$salearn -p /home/tchaccount/.spamassassin/user_prefs –mbox –spam –showdots /home/tchaccount/mail/myspam`;

How does printing output to the screen process any data?

The "string" being printed is enclosed in backquotes (the character to the left of the '1' key on your keyboard). The string in backquotes is executed as a command on the server, and the output of that command is what gets printed to the browser.
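
To make the backquote behaviour concrete, here is a minimal Perl sketch. It assumes a Unix-like server where the `echo` command is available; the command and its message are placeholders for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Backquotes run the enclosed string as a shell command and return
# whatever that command wrote to its standard output.
my $output = `echo hello from the shell`;

# print then sends the captured text on to the browser (or terminal).
print "The command said: $output";
```

In the sa-learn script the same thing happens: sa-learn does the actual learning while it runs, and the print only relays its progress report.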

 

I have tried editing the script manually too, but I keep getting errors. I also get error 500.
Learning SPAM:

<h1>Software error:</h1>

<pre>Illegal division by zero at sa-learn.cgi line 10.

</pre>

Somewhere along the way, the double quotes in the script code were converted to curly quotes, which don't work in a script. I've edited the script above and replaced the curly quotes with 'straight' double quotes. It should work a lot better now. :)

Link to comment
Share on other sites

I don't use SpamAssassin (no e-mail accounts) and have never run this script, so I don't know what kind of output you should be seeing, but yes, I think you ought to be seeing more than what you're getting.

 

The only thing I can figure is that an error is occurring with the commands in backquotes, but the error message is not being sent to the browser. I'd suggest adding '2>&1' to the end of each of the backquoted commands, so any error messages should be displayed in your browser as well:

print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --spam --showdots /home/tchaccount/mail/myspam 2>&1`;

print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --ham --showdots /home/tchaccount/mail/myham 2>&1`;

If this works for you, I'll add it to the script code in TCH-Bruce's post above.
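
Why the `2>&1` matters can be shown with a small sketch. The command used here is a made-up stand-in that writes only to standard error, which is roughly what a failing sa-learn call would do; it assumes a POSIX `sh` is available on the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in command (an assumption for illustration): it prints its
# message to standard error and nothing to standard output.
my $cmd = q{sh -c 'echo "something went wrong" >&2'};

# Backquotes capture standard output only, so this comes back empty;
# the message goes to the script's own stderr (for CGI, the server error log).
my $stdout_only = `$cmd`;

# With 2>&1 the shell merges stderr into stdout, so the message is captured.
my $with_stderr = `$cmd 2>&1`;

print "Without 2>&1: [$stdout_only]\n";
print "With 2>&1:    [$with_stderr]\n";
```

Without the redirect, an error from sa-learn would leave the browser showing only the "Learning SPAM:" heading and nothing else.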

Link to comment
Share on other sites

I found that the dashes (-) in the sa-learn command line were incorrect in TCH-Bruce's post. They had been converted to a longer dash (an en dash) in the post, but they should be the standard hyphen, the key between [0] and [=].

Example output once it is working:

Learning SPAM:

....................

Learned (6) messages (15 examined).

 

#!/usr/bin/perl

use CGI::Carp qw(fatalsToBrowser);
my $salearn = "/usr/bin/sa-learn";
$| = 1;  # turn on autoflush so output is sent to the browser as it is produced

print "Content-type: text/plain\n\n";

print "Learning SPAM:\n";
print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --spam --showdots /home/tchaccount/mail/myspam 2>&1`;

print "\n\n";

print "Learning HAM:\n";
print `$salearn -p /home/tchaccount/.spamassassin/user_prefs --mbox --ham --showdots /home/tchaccount/mail/myham 2>&1`;
print "\n\n";

exit;
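
Since both the curly quotes and the long dashes in this thread crept in through copy and paste, a quick scan for non-ASCII characters can catch them before the script is uploaded. The following is a minimal sketch; the helper name `find_non_ascii` and the sample text are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one "line N: U+XXXX" entry for every non-ASCII character
# found in the given (already decoded) text.
sub find_non_ascii {
    my ($text) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        while ($line =~ /([^\x00-\x7F])/g) {
            push @hits, sprintf("line %d: U+%04X", $line_no, ord $1);
        }
    }
    return @hits;
}

# Sample text with curly quotes on line 1 and an en dash on line 2,
# written as escapes so that this file itself stays plain ASCII.
my $sample = "print \x{201C}hi\x{201D};\nsa-learn \x{2013}mbox\n";
print "$_\n" for find_non_ascii($sample);
```

To check a real file, read it with `open my $fh, '<:encoding(UTF-8)', $path` and pass its contents to the helper; any line it reports is worth retyping by hand.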

Link to comment
Share on other sites

Thanks David  :)

 

I did not write this originally. It was copied from another site.  So I have never used it and couldn't vouch if it were correct or not.  But others said that it worked where I found it and that's the reason I made it available here.

 

As a novice, I could use a set of clear, step-by-step instructions. Although I have not tried this yet, I did find a set of instructions that may help:

 

http://community.sjkhosting.com/t40-spam-assassin.html

 

Another site had similar instructions to those on this TCH forum:

 

http://movingparts.net/2004/12/15/training...-based-webhost/

 

Enjoy. -- Justin

Link to comment
Share on other sites

If I follow these instructions, am I 'training' SA for the mail that comes to ME?

Or am I training SA for the mail that comes to my DOMAIN?

 

It would seem to me that it's for all the mail that comes to the domain... but I just want to be sure. If not, I would set up the other mail accounts and encourage the rest of my family to work on training SA too.

 

Dan

Link to comment
Share on other sites

Is there anyone that's had luck making this work, who might be able to lend some help??

 

I've been trying and trying, and I just keep getting an "Internal Server Error".

I'm sure I'm goofing up something simple, but I don't see it...

 

Dan

dhilke,

I don't know why, but I found that I could only create and edit my CGI scripts from within cPanel's file manager. At first I was using Notepad, FTP'ing them to my cgi-bin, and then running chmod on the script. Every time I tried that, I got error 500. Try creating the file in cPanel and pasting the code in there once.
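
One plausible explanation, not confirmed anywhere in this thread, is that a script saved in Notepad carries Windows line endings, or that the uploaded file lost its execute permission. Either can produce error 500. The sketch below checks for both; the helper name `check_cgi_script` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report common reasons a CGI script fails before it even starts.
sub check_cgi_script {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return ("cannot open $path: $!");
    my $first_line = <$fh> // '';
    close $fh;

    my @problems;
    push @problems, 'first line is not a #! line' unless $first_line =~ /^#!/;
    push @problems, 'Windows (CRLF) line endings' if $first_line =~ /\r\n\z/;
    push @problems, 'not executable (try chmod 755)' unless -x $path;
    return @problems;
}

# Usage: perl check.pl /home/tchaccount/public_html/cgi-bin/sa-learn.cgi
my $script = shift @ARGV;
if (defined $script) {
    my @problems = check_cgi_script($script);
    print "$script: $_\n" for @problems;
    print "$script: no obvious problems found\n" unless @problems;
}
```

If CRLF endings are reported, uploading over FTP in ASCII mode or re-saving the file with Unix line endings usually clears them.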

Link to comment
Share on other sites

That's the way I did it. I used the cPanel to create the file, and copied the text from above in this forum into the new file right there in cPanel.

 

When you created the sa-learn.cgi file, did you use the text just as it is above, or did you have to change anything?

(Other than the four places where "tchaccount" changes to my login name?)

 

And when you changed the user_prefs file, did you change anything in the text from this post?

(Other than the one place where "tchaccount" changes to my login name?)

 

I copied from this forum into the files in cPanel, and then changed those five things. There must be something else I missed...

 

Dan

Link to comment
Share on other sites

I get the following output:

 

ERROR: the Bayes learn function returned an error, please re-run with -D for more information

Learned from 0 message(s) (1 message(s) examined)

Is that because the myspam and myham folders are still empty?

 

When adding the -D to the script as suggested by the error message, I get the following output:

 

Learning SPAM:

debug: SpamAssassin version 3.0.4

debug: Score set 0 chosen.

debug: running in taint mode? yes

debug: Running in taint mode, removing unsafe env vars, and resetting PATH

debug: PATH included '/usr/local/bin', keeping.

debug: PATH included '/usr/bin', keeping.

debug: PATH included '/bin', keeping.

debug: Final PATH set to: /usr/local/bin:/usr/bin:/bin

debug: using "/etc/mail/spamassassin/init.pre" for site rules init.pre

debug: config: read file /etc/mail/spamassassin/init.pre

debug: using "/usr/share/spamassassin" for default rules dir

debug: config: read file /usr/share/spamassassin/10_misc.cf

debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf

debug: config: read file /usr/share/spamassassin/20_body_tests.cf

debug: config: read file /usr/share/spamassassin/20_compensate.cf

debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf

debug: config: read file /usr/share/spamassassin/20_drugs.cf

debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf

debug: config: read file /usr/share/spamassassin/20_head_tests.cf

debug: config: read file /usr/share/spamassassin/20_html_tests.cf

debug: config: read file /usr/share/spamassassin/20_meta_tests.cf

debug: config: read file /usr/share/spamassassin/20_phrases.cf

debug: config: read file /usr/share/spamassassin/20_porn.cf

debug: config: read file /usr/share/spamassassin/20_ratware.cf

debug: config: read file /usr/share/spamassassin/20_uri_tests.cf

debug: config: read file /usr/share/spamassassin/23_bayes.cf

debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf

debug: config: read file /usr/share/spamassassin/25_hashcash.cf

debug: config: read file /usr/share/spamassassin/25_spf.cf

debug: config: read file /usr/share/spamassassin/25_uribl.cf

debug: config: read file /usr/share/spamassassin/30_text_de.cf

debug: config: read file /usr/share/spamassassin/30_text_fr.cf

debug: config: read file /usr/share/spamassassin/30_text_nl.cf

debug: config: read file /usr/share/spamassassin/30_text_pl.cf

debug: config: read file /usr/share/spamassassin/50_scores.cf

debug: config: read file /usr/share/spamassassin/60_whitelist.cf

debug: using "/etc/mail/spamassassin" for site rules dir

debug: config: read file /etc/mail/spamassassin/local.cf

debug: using "/home/tchaccount/.spamassassin/user_prefs" for user prefs file

debug: config: read file /home/tchaccount/.spamassassin/user_prefs

debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC

debug: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x849dfa4)

debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC

debug: plugin: registered Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8bd82c8)

debug: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC

debug: plugin: registered Mail::SpamAssassin::Plugin::SPF=HASH(0x8bb5258)

debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x849dfa4) implements 'parse_config'

debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8bd82c8) implements 'parse_config'

debug: config: SpamAssassin failed to parse line, skipping: rewrite_subject 0

debug: config: SpamAssassin failed to parse line, skipping: subject_tag *****SPAM*****

debug: config: SpamAssassin failed to parse line, skipping: use_terse_report 0

debug: config: SpamAssassin failed to parse line, skipping: auto_learn 1

debug: bayes: DB_File module not installed, cannot use Bayes

debug: Score set 0 chosen.

debug: Initialising learner

debug: Syncing Bayes and expiring old tokens...

debug: bayes: DB_File module not installed, cannot use Bayes

debug: bayes: DB_File module not installed, cannot use Bayes

debug: Syncing complete.

debug: Learning Spam

debug: metadata: X-Spam-Relays-Trusted:

debug: metadata: X-Spam-Relays-Untrusted:

debug: ---- MIME PARSER START ----

debug: main message type: text/plain

debug: parsing normal part

debug: added part, type: text/plain

debug: ---- MIME PARSER END ----

debug: decoding: no encoding detected

debug: Loading languages file...

debug: Language possibly: en

debug: metadata: X-Languages: en

debug: bayes: DB_File module not installed, cannot use Bayes

 

ERROR: the Bayes learn function returned an error, please re-run with -D for more information

Learned from 0 message(s) (1 message(s) examined).
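
Going by the debug output above, the repeated line "bayes: DB_File module not installed, cannot use Bayes" looks like the real cause, rather than empty folders: one message was examined, but nothing could be stored. A minimal sketch for checking whether the Perl modules are visible to scripts on the server follows; the helper name `module_available` is invented here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named Perl module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print "Content-type: text/plain\n\n";
for my $module (qw(DB_File Mail::SpamAssassin)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'available' : 'MISSING';
}
```

If DB_File shows as missing, it is probably something only the host can install. The same log also shows SpamAssassin 3.0.4 skipping the rewrite_subject, subject_tag and auto_learn lines; in the 3.x series the subject tag is written as `rewrite_header Subject {SPAM}` and auto-learning as `bayes_auto_learn 1`.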

Link to comment
Share on other sites

  • 1 year later...

I realize this thread has not been updated in over a year. This information needs to be upgraded from "suggestions" to an actual working tutorial. It's not clear from going all the way through the thread whether this method with "sa-learn.cgi" actually works for anyone. It doesn't work for me: I get a server misconfiguration error.

 

I'm not an expert on perl, but I think the problem may be that there is no /usr/ directory for my account. It looks like the cgi script is trying to go to the spam assassin executable, and I can't tell where that is.

 

What do these lines do:

my $salearn = "/usr/bin/sa-learn";

$|;

 

Regards, Andy

Link to comment
Share on other sites

Welcome to the forum Andy

 

my $salearn = "/usr/bin/sa-learn";
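This sets the variable $salearn to the location of the sa-learn program on the server, which is /usr/bin. That is a system directory on the server, not a folder inside your account, so you won't see it in your file manager.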

 

$|;

I am unsure what this is. As I stated in the original post, I did not write the script; I only copied it from somewhere else, and I no longer remember where.
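
For what it's worth, `$|` is Perl's built-in autoflush variable. Written on its own as `$|;`, the statement only reads the variable and changes nothing, so the script's author most likely meant `$| = 1;`. That reading of the author's intent is an assumption. A minimal sketch of the difference:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $| is the output-autoflush flag for the currently selected
# filehandle (STDOUT by default). It starts out as 0 (buffered).
my $before = $|;

# Assigning a true value turns buffering off, so each print is sent
# to the browser right away instead of waiting for the buffer to fill.
$| = 1;
my $after = $|;

print "Content-type: text/plain\n\n";
print "autoflush before: $before, after: $after\n";
```

With autoflush on, the "Learning SPAM:" heading reaches the browser before sa-learn starts, which is helpful when a run takes a while.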

Link to comment
Share on other sites

  • 4 weeks later...

I recently had a problem getting Spam Assassin to work on my own email account after turning it back on. It still didn't catch any spam at all. After discussing it with Mikem, we tried something to see if it would resolve the problem, and it appears to have worked.

 

What I did was turn off both Spam Assassin and the spam box, and then rename the .spamassassin directory to .sabackup.

 

Then I re-enabled both Spam Assassin options. I left all the options blank except for lowering the level from the default of 5 to 2. Now, instead of heaps of spam getting to my inbox, I get maybe 5 a day and use mailwasher to deal with those.

 

Life is good.

 

I then check using Horde once a day for emails that might have been trapped incorrectly, add them to the white list, and delete all the rest.

 

Maybe try that and see if that works for you.

Link to comment
Share on other sites

  • 6 months later...

The script I ended up with was

 

#!/usr/bin/perl

 

my $salearn = "/usr/bin/sa-learn";

$| = 1;

 

print "Content-type: text/plain\n\n";

 

print "Learning SPAM:\n";

print `$salearn -p /home/mylogin/.spamassassin/user_prefs --mbox --spam --showdots /home/mylogin/mail/myspam 2>&1`;

print "\n\n";

 

print "Learning HAM:\n";

print `$salearn -p /home/mylogin/.spamassassin/user_prefs --mbox --ham --showdots /home/mylogin/mail/myham 2>&1`;

print "\n\n";

 

exit;

 

 

But with the mail changes from a couple weeks ago, should /home/mylogin/mail/myspam be changed to something like /home/mylogin/mail/.myspam/cur (or the applicable per-user box) now? Nothing seems to work - tells me zero messages. Is there hope of it working with the one-file-per-message format in the 'cur' directory?

Link to comment
Share on other sites

Hi,

 

You will need to change a couple of things;

 

The --mbox to --maildir

The path to the correct directory

 

For example for the ham you could have

 

print `$salearn -p /home/mylogin/.spamassassin/user_prefs --maildir --ham --showdots /home/mylogin/mail/domain/user/{cur,new} 2>&1`;

 

the {cur,new} allows you to look in both directories.

 

If you want to do it in all your email accounts you can also use a wildcard instead of the 'user'
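
The wildcard idea can be sketched in Perl itself, using the built-in `glob`, which expands both `*` and `{cur,new}` without relying on the server's shell. Everything below is a sketch under assumptions: the paths are the placeholder ones from this thread, the helper name `learn_command` is made up, and no format flag is passed on the assumption that sa-learn reads a directory argument as one message per file (check `sa-learn --help` on your server).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $salearn = '/usr/bin/sa-learn';
my $prefs   = '/home/mylogin/.spamassassin/user_prefs';   # placeholder path

# Build the sa-learn argument list for one class of mail ("spam" or
# "ham") and a list of Maildir directories.
sub learn_command {
    my ($type, @dirs) = @_;
    die "type must be 'spam' or 'ham'\n" unless $type =~ /\A(?:spam|ham)\z/;
    return ($salearn, '-p', $prefs, "--$type", '--showdots', @dirs);
}

# Every account's inbox under the domain: * matches each user,
# {cur,new} matches both Maildir subdirectories.
my @ham_dirs = grep { -d } glob('/home/mylogin/mail/domain/*/{cur,new}');

if (@ham_dirs) {
    # The list form of system() bypasses the shell, so odd characters
    # in directory names cannot be misread as shell syntax.
    system(learn_command('ham', @ham_dirs));
} else {
    print "No Maildir directories found; check the path.\n";
}
```

This version is meant to be run from a command line or cron job; inside the CGI script you would still print the Content-type header first.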

Link to comment
Share on other sites

  • 1 month later...

I've got sub-folders for "learn_ham" and "learn_spam" -- how do I get the new mail stuff to recognize those in conjunction with the sa-learn script? I think I'm in the same boat as waynej; I see how Andy's suggestion would work for ham, but I don't follow how that would work for learning spam from stuff that SA doesn't catch by default.

 

Any help? :)

 

Edit: I think I may have figured it out. It looks like you can just add the subdirectory to the path. So, for instance, instead of

/home/mylogin/mail/domain/user/{cur,new}

 

You can do

/home/mylogin/mail/domain/user/.learn_spam/cur

 

Is that valid thinking, or am I missing something?

Edited by McC
Link to comment
Share on other sites

  • 3 weeks later...
Yes, you're correct.

 

print `$salearn -p /home/mylogin/.spamassassin/user_prefs --spam --showdots /home/mylogin/mail/domain/user/.learn_spam/cur 2>&1`;

 

I must be continuing to do something wrong, because it's still reporting 0s. Here's the full text of my script (with user-sensitive stuff replaced by ****).

 

#!/usr/bin/perl

use CGI::Carp qw(fatalsToBrowser);
my $salearn = "/usr/bin/sa-learn";
$| = 1;

print "Content-type: text/plain\n\n";

print "Learning SPAM:\n";
print `$salearn -p /home/****/.spamassassin/user_prefs --mbox --spam --showdots /home/****/mail/****/****/.learn_spam/cur 2>&1`;
print "\n";
print `$salearn -p /home/****/.spamassassin/user_prefs --mbox --spam --showdots /home/****/mail/****/****/.learn_spam/cur 2>&1`;
print "\n\n";

print "Learning HAM:\n";
print `$salearn -p /home/****/.spamassassin/user_prefs --mbox --ham --showdots /home/****/mail/****/****/.learn_ham/cur 2>&1`;
print "\n";
print `$salearn -p /home/****/.spamassassin/user_prefs --mbox --ham --showdots /home/****/mail/****/****/.learn_ham/cur 2>&1`;
print "\n\n";

exit;

The reason there are two lines for each is that I'm running it on two mailboxes.

 

The only difference I can see is the --mbox flag. Should that not be present? Is there something else I'm missing?
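
The --mbox flag is a likely culprit: it tells sa-learn that each argument is a single mbox file, whereas a Maildir `cur` directory holds one file per message. Before changing the script, it can help to confirm that the directories it points at actually contain messages. A minimal sketch, with the helper name `count_messages` invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the plain files (one per message in Maildir format) directly
# inside a directory. Returns -1 if the directory cannot be opened.
sub count_messages {
    my ($dir) = @_;
    opendir my $dh, $dir or return -1;
    my $count = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return $count;
}

# Usage: perl count.pl /home/mylogin/mail/domain/user/.learn_spam/cur ...
for my $dir (@ARGV) {
    my $n = count_messages($dir);
    print $n < 0 ? "$dir: cannot open\n" : "$dir: $n message file(s)\n";
}
```

If the counts are non-zero and sa-learn still reports zero, removing --mbox from each command line is the next thing to try.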

Link to comment
Share on other sites

  • 3 years later...
