spambayes

http://www.linuxjournal.com/article/6466?page=0,1

Components
The Spambayes software includes three classifier programs: a procmail filter, a POP3 proxy and a plugin for Microsoft Outlook 2000. I cover the procmail filter and the POP3 proxy in this article. A web interface (covered below) and various command-line utilities, test harnesses and so on are also part of Spambayes; see the documentation that comes with the software for full details.

Procmail-Based Setup
If you use a procmail-based e-mail system, this is how the Spambayes procmail system works:

All your existing mail gets a new X-Spambayes-Trained header. The software uses this to keep track of which messages it has already learned about.

The software looks at all your incoming mail. Messages it thinks are spam are put in a “spam” mail folder. Everything else is delivered normally.

Every morning, it goes through your mail folders and trains itself on any new messages. It also picks up mail that’s been refiled: something it thought was ham but was actually spam, or vice versa. Be sure to keep spam in your spam folder for at least a day or two before deleting it. We suggest keeping a few hundred messages, in case you need to retrain the software.
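
The X-Spambayes-Trained header is what lets the morning job skip mail it has already seen. As a rough illustration only (mboxtrain.py does its own bookkeeping, and this is not its code), the following shell sketch lists the messages in a directory whose headers lack that marker. The function names has_header and list_untrained are hypothetical, and the sketch assumes one message per file, as in a Maildir or MH folder.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the header block of message file $1
# contains a header named $2. The header block ends at the first blank line.
has_header() {
    awk -v name="$2" '
        /^$/ { exit }                                   # end of headers
        index(tolower($0), tolower(name) ":") == 1 { found = 1; exit }
        END { exit !found }                             # 0 if found, 1 if not
    ' "$1"
}

# Hypothetical helper: print every message file in directory $1
# that does not carry the X-Spambayes-Trained header.
list_untrained() {
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue
        has_header "$msg" "X-Spambayes-Trained" || printf '%s\n' "$msg"
    done
}
```

Calling list_untrained on a folder's message directory should print the files that a training run has not yet marked, which can be a handy way to see whether the daily job is keeping up.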

You’ll need a working crond to set up the daily training job. Optionally, you can have a mailbox of spam and a mailbox of ham to do some initial training.

To set up Spambayes on your procmail system, begin by installing the software. I’ll assume you’ve put it in $HOME/src/spambayes. Then, create a new database:

$HOME/src/spambayes/hammiefilter.py -n

If you choose to train Spambayes on your existing mail, type:

$HOME/src/spambayes/mboxtrain.py \
-d $HOME/.hammiedb -g $HOME/Mail/inbox \
-s $HOME/Mail/spam

You can add additional folder names if you like, using -g for good mail folders and -s for spam folders. Next, you need to add the following two recipes to the top of your .procmailrc file:

:0fw
| $HOME/src/spambayes/hammiefilter.py

:0
* ^X-Spambayes-Classification: spam
$HOME/Maildir/.spam/

The previous recipe is for the Maildir message format. If you need mbox (the default on many systems) or MH, the second recipe should look something like this:

:0:
* ^X-Spambayes-Classification: spam
$HOME/Mail/spam

If you’re not sure what format you should use, ask your system administrator. If you are the system administrator, check the documentation of your mail program. Most modern mail programs can handle both Maildir and mbox.

Using crontab -e, add the following cron job to train Spambayes on new or refiled messages every morning at 2:21 AM:

21 2 * * * $HOME/src/spambayes/mboxtrain.py -d $HOME/.hammiedb -g $HOME/Mail/inbox -s $HOME/Mail/spam

You can also add additional folder names here. It’s important to do this if you regularly file mail in different folders; otherwise Spambayes never learns anything about those messages.
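
If you file mail in many folders, the crontab line can get long, because a crontab entry has to fit on a single line. One way around this is a small wrapper script that builds the mboxtrain.py command from a list of folders. What follows is only a sketch: the folder names, the GOOD_FOLDERS and SPAM_FOLDERS variables and the build_train_command function are all hypothetical, and it assumes folder names without spaces. Only the -d, -g and -s options come from the commands shown earlier.

```shell
#!/bin/sh
# Hypothetical folder lists; edit to match where you actually file mail.
GOOD_FOLDERS="inbox lists work"
SPAM_FOLDERS="spam"

# Print the mboxtrain.py command line for the mail directory given as $1:
# one -g per good folder, then one -s per spam folder.
build_train_command() {
    maildir=$1
    cmd="$HOME/src/spambayes/mboxtrain.py -d $HOME/.hammiedb"
    for f in $GOOD_FOLDERS; do cmd="$cmd -g $maildir/$f"; done
    for f in $SPAM_FOLDERS; do cmd="$cmd -s $maildir/$f"; done
    printf '%s\n' "$cmd"
}

# Show the command that would be run. This sketch only prints it;
# a real wrapper called from cron would execute it instead.
build_train_command "$HOME/Mail"
```

Printing the command first makes it easy to check the folder list by eye before letting cron run anything against your mail.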

Spambayes should now be filtering all your mail and training itself on your mailboxes. But occasionally a message is misfiled. Simply move that message to the correct folder, and Spambayes learns from its mistake the next morning.
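
To confirm that the filter really is tagging your mail, you can look for the header that the procmail recipes match on. The sketch below reads one message on standard input and prints whatever follows X-Spambayes-Classification: in its headers. The function name classification_of is hypothetical, and the sketch makes no assumption about the exact value Spambayes writes beyond the "spam" prefix that the recipes test for.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the X-Spambayes-Classification
# header of the message on stdin, or nothing if the header is absent.
classification_of() {
    awk '
        /^$/ { exit }                       # stop at the end of the headers
        tolower($0) ~ /^x-spambayes-classification:/ {
            sub(/^[^:]*:[ \t]*/, "")        # strip the header name
            print
            exit
        }
    '
}
```

Feed it a single message file, for example with classification_of < message-file. If it prints nothing for newly delivered mail, the first recipe is probably not running hammiefilter.py at all.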

Many thanks to Neale Pickett for the information in this section.