Bogofilter is a mail filter that classifies mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections.

The statistical technique is known as the Bayesian technique, and its use for spam was described by Paul Graham in his article "A Plan for Spam" in August 2002. Gary Robinson, in his web log Rants (September 2002), suggested some refinements for improved discrimination between spam and ham. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique that he describes. Paul Graham's new article "Better Bayesian Filtering" (January 2003) suggests some useful parsing improvements.
Bogofilter is run by an MDA script to classify an incoming message as spam or ham (using word lists stored by BerkeleyDB). Bogofilter provides processing for plain text and HTML. It supports multipart MIME messages, decoding base64, quoted-printable, and uuencoded text, and it ignores attachments such as images.

(Source: Bogofilter.Sourceforge)
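As a rough illustration of the classification step an MDA script might perform, the sketch below maps bogofilter's exit status to a label. It assumes the exit-code convention documented in the bogofilter man page (0 for spam, 1 for ham, 2 for unsure, anything else an error); the function name classify_message is made up for this example, so check the convention against your installed version before relying on it.

```shell
#!/bin/sh
# Sketch only: classify one message file by bogofilter's exit status.
# Assumed convention (see your man page): 0 = spam, 1 = ham, 2 = unsure.
# classify_message is a hypothetical helper name, not part of bogofilter.

classify_message() {
    # $1 is a file holding one complete message (headers and body).
    # Any output from bogofilter is discarded; only the exit status is used.
    bogofilter < "$1" >/dev/null
    case $? in
        0) echo spam ;;
        1) echo ham ;;
        2) echo unsure ;;
        *) echo error ;;
    esac
}
```

A delivery script could then call classify_message /path/to/message and file the message according to the printed label.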
Getting Started
The first thing to check is whether bogofilter is installed and on your path. Run "which bogofilter"; if it prints nothing, bogofilter is either not installed or not on your path. If it is not installed, get a package or build it from source. Once you have it, make sure bogofilter is in a directory on the path of the users who will run it.

Second, take some time to put all of the mail that you know is spam into a separate mailbox. This mailbox will be named "SPAM" in our example. Then put all of the mail you know is good, mail you want to receive in the future, into another mailbox. We will use "archive" for good mail. If you have other good mail stored somewhere else, we will use the mailbox "saved" for it in our example.
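The two checks above can be sketched as a small shell function. This is only an illustration: check_setup is a made-up name, and the mailbox names SPAM, archive, and saved are the ones used in this article's example, so adjust them to your own layout.

```shell
#!/bin/sh
# Sketch only: confirm bogofilter is on PATH and the training mailboxes exist.
# check_setup is a hypothetical helper; the mailbox names follow this article.

check_setup() {
    mail_dir=$1    # directory that holds the mailboxes, e.g. ~/Mail
    problems=0

    # command -v is the POSIX way to ask whether a program is on PATH.
    if ! command -v bogofilter >/dev/null 2>&1; then
        echo "bogofilter not found in PATH"
        problems=1
    fi

    for box in SPAM archive saved; do
        if [ ! -f "$mail_dir/$box" ]; then
            echo "missing mailbox: $mail_dir/$box"
            problems=1
        fi
    done

    return $problems
}
```

For example, check_setup "$HOME/Mail" && echo "ready to train" prints the confirmation only when nothing is missing.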
Running the job from cron
To use bogofilter in the simplest way, you can run a cron job every 15 minutes or so with the following commands. You can put them all on one line or split them with line continuations, as below, for easier reading.

rm /home/username/.bogofilter/wordlist.db; \
bogofilter -s < ~username/Mail/SPAM; \
bogofilter -n < ~username/Mail/archive; \
bogofilter -n < ~username/Mail/saved

These commands remove the wordlist.db database that bogofilter maintains and rebuild it from scratch. The argument "-s" is for mailboxes that contain known samples of spam you have received. The argument "-n" is for non-spam, the good mail you have. In our example, all mail in the "SPAM" mailbox is registered as spam, and all mail in "archive" and "saved" is registered as non-spam, mail we want to receive.
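The same rebuild can be wrapped in a script that cron calls, which keeps the crontab entry short. The sketch below is one possible arrangement, not the only one: the function name retrain and the script path in the comment are made up, and it assumes the word list lives at $HOME/.bogofilter/wordlist.db as in the commands above.

```shell
#!/bin/sh
# Sketch only: rebuild the bogofilter word list from the example mailboxes.
# retrain and the script path below are hypothetical names.
#
# A crontab entry to run this every 15 minutes might look like the line
# below (the */15 step syntax is an assumption about your cron; older
# crons need the minutes listed as 0,15,30,45):
#   */15 * * * * /home/username/bin/retrain-bogofilter.sh

retrain() {
    mail_dir=$1    # directory holding the SPAM, archive and saved mailboxes

    # -f keeps rm quiet on the first run, when no database exists yet.
    rm -f "$HOME/.bogofilter/wordlist.db"

    bogofilter -s < "$mail_dir/SPAM"       # register known spam
    bogofilter -n < "$mail_dir/archive"    # register known good mail
    bogofilter -n < "$mail_dir/saved"      # register more good mail
}
```

The script would end with a call such as retrain "$HOME/Mail". Depending on your bogofilter version, you may need to add the -M option so that each input file is read as an mbox of many messages rather than as a single message; check the man page for your release.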