Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections. It was originally written by Eric S. Raymond after he read Paul Graham's article "A Plan for Spam", and is now maintained by David Relson, Matthias Andree and Greg Louis, together with a group of contributors.
The statistical technique used is known as Bayesian filtering. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique described by Gary Robinson.
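The following is a minimal Python sketch of the f(w) and Fisher inverse chi-square calculations as they are commonly presented, not Bogofilter's own code. The function names, the default values s=1.0 and x=0.5, and the count-based inputs are assumptions chosen for illustration.

```python
import math

def f_w(spam_count, ham_count, n_spam_msgs, n_ham_msgs, s=1.0, x=0.5):
    """Smoothed spam probability f(w) for one token.

    s (strength of the prior) and x (probability assumed for an unseen
    token) are illustrative values, not Bogofilter's configured defaults.
    n_spam_msgs and n_ham_msgs are assumed to be positive.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # no evidence for this token: fall back to the prior
    spam_rate = spam_count / n_spam_msgs
    ham_rate = ham_count / n_ham_msgs
    p = spam_rate / (spam_rate + ham_rate)
    return (s * x + n * p) / (s + n)

def chi2_survival(x2, dof):
    """P(X >= x2) for a chi-square variable with an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(f_values):
    """Combine per-token f(w) values into one score in [0, 1]; higher means more spam-like."""
    n = len(f_values)
    if n == 0:
        return 0.5  # nothing to combine: stay neutral
    q_f = chi2_survival(-2.0 * sum(math.log(f) for f in f_values), 2 * n)
    q_not_f = chi2_survival(-2.0 * sum(math.log(1.0 - f) for f in f_values), 2 * n)
    return (1.0 + q_f - q_not_f) / 2.0
```

In this sketch a message whose tokens all have f(w) close to 1 scores close to 1, a message whose tokens all have f(w) close to 0 scores close to 0, and tokens with f(w) = 0.5 leave the score at the neutral value 0.5.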
Bogofilter may be run by an MDA or mail client to classify messages as they are delivered to recipient mailboxes, or be used by an MTA to classify messages as they are received from the sending SMTP server. Bogofilter examines tokens in the message body and header, and refers to wordlists stored by BerkeleyDB, SQLite or QDBM to calculate a probability score that a new message is spam. Bogofilter processes plain text and HTML, and supports reading multipart MIME messages, including base64, quoted-printable, and uuencoded text or HTML. Bogofilter ignores non-text attachments, such as images.
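The message handling described above can be approximated with Python's standard email package. This is an illustrative sketch only: the token pattern, the crude tag stripping for HTML, the restriction to the Subject header, and the function names are assumptions, and Bogofilter's real lexer differs.

```python
import re
from email import message_from_string

TOKEN_RE = re.compile(r"[a-z0-9']{3,}")  # hypothetical token rule
TAG_RE = re.compile(r"<[^>]+>")          # crude HTML tag stripper

def text_parts(msg):
    """Yield the decoded text of every text/* part, skipping other attachments."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # multipart containers, images, and so on
        payload = part.get_payload(decode=True)  # undoes base64, quoted-printable, uuencode
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("latin-1")
        if part.get_content_subtype() == "html":
            text = TAG_RE.sub(" ", text)
        yield text

def tokenize_message(raw):
    """Return lowercase tokens from the Subject header followed by the text parts."""
    msg = message_from_string(raw)
    tokens = TOKEN_RE.findall((msg.get("Subject") or "").lower())
    for text in text_parts(msg):
        tokens.extend(TOKEN_RE.findall(text.lower()))
    return tokens
```

Each token returned by such a function would then be looked up in the wordlist to obtain the spam and ham counts from which the message score is computed.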
It is possible to tune Bogofilter's statistical algorithms by modifying various coefficients and other settings in its configuration file, or by using the automated bogotune utility included with the software, which attempts to optimise various coefficients to maximise filtering efficiency for a particular corpus of spam and non-spam.
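As a rough illustration of tuning against a corpus, the Python sketch below picks a spam cutoff from already-scored ham and spam messages. It is a deliberately simplified stand-in and does not reproduce bogotune's actual search; the function name, the single tuned parameter, and the false-positive budget are assumptions.

```python
def pick_spam_cutoff(ham_scores, spam_scores, max_false_positives=0):
    """Choose the cutoff that misses the fewest spam messages while
    flagging at most max_false_positives ham messages.

    A message counts as spam when its score is >= cutoff.
    Returns (cutoff, false_negatives, false_positives), or None if no
    candidate cutoff satisfies the false-positive budget.
    """
    best = None
    # Only observed scores need to be tried: the error counts change
    # nowhere else.
    for cutoff in sorted(set(ham_scores) | set(spam_scores)):
        false_pos = sum(score >= cutoff for score in ham_scores)
        false_neg = sum(score < cutoff for score in spam_scores)
        if false_pos > max_false_positives:
            continue
        if best is None or false_neg < best[1]:
            best = (cutoff, false_neg, false_pos)
    return best
```

Fixing a false-positive budget first and then minimising missed spam reflects the usual priority in mail filtering, where losing a legitimate message costs more than letting one spam message through.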
- Official homepage
- "Bogofilter". Freecode.
- A Plan for Spam – An essay by Paul Graham discussing the main ideas behind this program
This article, or an earlier revision of it, was edited from bogofilter's homepage.