
Version 1 (modified by rjl, 13 years ago)


The Bayes Auto-Learning Mechanism

SpamAssassin offers a useful feature called "auto-learning" that helps keep its Bayes database properly trained with minimal effort on the part of your users. It works by defining two thresholds:

bayes_auto_learn_threshold_nonspam defines the score below which SpamAssassin can conservatively decide that mail is non-spam, even without confirmation from the recipient.

Similarly, bayes_auto_learn_threshold_spam defines the score above which SpamAssassin can conservatively treat mail as spam, without needing confirmation from the recipient.

The key is to choose these two thresholds conservatively. Mail that scores between the two thresholds will not be learned automatically one way or the other; it still requires human confirmation before the Bayes database learns from it. In your local.cf file:

# Enable the auto-learning mechanism
bayes_auto_learn                       1
bayes_auto_learn_threshold_nonspam     -0.001
bayes_auto_learn_threshold_spam        10.0
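
The decision those two thresholds describe can be sketched in a few lines of Python. This is only an illustration of the rule as stated on this page, not SpamAssassin's actual code: the function name auto_learn_decision is hypothetical, the treatment of a score exactly equal to a threshold is an assumption, and the real auto-learner works from a separate learning score (computed without the Bayes rules themselves) and applies additional checks before learning a message as spam.

```python
from typing import Optional

# Values taken from the local.cf example above.
THRESHOLD_NONSPAM = -0.001
THRESHOLD_SPAM = 10.0


def auto_learn_decision(score: float,
                        nonspam_threshold: float = THRESHOLD_NONSPAM,
                        spam_threshold: float = THRESHOLD_SPAM) -> Optional[str]:
    """Return 'nonspam', 'spam', or None when the mail is left for a human.

    Hypothetical helper: strict comparisons follow the wording
    "below which" / "above which"; boundary handling is an assumption.
    """
    if score < nonspam_threshold:
        return "nonspam"   # confidently clean: learn as non-spam
    if score > spam_threshold:
        return "spam"      # confidently spam: learn as spam
    return None            # grey area: wait for user confirmation


if __name__ == "__main__":
    for score in (-1.5, 0.0, 4.2, 12.3):
        print(score, auto_learn_decision(score))
```

With the example thresholds, a score of 4.2 falls in the grey area and is not learned either way, which is why widening the gap between the two thresholds makes auto-learning more conservative.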

The main advantage of auto-learning is that it takes place at the time the mail is scanned by SpamAssassin, so it can be applied to all the mail your site receives, even if your users neglect to confirm any of the items in their quarantines and caches. Items that score well above or well below the thresholds will thus still contribute to the training of your Bayes database, even if users don't bother confirming the items that fall into the "grey" area between those thresholds.

The disadvantage of auto-learning is that it adds some processing at scan time, so each email takes slightly longer to scan. If items are taking too long to scan, it might make sense to disable auto-learning and rely exclusively on user confirmation for your Bayes training.

Note: If you want to disable the auto-learning mechanism, you need to do more than just set bayes_auto_learn to 0. You must also delete (or comment out) the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam lines; otherwise SpamAssassin will infer from their presence that you really want auto-learning enabled:

# Disable the auto-learning mechanism
bayes_auto_learn                       0
#bayes_auto_learn_threshold_nonspam    -0.001
#bayes_auto_learn_threshold_spam       10.0
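
Following the note above, a configuration can be checked for threshold lines that were left active after auto-learning was switched off. The sketch below is a hypothetical helper, not part of SpamAssassin; it assumes simple "key value" lines and treats everything after a # as a comment, which is a simplification of the real configuration syntax.

```python
# Threshold settings that the note above says must be removed or
# commented out when auto-learning is disabled.
THRESHOLD_KEYS = (
    "bayes_auto_learn_threshold_nonspam",
    "bayes_auto_learn_threshold_spam",
)


def active_settings(config_text: str) -> dict:
    """Map each uncommented 'key value' line to its value (last one wins)."""
    settings = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def leftover_thresholds(config_text: str) -> list:
    """Return threshold keys still active while bayes_auto_learn is 0."""
    settings = active_settings(config_text)
    if settings.get("bayes_auto_learn") != "0":
        return []  # auto-learning is not explicitly disabled
    return [key for key in THRESHOLD_KEYS if key in settings]


if __name__ == "__main__":
    sample = (
        "bayes_auto_learn                       0\n"
        "bayes_auto_learn_threshold_nonspam     -0.001\n"
        "#bayes_auto_learn_threshold_spam       10.0\n"
    )
    print(leftover_thresholds(sample))
```

An empty list means the file matches the "disabled" example above; any key in the result is a line that still needs to be deleted or commented out.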

On balance, it's usually a good idea to enable auto-learning if a large proportion of your users are lazy or neglectful about confirming the items in their quarantines and caches. On the other hand, if most of your users are diligent about managing their quarantines and caches regularly, you can probably keep your Bayes database well-tuned with user confirmations alone, and don't need auto-learning.

