Online Spam Databases

While SpamAssassin has hundreds of pattern-based rules that help it determine whether an email is spam or not, this is not always enough to get the job done. Sometimes it takes the extra confidence that only another pair of human eyes--or a few hundred thousand pairs of eyes--can provide. That's where online spam databases come in.

An online spam database works a bit like a public notice board where people from across the Internet can say, "here's an email I received, and I think it's spam." When others receive the same email, they can check the database and see how many other people thought that item was spam. If hundreds of thousands of other people thought it was spam, chances are pretty good that you'll think so too, so you can score the email accordingly.
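
The notice-board idea can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the names digest and ToySpamDatabase are made up for this example, and the real services discussed below compute their message signatures in their own ways rather than with a plain SHA-1 of the body.

```python
import hashlib
from collections import Counter


def digest(body: str) -> str:
    """Fingerprint a message body, ignoring case and whitespace differences."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class ToySpamDatabase:
    """In-memory stand-in for a shared database of spam reports (hypothetical)."""

    def __init__(self) -> None:
        self.reports = Counter()

    def report(self, body: str) -> None:
        # "Here's an email I received, and I think it's spam."
        self.reports[digest(body)] += 1

    def check(self, body: str) -> int:
        # How many people have reported this same message so far?
        return self.reports[digest(body)]


db = ToySpamDatabase()
for _ in range(3):
    db.report("Buy   cheap pills NOW")

print(db.check("buy cheap pills now"))  # 3
print(db.check("Lunch on Friday?"))     # 0
```

The more reports a message has collected, the more weight you can reasonably give that result when scoring it.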

The main advantage of using online spam databases is that they provide a kind of "early warning system" for new types of spam that SpamAssassin may not yet have enough rules to identify properly. Remember, spam gets sent out to millions of people, so chances are pretty good that someone else received the same item before you did. By the time you check the online database, there's probably already an opinion about it.

The downside to using these kinds of databases is that they're network-based tests, and it takes some time to query a remote database and get a response from the server. This can add a few extra seconds to the processing time for each mail item while SpamAssassin waits for responses. On the whole, though, the delays are worth it, since the extra thoroughness pays off in more accurate scoring.
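
The usual way to keep these delays bounded is to give each lookup a time limit and move on without an answer if the limit passes. The Python sketch below illustrates that general idea only; it is not how SpamAssassin implements its timeouts, and lookup_with_timeout, fast_lookup, and slow_lookup are hypothetical names.

```python
import concurrent.futures
import time


def lookup_with_timeout(lookup, timeout_seconds):
    """Run lookup() in a worker thread; return None if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # No answer in time: carry on without this test's result.
        return None
    finally:
        # Don't block on a slow lookup that is still running.
        pool.shutdown(wait=False)


def fast_lookup():
    return "listed"


def slow_lookup():
    time.sleep(0.5)  # stand-in for a remote server that is slow to respond
    return "listed"


print(lookup_with_timeout(fast_lookup, 0.2))  # listed
print(lookup_with_timeout(slow_lookup, 0.2))  # None
```

A timed-out lookup simply contributes nothing to the score, so a slow or unreachable server costs you at most the time limit per message.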

Vipul's Razor (Razor2)

Vipul's Razor is a popular and well-supported spam database operated by Cloudmark. To use it, you first need to download the latest razor-agents package and install it according to the supplied instructions. Don't forget to use the registration tool to create a reporter account for your site, as explained in the documentation, since the Razor servers only allow registered reporters to connect.

Next, you need to load the Razor2 plugin for SpamAssassin in your v310.pre file:

loadplugin Mail::SpamAssassin::Plugin::Razor2

Then in your local.cf file you enable and configure the Razor2 plugin:

use_razor2      1
razor_timeout   10

Once everything is installed and configured, restart amavisd-maia to start using the Razor2 plugin.

Pyzor

Pyzor was an attempt to write an open source equivalent to Vipul's Razor in Python (hence "Pyzor"). It serves much the same purpose as Razor2, and while it is not quite as advanced technologically, both the client and the server code are open. In practical terms this means you can run your own Pyzor server, whereas Cloudmark only releases the client for Razor2.

Because it lacks the commercial support that Razor2 has, Pyzor has been plagued over the years by server outages: many of its servers are run on a volunteer basis by organizations willing to let others use their hosted Pyzor servers. This somewhat inconsistent support has led some people to stop using Pyzor, which is a bit of a shame because Pyzor does not overlap with Razor2 as much as one might think. Some organizations, for instance, cannot use Razor2 for licensing reasons--their mail volume may be too high to use Razor2 without purchasing an expensive commercial license--so they report to Pyzor instead. Others choose Pyzor for ideological reasons (i.e. to support free software rather than Razor2's proprietary, commercial model), and will simply refuse to report spam to Razor2. As a result, Pyzor's database contains reports from people who may not be reporting to Razor2 at all, making it worthwhile to consult Pyzor in addition to Razor2.

To use Pyzor, download it and apply these patches to the source code before building it. Without the patches, Pyzor is effectively useless, so don't forget this step!

Next, load the Pyzor plugin in your v310.pre file:

loadplugin Mail::SpamAssassin::Plugin::Pyzor

In your local.cf file, enable and configure the Pyzor plugin:

use_pyzor       1
pyzor_timeout   10
pyzor_max       5
pyzor_path      /usr/bin/pyzor

To get a current list of Pyzor servers, run "pyzor discover" as your amavis/maia user. This should create a file in that user's home directory called ~/.pyzor/servers that contains IP addresses and port numbers of the server(s) Pyzor should consult.
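
If you want to check the contents of that file programmatically, a small Python sketch like the following may help. It assumes the file holds one "address:port" entry per line, which you should verify against your own ~/.pyzor/servers; the function name parse_servers and the sample address (taken from the documentation-only 192.0.2.0/24 range) are hypothetical.

```python
def parse_servers(text):
    """Parse 'address:port' lines into (address, port) tuples.

    Assumes one server per line; blank lines and '#' comments are skipped.
    """
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, sep, port = line.rpartition(":")
        if not sep or not host:
            raise ValueError("expected address:port, got %r" % line)
        servers.append((host, int(port)))
    return servers


# Hypothetical sample contents, not a real server list.
sample = "192.0.2.10:24441\n"
print(parse_servers(sample))  # [('192.0.2.10', 24441)]
```

To run it against the real file, read the file's text (for example with pathlib.Path.home() / ".pyzor" / "servers") and pass it to parse_servers. An empty result suggests that "pyzor discover" did not find any servers.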

Once everything is installed and configured, restart amavisd-maia to start using the Pyzor plugin.

The Distributed Checksum Clearinghouse (DCC)

The Distributed Checksum Clearinghouse (DCC) is a database designed and maintained by Rhyolite Software that keeps track of counts of mail items, whether spam or not, and as such it is quite useful as an indicator of "bulk" email. Being listed in the DCC, in other words, does not necessarily mean the email is spam; it just means that a whole lot of people have received it. This could be true of legitimate newsletters sent out in bulk, but you get to set the threshold beyond which you consider the mail to be "spam"--a big mailing list might send out a few thousand copies of an email, for instance, but a million copies is suspicious!

As with Pyzor, you can host your own DCC server if you like, but the license terms are designed to encourage you to "give back" by sending your reports to Rhyolite's own DCC servers, so that the rest of the community can benefit from them. Most sites will only be interested in the DCC client, which you can freely download and build from source. In a Maia context, the only part of the DCC tool suite you'll want to build is dccifd, the DCC interface daemon. SpamAssassin can then query that daemon via a UNIX socket.

To configure DCC, edit your /var/dcc/dcc_conf file with these particulars:

...
# DCC user name
DCCUID=amavis
...
DCCD_ENABLE=off
...
GREY_ENABLE=off
...
DCCM_ENABLE=off
...
DCCIFD_ENABLE=on
...

When you've built and configured dccifd, load the SpamAssassin plugin in your v310.pre file:

loadplugin Mail::SpamAssassin::Plugin::DCC

In your local.cf file, enable the DCC plugin and configure it:

use_dcc           1
dcc_timeout       10
dcc_body_max      999999
dcc_fuz1_max      999999
dcc_fuz2_max      999999
dcc_home          /var/dcc
dcc_dccifd_path   /var/dcc/dccifd
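
To make the three "max" thresholds above more concrete, here is a rough Python sketch of the comparison they imply. It rests on several assumptions: that each checksum count (body, fuz1, fuz2) is compared against its own maximum, that a message is treated as bulk when any count reaches its maximum, and that very large counts are reported only as "many", represented here by 999999. The names parse_count and is_bulk are hypothetical, and this is not the plugin's actual code.

```python
MANY = 999999  # assumption: numeric stand-in for a count reported as "many"

# Thresholds mirroring the local.cf settings above.
THRESHOLDS = {"body": 999999, "fuz1": 999999, "fuz2": 999999}


def parse_count(value: str) -> int:
    """Convert a reported count to an integer, treating 'many' as MANY."""
    return MANY if value == "many" else int(value)


def is_bulk(counts: dict, thresholds: dict = THRESHOLDS) -> bool:
    """True when any checksum count reaches its configured maximum."""
    return any(
        parse_count(counts.get(name, "0")) >= limit
        for name, limit in thresholds.items()
    )


newsletter = {"body": "3500", "fuz1": "3500", "fuz2": "3500"}
blast = {"body": "many", "fuz1": "many", "fuz2": "many"}

print(is_bulk(newsletter))  # False
print(is_bulk(blast))       # True
```

Under these assumptions, a setting of 999999 means that only mail seen in very large numbers trips the test, while a mailing list of a few thousand recipients passes untouched. Lowering a threshold makes the test more aggressive.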

Since dccifd needs to be running at all times, add a line to one of your startup scripts (e.g. rc.local):

# Start the DCC daemon
su - amavis -c '/var/dcc/libexec/dccifd'

Once dccifd is running, you can restart amavisd-maia to start using the DCC plugin.
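
Before restarting, it can be worth confirming that the daemon's socket actually exists at the location given by dcc_dccifd_path. The following Python sketch checks only that the path is a UNIX socket; it does not prove that dccifd is answering queries. The function name is_unix_socket is hypothetical.

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if path exists and is a UNIX domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, and so on.
        return False


# Path taken from the dcc_dccifd_path setting above.
print(is_unix_socket("/var/dcc/dccifd"))
```

Run the check as the same user that amavisd-maia runs as, so that permission problems on /var/dcc show up as well.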

iXhash

iXhash is something of a cross between Razor2 and Pyzor: it is open source and includes server code, but it also breaks the data down into three source categories, each of which can be scored independently. Unlike Razor2, Pyzor, or DCC, however, iXhash does not provide a mechanism for reporting spam to a central server, so it's essentially a read-only service unless you host your own server.

First, download the tarball, which contains the SpamAssassin plugin (iXhash.pm) and its configuration file (iXhash.cf). Install the plugin in your SpamAssassin plugin directory, e.g. /etc/mail/spamassassin/plugins, and put the configuration file in the directory where your local.cf file is located.

In your v310.pre file, add:

loadplugin Mail::SpamAssassin::Plugin::iXhash  /etc/mail/spamassassin/plugins/iXhash.pm

In the iXhash.cf file, delete (or comment out) the "loadplugin" line at the top, since it properly belongs in the v310.pre file.

Now run the load-sa-rules.pl script to make sure that Maia is aware of the new rules in the iXhash.cf file. Restart amavisd-maia to start using this plugin.

