Maia: The Big Picture

After running a few web searches for "Maia Mailguard", I found it interesting to see what people think this package is and does. Some people think Maia is simply a GUI, a pretty front-end for amavis. Others seem to think that Maia's major contribution to the world of spam-fighting is its ability to give users control over their content-filtering settings. That's pretty disappointing, really; it shows just how "misunderstood" poor Maia really is. In case any of you on this list may be labouring under these misconceptions, I thought I'd take a little time to explain the "big picture" of what Maia's purpose in life is, and where I see the Maia project going.

(I was hoping to give a talk at the upcoming Spam Conference at MIT on the 16th of January, but since I can't afford the trip this year, I figure I might as well offer my essay here instead :)

I admit that when I first started the Maia Mailguard project, my aim was simple--I wanted a web-based front-end for amavis. A lot of people on the amavis-users list were asking about a PHP-based interface for the per-user SQL settings that Mark Martinec had conveniently provided in amavisd-new-20030616, and since I wanted to write such a thing for my own purposes, I thought I'd release my work for others to use and modify to their liking.

Along the way, however, I started to get some ideas about "quarantine management"--the idea that end-users should be able to access and manage their own quarantined mail. At the time, this was a big issue in the spam-fighting community, because while mail filters were getting better and better at blocking spam, the fear of false positives was high--people were terrified of losing mail, and a filter system that discarded blocked mail was very hard for some organizations to accept.

The trouble with the early quarantine systems people were using was that they tended to dump all quarantined items into a system mailbox somewhere, for the mail administrator to sift through, in case of emergencies. Often the admins in these cases had such contempt for their "lusers" that they didn't think their users could be trusted to manage their own quarantines--"it's too complicated, they won't be bothered to use such facilities," etc. This made for a messy arrangement that introduced serious privacy issues, forcing the mail administrator to poke his nose through users' quarantined mail items to find false positives when users complained about missing mail items.

With Maia, I had hoped to introduce a user-friendly way to put quarantine management where it belonged--in the hands of the end-users. Still, one of the first requests I got from Maia users was "how can I configure Maia so that all the quarantined mail ends up in one mailbox for me to manage?" These admins were still not getting it; they were trying to use this new tool to do things "the old way". The whole concept of quarantine management was lost on these people, who just couldn't wrap their brains around the idea of letting end-users do this for themselves.

Even now, with Maia 1.0.0, some admins are still determined to try to use it to support their old ways of doing things. A classic example of this is the amavis system of "tag levels", where it's possible to define a score level at which an item is declared to be "spam", and a higher score level at which the item is actually quarantined. Having these two thresholds as separate items was important in the days before Maia, when the "kill level" often meant discarding the mail item, and rendering it inaccessible to the recipient. With quarantine management, on the other hand, it becomes safe and practical to set these two thresholds to the same value, because no mail is ever lost or rendered inaccessible to the recipient. If it's declared to be "spam", let it be quarantined--that's the whole *purpose* of quarantines, after all, particularly when the user can visit that quarantine at any time to rescue anything he may want. This notion of setting the spam threshold at 5.0 and the quarantine threshold at 10.0 is outdated--an obsolete way of thinking, if you're using a tool like Maia that provides quarantine management.
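The difference between the two arrangements can be sketched in a few lines of Python. This is only an illustration of the threshold logic described above, not code from amavis or Maia; the function name `classify`, the parameter names `tag_level` and `kill_level`, and the sample scores are all hypothetical.

```python
def classify(score, tag_level, kill_level):
    """Return what happens to a mail item with the given spam score.

    Hypothetical illustration only: 'tag_level' is the score at which an
    item is declared spam, 'kill_level' the score at which it is quarantined.
    """
    if score >= kill_level:
        return "quarantine"
    if score >= tag_level:
        return "tag-and-deliver"
    return "deliver"


scores = (2.0, 7.5, 12.0)

# The "old way": declared spam at 5.0, but quarantined only at 10.0.
old = [classify(s, 5.0, 10.0) for s in scores]

# With quarantine management: both thresholds share one value, so anything
# declared spam lands in the user's quarantine, where it can be rescued.
new = [classify(s, 5.0, 5.0) for s in scores]
```

With equal thresholds the middle "tag-and-deliver" band simply disappears: the item scoring 7.5 is quarantined rather than delivered with a spam tag.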

Another major gripe that I have with a lot of mail admins is the fact that they seem to think of mail filters as tools that are only necessary for shielding their own networks from outside sources. Time and time again on the amavis-users list, some admin is asking how he can set things up so that only *inbound* mail gets filtered--he presumes, of course, that none of his own users will ever send a virus-infected item, and that none of them happens to be a spammer, so why waste resources checking this outbound mail? Never mind the fact that any machine anywhere could become the origination point for spam, if it should be compromised by a virus like Sobig.F, or by hacker/spammers exploiting security vulnerabilities in that machine. This is a "fence-out" philosophy--the idea that sites need to shield themselves against attacks from outside. The converse philosophy is called "fence-in"--the idea that you should prevent such attacks from *originating* on your network, by blocking them before they can leave your network. I believe that administrators should be doing both--guarding against attacks from outside, and preventing attacks from within. If more ISPs took this kind of "fence-in" approach, the Internet would be a better place.

With most sites adhering only to a "fence-out" philosophy, we also see the "selfish" use of mail filtering technology. These sites use rules developed by others, and consult collaborative databases like DCC/Razor/Pyzor to cross-check the mail they receive, but they seldom give anything back to the spam-fighting community. They usually just discard the spam they receive, without reporting it anywhere or doing anything productive with it. They typically don't take any action that would actually help in the war on spam--they're only concerned about defending themselves against the onslaught.

One of Maia's key strengths is its built-in reporting capabilities. Since users are "confirming" the status of the mail in their quarantines and ham caches, Maia not only learns faster and more effectively on her own, she also has the ability to send reports to the very collaborative databases that she makes use of--DCC/Razor/Pyzor. This helps other sites detect spam faster and more effectively, and "gives back" to the community. The fact that this can all be automated with Maia is a major benefit--the usual excuses about reporting being "too confusing/complicated/time-consuming" all go away when it's as simple as scheduling a cron job and letting users manage their own quarantines/caches.

Some of you testing Maia 1.0.0 may have also noticed that there are some new "Reporting" options on the System Configuration page. While these are not yet active, these features will eventually be used to take Maia's reporting capabilities to the next level--the Maia Network. At a very basic level, the Maia Network will allow Maia sites to report statistics to a Report Server, so that a real-time picture of the spam/malware situation across that network can be displayed. At a higher level, this network will enable Maia sites to report distilled SpamAssassin rule-triggering data for both confirmed spam and confirmed ham (without transmitting any actual mail), so that the Report Server can accumulate the necessary data to generate daily re-scorings of the SpamAssassin rulesets. Maia sites can then download these daily score updates to install and use immediately, rather than having to wait weeks or months for the next official SpamAssassin release. Sites that contribute reports gain access to the daily score updates, and of course those new scores will be balanced based on the data that all of the participating sites have contributed.
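To make the idea of "distilled rule-triggering data" a little more concrete, here is a minimal Python sketch of what a site might compute before reporting. It is an assumption about the general shape of such data, not the Maia Network's actual format: the function name `distill`, the report layout, and the rule names `RULE_A` and `RULE_B` are placeholders invented for the example.

```python
from collections import Counter


def distill(confirmed_items):
    """Reduce confirmed mail items to per-rule hit counts.

    Each item is a (verdict, rule_names) pair, where verdict is "spam" or
    "ham". Only rule names and counts survive; no mail content is kept.
    """
    report = {"spam": Counter(), "ham": Counter(), "totals": Counter()}
    for verdict, rules in confirmed_items:
        report["totals"][verdict] += 1
        report[verdict].update(set(rules))  # count each rule once per item
    return report


# Placeholder data: rule names here are invented, not real SpamAssassin rules.
items = [
    ("spam", ["RULE_A", "RULE_B"]),
    ("spam", ["RULE_A"]),
    ("ham", ["RULE_B"]),
]
report = distill(items)
```

Counts like these, gathered from many sites, are the sort of input a re-scoring process would need: how often each rule fires on confirmed spam versus confirmed ham.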

I'm also interested in adding features that can report spam directly to ISPs, as well as to the FTC's new spam-reporting address (uce@ftc.gov). These features would try to extract the sender's likely ISP from the "Received:" headers of the mail, and send a report to the postmaster and abuse addresses there, with the offending mail as an attachment (to keep its original headers intact). The point of this reporting is not just to try to get the spammer evicted, but to register an official complaint with the ISP, so that a spammer with a so-called "pink contract" will be charged more money by the ISP in exchange for having to handle the complaint. Spammers count on the fact that the number of people who actually complain about spam is about the same as the number of people who buy from spam--1/100 to 1/10 of a percent of the recipients. This is generally because most users don't know how to figure out where to send the complaints, and they receive too much spam every day to be bothered taking the time to do the complaining for each item. If Maia automated this process, sites that currently receive thousands of pieces of spam a day would suddenly be issuing thousands of complaints a day, putting significant pressure on the ISPs and/or giving them the ammunition to use to charge spammers a lot more money to put up with the complaint volume.
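A rough Python sketch of the two mechanical steps involved, using only the standard library's `email` package, might look like this. It is not Maia code. The function names `relay_ip` and `build_complaint`, the sample message, and all addresses are hypothetical, and the sketch assumes the relay's IPv4 address appears in square brackets in a "Received:" header. Mapping that address to an ISP and its abuse contact (via WHOIS, for instance) is left out, and since "Received:" headers below the ones added by your own servers can be forged, a real implementation would need to be much more careful.

```python
import re
from email import message_from_string, policy
from email.message import EmailMessage

IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ip(raw_mail):
    """Return the first bracketed IPv4 address found in the Received:
    headers, scanning from the topmost header down, or None."""
    msg = message_from_string(raw_mail, policy=policy.default)
    for received in msg.get_all("Received", []):
        match = IP_IN_BRACKETS.search(str(received))
        if match:
            return match.group(1)
    return None


def build_complaint(raw_mail, abuse_address, reporter):
    """Build a complaint with the offending mail attached whole."""
    original = message_from_string(raw_mail, policy=policy.default)
    report = EmailMessage()
    report["From"] = reporter
    report["To"] = abuse_address
    report["Subject"] = "Spam complaint"
    report.set_content(
        "The attached message was confirmed as spam by its recipient.\n"
    )
    # Attaching a message object yields a message/rfc822 part, which keeps
    # the original headers intact.
    report.add_attachment(original)
    return report


# Hypothetical sample using documentation-only names and addresses.
SAMPLE = (
    "Received: from mail.example.net (mail.example.net [192.0.2.25])\n"
    "\tby mx.example.org; Mon, 5 Jan 2004 10:00:00 +0000\n"
    "From: someone@example.net\n"
    "Subject: Buy now\n"
    "\n"
    "Body text\n"
)

ip = relay_ip(SAMPLE)
complaint = build_complaint(SAMPLE, "abuse@example.net", "postmaster@example.org")
```

Sending the spam as an attachment rather than forwarding it inline matters because the ISP needs the untouched headers to trace the message.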

Another future direction for Maia may include more "offensive" measures, such as the auto-spidering of URLs that are advertised in spam e-mails. Paul Graham (whose seminal ideas about Bayesian filtering led last year to the development of Bayesian classifiers in tools like SpamAssassin) suggested this idea in an article called "Filters that Fight Back". The thought behind it is that spammers anticipate a very small response rate to their spam, because most people don't actually visit their websites, and many don't even open the mail in the first place. But what if a lot of people did? The bandwidth hit on the spammer's web server would certainly be high, probably incurring higher costs for him, and all this unexpected extra traffic would likely also drown out some of the connections from people who would otherwise be trying to buy the spammer's products. In effect, it's a kind of distributed denial-of-service attack, except that the spammer has actually invited the recipient to visit his site, so the spammer is really just getting what he's asked for.

Maia may eventually also play a role in an evolving anti-spam network that's based on peer-to-peer technology on a larger scale. Emerging technologies in this area, such as GNUnet, offer a lot of promise for distributing resources like DNSBL data in such a way that they will not be vulnerable to the kinds of distributed denial-of-service attacks that put Osirusoft and monkeys.com out of business last year. Maia's new reporting mechanism (i.e. the Maia Network) offers the ability to provide a lot of real-time data for such a collaborative network, in addition to its own private network. Eventually other kinds of data might be reportable from Maia sites--probable sender IP addresses for confirmed spam items, IP addresses of hosts advertised in spam URLs, and so on. This data can be used to collaboratively generate "scored" DNSBLs that return a time-weighted confidence score based on the number of spam reports related to an IP address over time. Rather than simply testing to see whether a site is "listed" or not, sites using these lists would receive the confidence score, which could be incorporated as a SpamAssassin scoring rule, much the way Razor handles things.
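One way such a time-weighted confidence score could be computed is sketched below in Python. The essay does not specify a formula, so everything here is an assumption chosen for illustration: the exponential decay, the function name `confidence`, and the `half_life_days` and `saturation` parameters.

```python
import math


def confidence(report_ages_days, half_life_days=7.0, saturation=10.0):
    """Time-weighted confidence in [0, 1) that an IP address is a spam source.

    Each report contributes a weight that halves every 'half_life_days';
    the summed weight is then squashed so that more reports approach, but
    never reach, 1.0. Both parameters are illustrative assumptions.
    """
    weight = sum(0.5 ** (age / half_life_days) for age in report_ages_days)
    return 1.0 - math.exp(-weight / saturation)


# Ages (in days) of the spam reports received for three hypothetical hosts.
never_reported = confidence([])
stale_reports = confidence([30, 30, 45])
fresh_reports = confidence([0, 1, 1, 2])
```

A host with only old reports drifts back toward zero on its own, while a host with many recent reports scores close to 1. A filter could then scale a rule's score by this value instead of treating the list as a yes/no lookup.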

In short, Maia is far more than simply a web front-end for amavis (I should hope so, for 20,000+ lines of Perl and PHP code!). Maia is a spam-fighting technology system that happens to use amavis and SpamAssassin as key components. Sites that run Maia Mailguard will not only be defending their own networks against spam and malware, but will also be doing their part to fight spam in a more active way. As the Maia Network comes online and Maia sites begin reporting from around the world, we'll have the collective data we need in order to help make all of our filters more effective.

That's the "big picture" of the Maia system as I envision it. Understanding this and the spam-fighting philosophy behind it is important, I think, if you mean to get the most out of Maia Mailguard. I'm asking you to check a lot of your outdated notions about spam-filtering at the door, to embrace a new way of doing things, with new tools and more ambitious goals. Maia is constantly evolving--are you?

Maia Mailguard

A Spam and Virus Management System

Version 1.0.2a
