wiki:AutoDelete

Auto-Deleting High-Scoring Spam

While Maia provides an efficient quarantining mechanism for spam, most of that spam typically scores so highly that it can be safely discarded, and rather than forcing users to page through all of that spam to confirm it as such it can be more convenient in many cases to simply remove items whose scores exceed some conservative threshold.

Maia 1.1 and later incorporate this sort of feature in the handling of score thresholds, such that Low-Probability Spam and High-Probability Spam can be treated differently. In Maia 1.0.x however, all spam is treated the same way, so it is not possible to quarantine low-scoring spam and discard high-scoring spam. A workaround for the 1.0.x series was proposed by Sebastian A. Aresca however, and is detailed below, along with a refinement.

Sebastian Aresca's Method

First, create a new index for maia_mail.score:

CREATE INDEX maia_mail_idx_score ON maia_mail (score);

Next, create a new database view to select the high-scoring mail you are interested in auto-deleting. In this example a conservative score threshold of 20 is used:

CREATE VIEW HighScore AS SELECT maia_mail.id AS id FROM maia_mail WHERE (maia_mail.score > 20);

Now copy this shell script to a file (e.g. maia-autodelete.sh) and edit the conservative score threshold to match whatever you used above:

#!/bin/bash

echo -n "Deleting high-scoring spam ... "

mysql -v -D maia -u root -B -e "DELETE FROM maia_mail_recipients WHERE mail_id NOT IN (SELECT id FROM maia_mail);"
mysql -v -D maia -u root -B -e "DELETE FROM maia_mail_recipients WHERE mail_id IN (SELECT * FROM HighScore)"
mysql -v -D maia -u root -B -e "DELETE FROM maia_sa_rules_triggered WHERE mail_id IN (SELECT * FROM HighScore)"
mysql -v -D maia -u root -B -e "DELETE FROM maia_viruses_detected WHERE mail_id IN (SELECT * FROM HighScore)"
mysql -v -D maia -u root -B -e "DELETE FROM maia_banned_attachments_found WHERE mail_id IN (SELECT * FROM HighScore)"
mysql -v -D maia -u root -B -e "DELETE FROM maia_mail WHERE score > 20"

echo "OK"

Finally, test the script by hand to make sure it's working, then add a crontab entry to have the script run at regular intervals:

# Maia: Auto-delete high-scoring spam
0,30 * * * *  /var/amavisd/maia/scripts/maia-autodelete.sh

Sebastian's original postings on the subject can be found here and here.

Robert LeBlanc's Method

This refinement of Sebastian's method takes advantage of features in MySQL 4.x and later to encapsulate the deletions in a transaction so that data integrity is maintained even while new mail items are being received.

Start by creating the index on maia_mail.score as before:

CREATE INDEX maia_mail_idx_score ON maia_mail (score);

Next, create the view but use a temporary table to store it so that we can safely delete from the actual maia_mail table (since it's referenced in the view itself):

CREATE ALGORITHM = TEMPTABLE VIEW maia.HighScore AS
       SELECT maia_mail.id AS id
              FROM maia_mail
              WHERE maia_mail.score >= 20;

Now copy the following SQL code to a file (e.g. maia-autodelete.sql):

START TRANSACTION WITH CONSISTENT SNAPSHOT;
   DELETE FROM maia_mail_recipients
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_sa_rules_triggered
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_viruses_detected
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_banned_attachments_found
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_mail
          WHERE id IN (SELECT * FROM maia.HighScore);
COMMIT;

To execute the script, just feed it to MySQL:

mysql -u root -p maia < maia-autodelete.sql

Robert's original post can be found here.

For Users of MySQL 4.1

Both methods outlined above require MySQL 5.x (the VIEW statement is not available in earlier versions), but it is still possible to work around this with MySQL 4.1 by using a temporary table to simulate a VIEW. Effectively you create a table, stock it with the list of mail IDs that need to be purged, then use that list to do the deletions, and finally drop the table when you're done. Your maia-autodelete.sql script would then look something like this:

CREATE TABLE maia.HighScore
       SELECT maia_mail.id AS id
              FROM maia_mail
              WHERE maia_mail.score >= 20;

START TRANSACTION WITH CONSISTENT SNAPSHOT;
   DELETE FROM maia_mail_recipients
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_sa_rules_triggered
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_viruses_detected
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_banned_attachments_found
          WHERE mail_id IN (SELECT * FROM maia.HighScore);
   DELETE FROM maia_mail
          WHERE id IN (SELECT * FROM maia.HighScore);
COMMIT;


DROP TABLE maia.HighScore;

Notes and Warnings

Note that this sort of process is not without its share of side-effects, since it bypasses Maia's normal deletion mechanisms. In particularly it will mean that your statistics will become inaccurate, since these deleted items will not be recorded as such--it will be as if they were never received at all. A more thorough script would take care of that bit of bookkeeping as well.

Another potential concern is that this sort of process could confuse users if they login to the web interface and see that there are 300 items in their spam quarantine and then find only 70 items in there when they click on the quarantine link, if the deletion script happens to run in the meantime. This will particularly affect digest subscribers, who may receive an email digest that lists items that may very well be deleted by the time they finally login to view them. I would suggest that you also modify the SELECT query in the send-quarantine-digests.pl script to only list items with scores lower than your deletion threshold, since higher-scoring items will ultimately be deleted when the deletion script runs.


Back to FAQ

Last modified 15 years ago Last modified on May 29, 2008, 12:01:46 PM