SpamAssassin/Integrating SpamAssassin with sendmail

From WikiContent (current revision: 10:56, 7 March 2008)

sendmail has long been the most widely used mail transport agent in the world. It was routing mail before the Internet existed as such and continues to form the backbone of many of the largest mail servers on the Net today. This chapter explains how to integrate SpamAssassin into a sendmail-based mail server to perform spam-checking for local recipients or to create a spam-checking mail gateway.

Warning

sendmail is a complex piece of software and can have several security implications for systems on which it runs. You should always run the most up-to-date version of sendmail and keep track of new bug reports and security advisories. This chapter assumes that you are running the latest release of sendmail—Version 8.12—and does not cover how to securely install, configure, or operate sendmail itself. For that information, see the sendmail documentation and the book sendmail by Bryan Costales and Eric Allman (O'Reilly).


Spam-Checking at Delivery

The easiest way to add SpamAssassin to a sendmail system is to configure sendmail to use procmail as its local delivery agent, and to add a procmail recipe for spam-tagging to /etc/procmailrc. The advantages of this approach are:

  • It's very easy to set up.
  • You can run spamd, and the procmail recipe can use spamc for faster spam-checking.
  • User preference files, autowhitelists, and Bayesian databases can be used.

There are also some disadvantages:

  • sendmail must complete the SMTP transaction and accept an email message for local delivery before spam-checking takes place. Accordingly, you can't save bandwidth or mailbox space by rejecting spam during the SMTP transaction.
  • sendmail only runs the local delivery agent for email destined for a local recipient. You cannot create a spam-checking gateway with this approach.

To configure sendmail to use procmail as its local delivery agent, add the following line to your sendmail.mc file (before the MAILER(`local') line) and regenerate sendmail.cf from it:

FEATURE(`local_procmail',`/path/to/procmail')dnl

When you restart sendmail, it will use procmail instead of the system's default local MDA (e.g., /bin/mail) for mail delivery.
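The ordering requirement above is easy to get wrong, so a quick check before regenerating sendmail.cf can help. The following is a minimal sketch, not part of sendmail: the function name check_local_procmail_order is hypothetical, and it assumes each macro starts at the beginning of a line in your sendmail.mc.

```shell
#!/bin/sh
# Check that the local_procmail FEATURE line appears before the
# MAILER(local) line in a sendmail.mc file.
# check_local_procmail_order is a hypothetical helper for illustration.
check_local_procmail_order() {
    mc_file=$1
    # Line number of the first local_procmail FEATURE, if any.
    feature_line=$(grep -n -E '^FEATURE\(`local_procmail' "$mc_file" | head -n 1 | cut -d: -f1)
    # Line number of the first MAILER(local), quoted or unquoted.
    mailer_line=$(grep -n -E '^MAILER\(`?local' "$mc_file" | head -n 1 | cut -d: -f1)
    if [ -z "$feature_line" ]; then
        echo "no local_procmail FEATURE found"
        return 1
    fi
    if [ -n "$mailer_line" ] && [ "$feature_line" -gt "$mailer_line" ]; then
        echo "FEATURE(local_procmail) must come before MAILER(local)"
        return 1
    fi
    echo "ok"
}
```

Run it as check_local_procmail_order /etc/mail/sendmail.mc (the path is an assumption; use wherever your sendmail.mc lives) before building the new sendmail.cf.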

Next, configure procmail to invoke SpamAssassin. If you want to invoke SpamAssassin on behalf of every user, do so by editing the /etc/procmailrc file. Example 5-1 shows an /etc/procmailrc that invokes SpamAssassin.

Example 5-1. A complete /etc/procmailrc

DROPPRIVS=yes
PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/sh

# Spamassassin
:0fw
* < 300000
|/usr/bin/spamassassin

If you run spamd, replace the call to spamassassin in procmailrc with a call to spamc instead. Using spamc/spamd will significantly improve performance on most systems, but makes it more difficult to allow users to write their own rules.
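The recipe in Example 5-1 is a filter recipe (the f flag) that waits for the filter to finish (the w flag), and its size condition restricts scanning to messages smaller than about 300,000 bytes. The shell sketch below mimics that logic outside procmail so you can see what the recipe does; it is an illustration only, and the SCANNER variable and the scan_if_small function are hypothetical names introduced here.

```shell
#!/bin/sh
# Mimic the recipe's size test: filter small messages through a
# scanner, pass large ones through untouched.
# SCANNER is a hypothetical variable for this sketch; it defaults to spamc.
scan_if_small() {
    msg_file=$1
    limit=${2:-300000}          # same threshold as the recipe's condition
    size=$(wc -c < "$msg_file" | tr -d ' ')
    if [ "$size" -lt "$limit" ]; then
        # Small enough: the scanner's output replaces the message.
        "${SCANNER:-spamc}" < "$msg_file"
    else
        # Too large: deliver the message unchanged.
        cat "$msg_file"
    fi
}
```

For example, scan_if_small message.txt would send a small saved message through spamc and print the tagged result. The size limit exists because spam is rarely large, while scanning very large messages is expensive.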

Spam-Checking During SMTP

If you want to refuse spam before it reaches your recipients, or set up a spam-checking gateway to an internal email server, you need a way to perform spam-checking during the SMTP transaction. If a message is found to be spam, you may want to refuse it and end the SMTP session, or accept it and add headers that users can use in their mail client filters. sendmail provides a general-purpose filtering interface, called milter, for use during the SMTP transaction.

The Milter Interface

In sendmail's parlance, milter refers to several things. Milter is an application programming interface (API) for writing filters for sendmail, and a protocol for communication between sendmail and a filter. A milter is also a filter program written using this API that listens for connections from a sendmail process and defines functions to call at different points of the SMTP transaction to accept, reject, discard, temporarily refuse, or modify a message. The milter library, libmilter, provides most of the code required to set up a milter and manage the work of calling your filtering functions during an SMTP transaction.

A milter can provide functions that sendmail will call at the following points in an SMTP transaction:

  • When a mail client connects to sendmail
  • After the SMTP HELO or EHLO commands
  • After the SMTP MAIL FROM command
  • After the SMTP RCPT TO command
  • After each message header is transmitted during the DATA step
  • After all message headers are transmitted
  • After each piece of the message body is transmitted
  • At the end of the DATA step, after the entire message has been transmitted
  • When the SMTP transaction is aborted
  • When the client connection is closed

Milter functions can perform the following operations on a message:

  • Add, change, or delete a header
  • Add or remove a recipient
  • Replace the message body
  • Reject a connection, message, or recipient
  • Temporarily fail a connection, message, or recipient
  • Accept and discard a message
  • Accept a message

Milters operate as daemons. They are typically started before sendmail during system startup and listen for connections from a sendmail process on a TCP or Unix domain socket. Milters do not have to be run as root. For more information about writing milters, visit http://www.milter.org.

You configure sendmail to use a milter by adding an INPUT_MAIL_FILTER( ) macro to the sendmail.mc configuration file and generating a new sendmail.cf file. Example 5-2 shows parts of a sendmail.mc file that includes a milter.

Example 5-2. A sendmail.mc file with a milter

divert(0)dnl
VERSIONID(`example mc')dnl
OSTYPE(linux)dnl
DOMAIN(generic)dnl
...
INPUT_MAIL_FILTER(`mymilter', `S=unix:/var/run/mymilter.sock, F=T, T=S:60s;R:60s;E:5m')dnl
...
MAILER(smtp)dnl
MAILER(local)dnl
MAILER(procmail)dnl

The INPUT_MAIL_FILTER macro takes two arguments. The first provides the name of the milter (mymilter in Example 5-2), and the second tells sendmail how to interact with the milter. The second argument in turn consists of several instructions, separated by commas:

S= socket description
This argument describes how sendmail should connect to the milter. The socket description consists of a protocol (unix for a Unix domain socket, inet for a TCP/IP socket, inet6 for a TCP/IPv6 socket), a colon, and a protocol-specific address. For Unix domain sockets, the address is the path to the socket file. For TCP sockets, the address is in the form port@host.
F= failure mode
This argument determines how sendmail will behave if it fails to connect to the milter process. Use F=T to cause sendmail to temporarily refuse email when it can't contact the milter. Use F=R to cause sendmail to reject connections when it can't contact the milter. Omit an F= argument to cause sendmail to accept messages without filtering when it can't contact the milter.
T= timeout list
This argument determines how long sendmail should wait for the milter to respond before treating the connection attempt as a failure. It consists of a set of states and the amount of time to allow for each, separated by semicolons. In Example 5-2, sendmail uses a 60-second timeout for sending data to the milter (S:60s), a 60-second timeout for reading replies from the milter (R:60s), and a 5-minute timeout for waiting for the milter's final acknowledgment after sending the message (E:5m). There is also a C timeout for connecting to the milter. If you leave any timeouts unspecified, sendmail uses its default timeouts: 10 seconds for sending and reading, and 5 minutes for connecting and final acknowledgment.

The INPUT_MAIL_FILTER macro results in the following lines being added to the sendmail.cf file when you generate it:

O InputMailFilters=mymilter
...
Xmymilter, S=unix:/var/run/mymilter.sock, F=T, T=S:60s;R:60s;E:5m
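After regenerating sendmail.cf, it can be worth confirming that these lines actually made it into the file. The following sketch assumes the line format shown above and a sendmail.cf at /etc/mail/sendmail.cf (an assumed default; pass another path if yours differs). The function names are hypothetical.

```shell
#!/bin/sh
# Print the milter-related lines of a sendmail.cf file.
show_milter_lines() {
    cf_file=${1:-/etc/mail/sendmail.cf}   # assumed default location
    grep -E '^(O InputMailFilters=|X[A-Za-z])' "$cf_file"
}

# Print the socket description (the S= value) of the named milter.
milter_socket() {
    name=$1
    cf_file=${2:-/etc/mail/sendmail.cf}
    sed -n "s/^X${name}, *S=\([^,]*\).*/\1/p" "$cf_file"
}
```

For the configuration in Example 5-2, milter_socket mymilter should print unix:/var/run/mymilter.sock. The milter must be listening on exactly this socket.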

SpamAssassin itself is not a milter. However, several milters have been written that invoke SpamAssassin on messages and then take action during the SMTP transaction.

MIMEDefang

MIMEDefang is one of the most popular sendmail milters. It provides a general framework for performing milter functions in Perl and comes with a default configuration that performs several functions:

  • Messages can be checked with a virus scanner, and messages carrying viruses can be refused, discarded, or quarantined.
  • MIME attachments can be examined, and messages can be refused, discarded, or quarantined if they contain attached files with given filename extensions (e.g., extensions that denote executable Windows files).
  • The HTML attachment in a message of type multipart/alternative (containing both text and HTML versions of the same message) can be dropped.
  • SpamAssassin can be invoked on the message, and spam can be refused, discarded, quarantined, or tagged.

MIMEDefang is developed by Roaring Penguin Software and is available as free software at http://www.mimedefang.org. Roaring Penguin also produces commercial products, CanIt and CanIt-PRO, which are based on MIMEDefang and SpamAssassin and add several other features including web-based interfaces for administrators and users.

The rest of this section details the installation, operation, and customization of MIMEDefang 2.42 as an example of a full-scale, milter-based approach to using SpamAssassin. MIMEDefang's other functions, such as virus-checking, are mentioned but not covered in detail; read the MIMEDefang documentation for more information.

Tip

Use the latest available version of MIMEDefang. In particular, only versions 2.42 and later support SpamAssassin 3.0.

Installing MIMEDefang

MIMEDefang is written in Perl and invokes SpamAssassin through the Mail::SpamAssassin Perl modules. Because MIMEDefang itself is a daemon, you do not need to run spamd. It's easiest to install SpamAssassin (and your antivirus software) first and then install MIMEDefang.

A good way to begin a MIMEDefang installation is to verify that you have the prerequisites on hand. MIMEDefang requires sendmail 8.12 (or later) and several Perl modules, including MIME::Tools, IO::Stringy, MIME::Base64, MailTools, Digest::SHA1, and HTML::Parser. Most of the modules can be installed using CPAN.

Warning

MIMEDefang will not work correctly with the standard version of MIME::Tools 5.411a. Either install MIME::Tools 6 or later, or install the special version of MIME::Tools 5.411a available from Roaring Penguin's web site.

You should create a new user account and group for running MIMEDefang; the usual name for both the user and group is defang. This user will own MIMEDefang's files, and the user (or group) must have access to SpamAssassin's configuration and database files as well.

MIMEDefang uses two important directories. It uses /var/spool/MIMEDefang as a working directory for unpacking email messages and scanning them. For optimal performance, place this directory on a fast disk—even a RAM disk if your operating system supports it and you have enough memory to spare. MIMEDefang stores quarantined email messages in /var/spool/MD-Quarantine. Speed is not so critical with this directory, and it should never be located on a RAM disk because you will want to be sure that you can access quarantined files. Create these directories before you install MIMEDefang. The directories should be owned by user and group defang and should not be world-readable or world-searchable.
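Creating the two directories might look like the sketch below. The helper name make_private_dir is hypothetical; mode 700 is one way to satisfy the "not world-readable or world-searchable" requirement, and the chown step needs root and assumes the defang user and group already exist.

```shell
#!/bin/sh
# Create a directory that only its owner can read or search.
# make_private_dir is a hypothetical helper for illustration.
make_private_dir() {
    dir=$1
    owner=$2    # e.g. defang:defang; leave empty to skip the chown
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir"
    fi
}
```

As root, you would then run make_private_dir /var/spool/MIMEDefang defang:defang and make_private_dir /var/spool/MD-Quarantine defang:defang.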

Next, download the MIMEDefang source code from http://www.roaringpenguin.com, unpack it, run the configure script, make, and perform a make install as root. Example 5-3 shows this process from the point of running the configure script:

Example 5-3. Compiling MIMEDefang

$ ./configure
creating cache ./config.cache
...
creating config.h

*** Virus scanner detection results:
H+BEDV   'antivir'   NO (not found)
Vexira   'vexira'    NO (not found)
NAI      'uvscan'    NO (not found)
BDC      'bdc'       NO (not found)
Sophos   'sweep'     NO (not found)
TREND    'vscan'     NO (not found)
CLAMSCAN 'clamav'    YES - /usr/bin/clamscan
AVP      'AvpLinux'  NO (not found)
FSAV     'fsav'      NO (not found)
FPROT    'f-prot'    NO (not found)
SOPHIE   'sophie'    NO (not found)
NVCC     'nvcc'      NO (not found)
CLAMD    'clamd'     YES - /usr/sbin/clamd
File::Scan           NO (not found)
TROPHIE  'trophie'   NO (not found)

Found Mail::SpamAssassin.  You may use spam_assassin_* functions
Did not find Anomy::HTMLCleaner.  Do not use anomy_clean_html( )
Found HTML::Parser.  You may use append_html_boilerplate( )

Note: SpamAssassin, File::Scan, HTML::Parser and Anomy::HTMLCleaner are
detected at run-time, so if you install or remove any of those modules, you
do not need to re-run ./configure and make a new mimedefang.pl.
$ make
gcc -g -O2 -Wall -Wstrict-prototypes -pthread -D_POSIX_PTHREAD_SEMANTICS -DPERL_PATH=\"/usr/local/bin/perl\" -DMIMEDEFANG_PL=\"/usr/local/bin/mimedefang.pl\" -DRM=\"/bin/rm\" -DVERSION=\"2.42\" -DSPOOLDIR=\"/var/spool/MIMEDefang\" -DQDIR=\"/var/spool/MD-Quarantine\" -DCONFDIR=\"/etc/mail\" -I../sendmail-8.12.11/include -c -o mimedefang.o mimedefang.c
...
$ su
Password: XXXXXX
# make install
mkdir -p /etc/mail && chmod 755 /etc/mail
...
Please create the spool directory, '/var/spool/MIMEDefang',
if it does not exist.  Give it mode 700, and make
it owned by the user you intend to run MIMEDefang as.
Please do the same with the quarantine directory, '/var/spool/MD-Quarantine'.
#

The following programs and files are installed:

mimedefang
The milter itself. This program receives requests from sendmail to filter messages and passes them on to mimedefang-multiplexor, which performs the checks. It then communicates the results back to sendmail.
mimedefang-multiplexor
A program to receive requests from mimedefang and farm them out to a pool of mimedefang.pl Perl processes for scanning. It is responsible for maintaining the process pool, creating and destroying processes as necessary. This approach minimizes the time and CPU overhead required in starting new processes for each scan.
mimedefang.pl
A Perl script to perform all of the message-checking functions of MIMEDefang. During the several stages of checking a message, this script calls functions defined in /etc/mail/mimedefang-filter.
md-mx-ctrl
A command-line tool for viewing the status of the multiplexor or for ordering it to reload its slave processes.
watch-mimedefang
A graphical tool, based on Tcl/Tk, for monitoring the multiplexor.
/etc/mail/spamassassin/sa-mimedefang.cf
A sitewide configuration file used by MIMEDefang. By default, MIMEDefang's install process generates a simple file, with few options.
/etc/mail/mimedefang-filter
A file containing Perl subroutines called by mimedefang.pl at different stages of message-processing. These subroutines check messages or message parts, and direct MIMEDefang to accept, quarantine, discard, or bounce a message. MIMEDefang installs a default mimedefang-filter that invokes SpamAssassin to add an X-Spam-Score header and a SpamAssassin report to all messages. To implement more complex spam-checking behavior, you'll edit mimedefang-filter. This file is discussed in greater detail in Section 5.3.3, later in this chapter.

Starting the MIMEDefang multiplexor

To run MIMEDefang, you must start two processes: the multiplexor (mimedefang-multiplexor) and the milter (mimedefang). You should start the multiplexor first because the milter process will connect to it. Start each process as root; each changes its uid to the defang user after startup.

mimedefang-multiplexor has over a dozen command-line options, but you will typically need to use only a few of them. The most common are described here; for complete information, see the manpage.

-U user
Instructs mimedefang-multiplexor to run as the given user (e.g., defang). Running as a non-root user is an important security measure.
-s /path/to/socket
Specifies the path to the Unix domain socket that the multiplexor will use to listen for requests from the milter process. It defaults to /var/spool/MIMEDefang/mimedefang-multiplexor.sock.
-p filename
Causes the multiplexor to write its process ID to the specified file. You can use this ID to signal the multiplexor to reread the filter when you change it or to stop the multiplexor (these operations are discussed later in this section).
-m number-of-slaves
Specifies the minimum number of slave mimedefang.pl processes that should be running at any given time. This value defaults to 0, but on most systems, you want to have at least two slave processes running at all times to minimize startup overhead.
-x number-of-slaves
Specifies the maximum number of slave mimedefang.pl processes that should be running at any given time. This value defaults to 2, but busy mail servers will require more than two processes to be available at any given time. You should plan to increase this value to 5, 10, or higher, depending on your needs.
-q number-of-requests
Causes the multiplexor to queue up to the given number of incoming requests when no slave process is immediately available to service them. Without this option, the multiplexor causes sendmail to temporarily fail a message when all slave processes are busy (returning a 4xx SMTP status code to the sending MTA, which should retain the message in its queue and try to deliver it again later).
-D
Causes the multiplexor to run in the foreground, for debugging purposes. Without this option, the multiplexor detaches from the terminal and runs in the background.

A typical invocation of mimedefang-multiplexor might be:

/usr/local/bin/mimedefang-multiplexor -U defang \
-p /var/run/mimedefang-multiplexor.pid -m 2 -x 10

Checking multiplexor status

Once the multiplexor is running, use the md-mx-ctrl command to examine its status. md-mx-ctrl status provides a human-readable status report on the multiplexor's slave processes; md-mx-ctrl msgs shows the total number of messages processed by the multiplexor. If you're using a nondefault socket for the multiplexor, you can specify that socket to md-mx-ctrl using the -s /path/to/socket command-line option. Example 5-4 shows these md-mx-ctrl invocations and their output. On the system in the example, the multiplexor has been configured with a minimum of two slaves (both of which are idle) and a maximum of ten, and has processed 17,366 messages.

Example 5-4. Invoking md-mx-ctrl

# md-mx-ctrl status
Max slaves: 10
Slave 0: stopped
Slave 1: stopped
Slave 2: idle
Slave 3: stopped
Slave 4: stopped
Slave 5: stopped
Slave 6: idle
Slave 7: stopped
Slave 8: stopped
Slave 9: stopped
# md-mx-ctrl msgs
17366
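If you want a one-line summary rather than the full listing, the status output can be tallied with awk. This is a sketch that assumes the "Slave N: state" line format shown in Example 5-4; other MIMEDefang versions may print additional detail. The function name count_slave_states is hypothetical.

```shell
#!/bin/sh
# Tally slave states from "md-mx-ctrl status" output read on stdin.
# Assumes lines of the form "Slave N: state", as in Example 5-4.
count_slave_states() {
    awk '/^Slave [0-9]+:/ { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Used as md-mx-ctrl status | count_slave_states, it prints one line per state with a count. A monitoring script could alert when no slaves are idle for an extended period, which suggests raising the -x value.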

Starting the MIMEDefang milter

mimedefang performs a simpler task than the multiplexor. Its job is to receive filtering requests from sendmail and pass them on to the multiplexor to handle. Accordingly, it has fewer command-line options. Here are the most commonly used options.

-p /path/to/socket
Specifies the path to the Unix domain socket that the milter process will listen on for requests from sendmail. This path must match the path you specify in sendmail's INPUT_MAIL_FILTER( ) macro. This option is required; a typical choice is /var/spool/MIMEDefang/mimedefang.sock.
-m /path/to/multiplexor/socket
Specifies the Unix domain socket on which the multiplexor is listening for requests. mimedefang sends requests to the multiplexor on this socket. This option is required, and the value should match that of the multiplexor's -s option (typically /var/spool/MIMEDefang/mimedefang-multiplexor.sock).
-U user
Instructs mimedefang to run as the given user (e.g., defang). You must provide the same user to mimedefang-multiplexor and mimedefang.
-P filename
Directs mimedefang to write its process ID to the specified file. Note that this option uses a capital P.

A typical invocation of mimedefang might be:

/usr/local/bin/mimedefang -U defang -P /var/run/mimedefang.pid \
-p /var/spool/MIMEDefang/mimedefang.sock \
-m /var/spool/MIMEDefang/mimedefang-multiplexor.sock

A sample boot script for automatically starting and stopping MIMEDefang can be found in the examples directory of MIMEDefang's source code. Editing this script and installing it with your other system boot scripts is an easy way to properly configure MIMEDefang, as it lists all of the multiplexor and milter process options as shell variables. Ideally, the script should run before sendmail's startup script so that the milter socket is in place before sendmail starts. Likewise, you should stop sendmail before you stop MIMEDefang's process.
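The startup ordering described above can be sketched in a few lines of shell. This is not the boot script shipped with MIMEDefang; it is a minimal illustration that uses only the options covered in this section, with installation paths taken from the earlier examples. The function names wait_for_socket and start_mimedefang are hypothetical.

```shell
#!/bin/sh
# Paths follow the examples in this section; adjust for your system.
SPOOL=/var/spool/MIMEDefang
MX_SOCK=$SPOOL/mimedefang-multiplexor.sock

# Wait up to $2 seconds (default 10) for a Unix domain socket to appear.
wait_for_socket() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -S "$path" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    [ -S "$path" ]
}

# Start the multiplexor first, then the milter that connects to it.
start_mimedefang() {
    /usr/local/bin/mimedefang-multiplexor -U defang \
        -p /var/run/mimedefang-multiplexor.pid -m 2 -x 10 \
        -s "$MX_SOCK" || return 1
    wait_for_socket "$MX_SOCK" || return 1
    /usr/local/bin/mimedefang -U defang -P /var/run/mimedefang.pid \
        -p "$SPOOL/mimedefang.sock" -m "$MX_SOCK"
}
```

Call start_mimedefang as root from a boot script that runs before sendmail's. Waiting for the multiplexor's socket avoids a race in which the milter starts before the multiplexor is ready to accept requests.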

Verifying the MIMEDefang processes

You can use the ps command to verify that all your MIMEDefang processes are running. Example 5-5 shows the process listing and the contents of /var/spool/MIMEDefang and /var/spool/MD-Quarantine on a typical system running sendmail and MIMEDefang. MIMEDefang's processes include one mimedefang-multiplexor process, three slave mimedefang.pl processes started by the multiplexor for scanning messages, and four mimedefang milter processes handling connections from sendmail. All processes are running as user defang. The /var/spool/MIMEDefang directory contains working directories used temporarily by MIMEDefang (names starting with "mdefang"), as well as Unix domain sockets and pid files. The /var/spool/MD-Quarantine directory includes subdirectories holding quarantined messages.

Example 5-5. Processes and layout of a typical MIMEDefang system

# ps auxw | egrep 'mime'
defang   27145  0.0  0.0  1312  688 ?        S    Jan15   0:42 /usr/local/bin/mimedefang-
multiplexor -p /var/spool/MIMEDefang/mimedefang-multiplexor.pid -m 2 -x 10 -U defang -b 
300 -l -T -s /var/spool/MIMEDefang/mimedefang-multiplexor.sock
defang   27162  0.0  0.1  2552  856 ?        S    Jan15   0:00 /usr/local/bin/mimedefang 
-P /var/spool/MIMEDefang/mimedefang.pid -U defang -m /var/spool/MIMEDefang/mimedefang-
multiplexor.sock -p /var/spool/MIMEDefang/mimedefang.sock
defang   20548  1.0  2.8 23464 22416 ?       S    12:05   1:43 perl -w /usr/local/bin/
mimedefang.pl -server
defang   25089  0.0  0.1  2552  856 ?        S    13:57   0:00 /usr/local/bin/mimedefang 
-P /var/spool/MIMEDefang/mimedefang.pid -U defang -m /var/spool/MIMEDefang/mimedefang-
multiplexor.sock -p /var/spool/MIMEDefang/mimedefang.sock
defang   25142  0.0  0.1  2552  856 ?        S    13:59   0:00 /usr/local/bin/mimedefang 
-P /var/spool/MIMEDefang/mimedefang.pid -U defang -m /var/spool/MIMEDefang/mimedefang-
multiplexor.sock -p /var/spool/MIMEDefang/mimedefang.sock
defang   25589  0.0  0.1  2552  856 ?        S    14:11   0:00 /usr/local/bin/mimedefang 
-P /var/spool/MIMEDefang/mimedefang.pid -U defang -m /var/spool/MIMEDefang/mimedefang-
multiplexor.sock -p /var/spool/MIMEDefang/mimedefang.sock
defang   26616  0.3  2.6 21588 20572 ?       S    14:35   0:04 perl -w /usr/local/bin/
mimedefang.pl -server
defang   26617  0.2  2.6 21492 20492 ?       S    14:35   0:03 perl -w /usr/local/bin/
mimedefang.pl -server

# ls -l /var/spool/MIMEDefang
drwx------   3 defang   defang        149 Jan 28 14:47 mdefang-i0SKkoMD027104
drwx------   3 defang   defang        149 Jan 28 14:48 mdefang-i0SKlwMB027198
-rw-------   1 defang   defang          6 Jan 15 10:40 mimedefang-multiplexor.pid
srw-------   1 defang   defang          0 Jan 15 10:40 mimedefang-multiplexor.sock
-rw-------   1 defang   defang          6 Jan 15 10:40 mimedefang.pid
srwx------   1 defang   defang          0 Jan 15 10:40 mimedefang.sock

# ls -l /var/spool/MD-Quarantine
drwx------   2 defang   defang        212 Dec 27 10:37 qdir-2004-01-28-10.37.35-001
drwx------   2 defang   defang        212 Dec 27 16:25 qdir-2004-01-28-16.25.03-001

Customizing MIMEDefang

Use the mimedefang-filter file to configure the actions that MIMEDefang takes when filtering messages. The file is written in Perl. MIMEDefang distributes and installs a working sample file, typically in /etc/mail, but you will need to modify several settings in the file for your local environment. Example 5-6 shows the configuration settings near the beginning of this file. You should always change $AdminAddress, $AdminName, and $DaemonAddress. Generally, $AddWarningsInline and md_graphdefang_log_enable( ) can be left unchanged, and $MaxMIMEParts should be uncommented to prevent denial-of-service attacks.

Example 5-6. Configuration section of mimedefang-filter

#***********************************************************************
# Set administrator's e-mail address here.  The administrator receives
# quarantine messages and is listed as the contact for site-wide
# MIMEDefang policy.  A good example would be 'defang-admin@mydomain.com'
#***********************************************************************
$AdminAddress = 'postmaster@localhost';
$AdminName = "MIMEDefang Administrator's Full Name";

#***********************************************************************
# Set the e-mail address from which MIMEDefang quarantine warnings and
# user notifications appear to come.  A good example would be
# 'mimedefang@mydomain.com'.  Make sure to have an alias for this
# address if you want replies to it to work.
#***********************************************************************
$DaemonAddress = 'mimedefang@localhost';

#***********************************************************************
# If you set $AddWarningsInline to 1, then MIMEDefang tries *very* hard
# to add warnings directly in the message body (text or html) rather
# than adding a separate "WARNING.TXT" MIME part.  If the message
# has no text or html part, then a separate MIME part is still used.
#***********************************************************************
$AddWarningsInline = 0;

#***********************************************************************
# To enable syslogging of virus and spam activity, add the following
# to the filter:
# md_graphdefang_log_enable( );
# You may optionally provide a syslogging facility by passing an
# argument such as:  md_graphdefang_log_enable('local4');  If you do this, be
# sure to setup the new syslog facility (probably in /etc/syslog.conf).
# An optional second argument causes a line of output to be produced
# for each recipient (if it is 1), or only a single summary line
# for all recipients (if it is 0.)  The default is 1.
# Comment this line out to disable logging.
#***********************************************************************
md_graphdefang_log_enable('mail', 1);

#***********************************************************************
# Uncomment this to block messages with more than 50 parts.  This will
# *NOT* work unless you're using Roaring Penguin's patched version
# of MIME tools, version MIME-tools-5.411a-RP-Patched-02 or later.
#
# WARNING: DO NOT SET THIS VARIABLE unless you're using at least
# MIME-tools-5.411a-RP-Patched-02; otherwise, your filter will fail.
#***********************************************************************
# $MaxMIMEParts = 50;

The remainder of the mimedefang-filter file is a set of Perl functions that mimedefang.pl will call when checking a message. You can modify these functions to customize MIMEDefang's behavior. The functions include:

filter_begin( )
Called with no arguments at the start of filtering. Suitable for setting variables that you expect to use throughout the filter, or for performing whole-message checks like virus-scanning immediately.
filter_multipart( entity,name,extension,type )
Called for each MIME part of the message that contains other MIME parts within it. The entity is a MIME::Entity object, name is the suggested filename of the part, extension is the file extension, and type is the MIME type. Suitable for validating MIME parts or refusing specific multipart types (e.g., message/partial).
filter( entity,name,extension,type )
Called for each MIME part of the message that does not contain other MIME parts within it. Arguments are the same as for filter_multipart( ). Suitable for validating filenames, virus-scanning individual MIME parts, or refusing specific MIME types.
filter_end( entity )
Called at the end of filtering with the MIME::Entity object representing the entire message to be returned to sendmail. Suitable for checking variables that you set elsewhere in the filter and performing computationally expensive whole-message checks like spam-tagging if necessary.

These functions can make decisions about the disposition or modification of individual message parts by calling one of the MIMEDefang action functions. In most cases, actions should be taken only by the filter( ) or filter_multipart( ) functions. The most commonly used action functions are:

action_accept( ), action_accept_with_warning( string )
Accept the current message part, possibly adding a warning to the message.
action_drop( ), action_drop_with_warning( string )
Drop the current message part, possibly adding a warning to the message.
action_replace_with_warning( string )
Replace the current message part with a warning message.
action_quarantine( entity,string )
Drop and quarantine the current message part, and add a warning to the message.
action_quarantine_entire_message( string )
Quarantine the entire message, and add a warning to the administrator notification if one is generated. This action only quarantines; it does not also discard or bounce the message. You must call action_discard( ) or action_bounce( ) afterward.
action_bounce( string[,SMTP reply code[,DSN code]] )
Instruct sendmail to reject the message with string returned to the sender as the reason for rejection. You can optionally specify an SMTP reply code (which defaults to 554) and a DSN code (which defaults to 5.7.1). Bouncing a message does not stop MIMEDefang from continuing to process other message parts; the bounce occurs after all parts have been processed.
action_tempfail( string[,SMTP reply code[,DSN code]] )
Instruct sendmail to temporarily reject the message with string returned to the sender as the reason for rejection. You can optionally specify an SMTP reply code (which defaults to 450) and a DSN code (which defaults to 4.7.1).
action_discard( )
Discard the entire message silently once all parts have been processed.
action_notify_sender( string )
Generate an email notification back to the message sender containing the given string, which may consist of multiple lines.
action_notify_administrator( string )
Generate an email notification back to the MIMEDefang administrator containing the given string, which may consist of multiple lines.
action_add_part( entity,type,encoding,data,fname,disposition[,offset] )
Add a new MIME part to the message represented by entity. The new part will have a MIME content-type of type and content-encoding of encoding. The new part itself should be stored in data and its associated filename in fname. The MIME content-disposition is given by disposition. The optional offset specifies where to add the part; it defaults to -1 (add at end). This action may be performed in filter_end( ).
action_add_header( header,value )
Add a new header to the message. The header's name is given in header, without a trailing colon, and the value to set the header to is given in value. It is possible to add multiple headers with the same name.
action_change_header( header,value[,index] )
Change a header in the message. The header's name is given in header, without a trailing colon, and the new value to set the header to is given in value. If index is given, changes the index'th header with that name. Changing a header that does not exist will add a new header.
action_delete_header( header[,index] )
Delete a header in the message. The header's name is given in header, without a trailing colon. If index is given, deletes the index'th header with that name instead of the first one.
action_delete_all_headers( header )
Deletes all headers in the message with a given name. The header's name is given in header, without a trailing colon.

Tip

If you call one of the notification functions (e.g., action_notify_sender), MIMEDefang creates a notification message and sends it by invoking sendmail in its deferred mode; sendmail will enqueue the notification message in its client mail queue rather than sending it immediately. You must run a sendmail process that periodically sends messages in the client queue. One way to do so is to issue the following command at system boot (via a boot script):

/usr/sbin/sendmail -Ac -q5m

See the sendmail documentation for more information about deferred mode and client queue runners.

By calling these functions, you can configure MIMEDefang to suit nearly any email management policy you wish to institute.

When you make changes to the mimedefang-filter script, you must signal mimedefang-multiplexor to reread the configuration and restart its slave processes. The easiest way to signal the multiplexor is to use the md-mx-ctrl reread command. Another way is to use the kill -INT process-id command to send a SIGINT signal to the multiplexor process; you can identify the process ID from ps output or by examining the pid file if the multiplexor was started with the -p option.

SpamAssassin Integration

MIMEDefang expects to find a SpamAssassin configuration file called sa-mimedefang.cf in your sitewide configuration directory (usually /etc/mail/spamassassin). If that file doesn't exist, MIMEDefang looks for local.cf in the same directory. This gives you the flexibility of creating different SpamAssassin configurations to be used when SpamAssassin is invoked by MIMEDefang and when SpamAssassin is invoked by local users or scripts.

Tip

If you're going to be invoking SpamAssassin only through MIMEDefang, or if there should be no differences in the configuration based on how SpamAssassin is invoked, consider making sa-mimedefang.cf a hard or symbolic link to local.cf. MIMEDefang will find the configuration file it first looks for, and you will avoid the possibility of later creating two different configurations.

When running SpamAssassin via MIMEDefang, you may not use any of SpamAssassin's configuration directives that modify a mail message. Attempting to modify the Subject header or add new headers using SpamAssassin directives will not work. All such changes must be performed by MIMEDefang in the mimedefang-filter script.

If you want SpamAssassin to perform network-based tests (such as DNSBL lookups), you must add a line to mimedefang-filter (just after the $AdminName setting works well) to set the $SALocalTestsOnly variable to 0, like this:

$SALocalTestsOnly = 0;

The section of the default mimedefang-filter that handles spam-tagging appears in the filter_end( ) function and is agreeably easy to read. It is presented in Example 5-7.

Example 5-7. Spam-tagging section of mimedefang-filter

    # Spam checks if SpamAssassin is installed
    if ($Features{"SpamAssassin"}) {
        if (-s "./INPUTMSG" < 300*1024) {
            # Only scan messages smaller than 300kB.  Larger messages
            # are extremely unlikely to be spam, and SpamAssassin is
            # dreadfully slow on very large messages.
            my($hits, $req, $names, $report) = spam_assassin_check( );
            my($score);
            if ($hits < 40) {
                $score = "*" x int($hits);
            } else {
                $score = "*" x 40;
            }
            # We add a header which looks like this:
            # X-Spam-Score: 6.8 (******) NAME_OF_TEST,NAME_OF_TEST
            # The number of asterisks in parens is the integer part
            # of the spam score clamped to a maximum of 40.
            # MUA filters can easily be written to trigger on a
            # minimum number of asterisks...
            action_change_header("X-Spam-Score", "$hits ($score) $names");
            if ($hits >= $req) {
                md_graphdefang_log('spam', $hits, $RelayAddr);

                # If you find the SA report useful, add it, I guess...
                action_add_part($entity, "text/plain", "-suggest",
                                "$report\n",
                                "SpamAssassinReport.txt", "inline");
            } else {
                # Delete any existing X-Spam-Score header?
                action_delete_header("X-Spam-Score");
            }
        }
    }

First, the code checks to be sure that MIMEDefang detected SpamAssassin on the system when it started. It then checks to be sure that the INPUTMSG file, which contains the message to scan, is smaller than 300 kilobytes. If that's the case, the code calls MIMEDefang's spam_assassin_check( ) function, which uses Mail::SpamAssassin to check the message and returns the number of hits, number of required hits for tagging, names of tests hit, and the text of SpamAssassin's spam report for the message. The code creates a $score variable containing one asterisk for each hit (up to 40).

Next, the code in Example 5-7 calls the MIMEDefang action_change_header( ) function to change (or add) the X-Spam-Score header. The header will include the number of hits (expressed numerically and as a line of asterisks) and the names of tests that matched.

If the number of hits is greater than or equal to the required number to declare the message spam, the code calls MIMEDefang's md_graphdefang_log( ) function to make a log entry and then adds the SpamAssassin report text to the message as an additional MIME part using the action_add_part( ) function. If the number of hits is less than the required number for tagging, the script removes the X-Spam-Score header.
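The X-Spam-Score value built in Example 5-7 can be isolated into a small function for experimenting outside the filter. The following is a minimal sketch under that assumption; spam_score_header( ) is a hypothetical name and not part of MIMEDefang.

```perl
use strict;
use warnings;

# Build the X-Spam-Score value used in Example 5-7:
#   "<hits> (<one asterisk per whole point, at most 40>) <test names>"
# spam_score_header() is a hypothetical name, not a MIMEDefang function.
sub spam_score_header {
    my ($hits, $names) = @_;
    my $stars = int($hits);
    $stars = 0  if $stars < 0;     # negative scores produce no asterisks
    $stars = 40 if $stars > 40;    # clamp very high scores
    return "$hits (" . ('*' x $stars) . ") $names";
}

# prints: 6.8 (******) NAME_OF_TEST,NAME_OF_TEST
print spam_score_header(6.8, 'NAME_OF_TEST,NAME_OF_TEST'), "\n";
```

Because the asterisks count only the integer part of the score, a mail client rule that looks for five asterisks in a row matches any message scoring 5.0 or higher.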

You might customize this code in filter_end( ) in several easy ways to suit your needs. By commenting out the action_delete_header( ) line, you can have the X-Spam-Score header added to all messages, spam or not. If you want to modify the Subject header of spam messages as SpamAssassin does, add the following code before the action_add_part( ) line:

action_change_header("Subject", "*****SPAM***** $Subject");

The $Subject variable will already contain the message subject.
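If a message can reach the filter with its subject already tagged (for instance, after passing through another host that tags spam, which is an assumption about your topology), the one-line change would stack a second tag in front of the first. A minimal sketch of a guard follows; tag_subject( ) is a hypothetical helper, and the handling of a missing subject is likewise an assumption.

```perl
use strict;
use warnings;

# Prefix a subject with a spam tag unless it already starts with one.
# tag_subject() is a hypothetical helper, not part of MIMEDefang.
sub tag_subject {
    my ($subject, $tag) = @_;
    $tag     = '*****SPAM*****' unless defined $tag;
    $subject = ''               unless defined $subject;
    return $subject if index($subject, $tag) == 0;    # already tagged
    return $subject eq '' ? $tag : "$tag $subject";
}

# In filter_end( ), in place of the one-line change shown above:
#   action_change_header("Subject", tag_subject($Subject));
```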

Tip

Remember that you must signal the MIMEDefang milter to reread mimedefang-filter whenever you change it or any Perl modules on which it depends—including SpamAssassin and its configuration. If you update SpamAssassin or modify settings in /etc/mail/spamassassin/sa-mimedefang.cf, you should signal the milter.

Adding sitewide Bayesian filtering

Adding a sitewide Bayesian filter for use with MIMEDefang is relatively easy. Use the usual SpamAssassin use_bayes and bayes_path directives in sa-mimedefang.cf, and ensure that the defang user has permission to create the databases in the directory named in bayes_path. One way to do this is to create a directory for the databases that is owned by defang, such as /var/spool/MD-Bayes. Another option is to locate the databases in a directory owned by another user but to create them ahead of time and chown them to defang. If local users need access to the databases (e.g., they will be running sa-learn), you may have to make the databases readable or writable by a group other than defang and adjust the bayes_file_mode, or make them world-readable or world-writable. Doing so, however, puts the integrity of your spam-checking at the mercy of the good intentions and comprehension of your users.

Adding sitewide autowhitelisting

In SpamAssassin 3.0, autowhitelisting is easy to enable. You need only add the usual autowhitelist directives to sa-mimedefang.cf to determine where and how the autowhitelist database will be stored. Be sure to enable the use_auto_whitelist configuration option to turn on autowhitelisting.

Using a sitewide autowhitelist database in SpamAssassin 2.63 requires just a bit more effort. In addition to adding the SpamAssassin autowhitelist directives to sa-mimedefang.cf, you must modify mimedefang.pl to provide SpamAssassin with an address list factory, as discussed in Chapter 4. Example 5-8 shows the spam_assassin_init( ) function in mimedefang.pl. To support autowhitelisting, add the three lines that follow the Mail::SpamAssassin->new( ) call: the require statement and the two lines that create and register $awl. Don't forget to signal mimedefang-multiplexor to reread its configuration after making these changes.

Example 5-8. Adding an address list factory to mimedefang.pl

sub spam_assassin_init (;$) {

    unless ($Features{"SpamAssassin"}) {
        md_syslog('err', "$MsgID: Attempt to call SpamAssassin function, but SpamAssassin is not installed.");
        return undef;
    }

    if (!defined($SASpamTester)) {
        my $config = shift;
        unless ($config)
        {
            if (-r "/etc/mail/spamassassin/sa-mimedefang.cf") {
                $config = "/etc/mail/spamassassin/sa-mimedefang.cf";
            } elsif (-r "/etc/mail/spamassassin/local.cf") {
                $config = "/etc/mail/spamassassin/local.cf";
            } else {
                $config = "/etc/mail/spamassassin.cf";
            }
        }

        $SASpamTester = Mail::SpamAssassin->new({
            local_tests_only   => $SALocalTestsOnly,
            dont_copy_prefs    => 1,
            userprefs_filename => $config});

        require Mail::SpamAssassin::DBBasedAddrList;
        my $awl = Mail::SpamAssassin::DBBasedAddrList->new( );
        $SASpamTester->set_persistent_address_list_factory ($awl);
    }

    return $SASpamTester;
}

Adding per-domain or per-user streaming

By default, MIMEDefang processes each message once and applies SpamAssassin's spam determination to the message. This process works well if you run a small mail server for a single domain, but it presents a problem for mail gateways, virtual hosts, and larger servers. What should be done when an email message is received for multiple recipients—possibly at multiple domains? MIMEDefang provides two functions that you can use to implement solutions to this problem, stream_by_recipient( ) and stream_by_domain( ). Each works in the same way.

If you add a call to stream_by_recipient( ) to the filter_begin( ) function, stream_by_recipient( ) checks to see if a message has only a single recipient. If so, it returns 0, and the filter should continue to work on the message. If the message has multiple recipients, stream_by_recipient( ) reinjects the message by connecting to sendmail and resubmitting the message as a series of new messages, one for each recipient of the original message. Figure 5-1 illustrates this process. In this case, stream_by_recipient( ) returns 1, and the original, multirecipient message should be discarded. When the new single-recipient messages arrive at the filter, they will pass through stream_by_recipient( ) and continue on to the rest of the filter, which can now safely perform per-recipient functions (such as using personal whitelists and blacklists or other user preferences).

Figure 5-1. Streaming by recipients

stream_by_domain( ) works similarly but only reinjects one new copy of a message for each recipient domain in the original message. The rest of the filter can behave differently for different recipient domains, which permits virtual hosting providers to apply different spam criteria for different domains they host.

Warning

Although some MIMEDefang features will work with sendmail 8.11, stream_by_domain( ) and stream_by_recipient( ) require sendmail 8.12. Moreover, locally submitted messages must be sent via SMTP for these functions to work (sendmail must be running as user smmsp rather than as user root).

Example 5-9 shows how you could use stream_by_domain( ) to offer different policies to different recipient domains. Policies are stored in a Berkeley database file /etc/mail/spampolicy.db that is generated from a text file /etc/mail/spampolicy using the standard sendmail makemap program. Each line of the text file should contain a domain name, white space, and a policy, which should be either TAG (tag spam at SpamAssassin's default level), TAG n (tag messages with n or more hits, where n is a whole number), BLOCK (reject spam at SpamAssassin's default level), BLOCK n (reject messages with n or more hits), or IGNORE (do no spam-checking). spampolicy.db must be owned by defang.

Example 5-9. Using stream_by_domain( )

use DB_File;

sub getpolicy {
  # Where do we find the policy db?
  my $policydb = '/etc/mail/spampolicy.db';
  # If a domain isn't listed, what's the default policy?
  my $default_policy = 'TAG';
  my $host = shift;
  # Lexical hash so the lookup also works under "use strict"
  tie my %policy, 'DB_File', $policydb, O_RDONLY, 0640, $DB_HASH;
  my $policy = $policy{"\L$host"};
  untie %policy;
  return defined($policy) ? "\U$policy" : $default_policy;
}

sub filter_begin ( ) {
    if ($SuspiciousCharsInHeaders) {
        md_graphdefang_log('suspicious_chars');
        return action_discard( );
    }

    # Per-domain streaming is turned on here so we get the $Domain var
    # set later on.
    return if stream_by_domain( );
    ...
}

sub filter_end ($) {
    my($entity) = @_;
    send_quarantine_notifications( );

    # No sense doing any extra work
    return if message_rejected( );

    # Spam checks if SpamAssassin is installed
    if ($Features{"SpamAssassin"}) {
        if (-s "./INPUTMSG" < 100*1024) {
          # Spam policy selection, based on $Domain, using a Berkeley db lookup
          my $spampolicy = getpolicy($Domain);
          action_add_header("X-Spam-Policy", "$spampolicy $Domain");
          if ($spampolicy ne "IGNORE") {
            my($hits, $req, $names, $report) = spam_assassin_check( );
            $req = $1 if ($spampolicy =~ /(\d+)/);
            if ($hits >= $req) {
                md_graphdefang_log('spam', $hits, $RelayAddr);
                if ($spampolicy =~ /BLOCK/) {
                  action_bounce("Message rejected by SpamAssassin");
                  return;
                }
                my($score);
                if ($hits < 40) {
                    $score = "*" x int($hits);
                } else {
                    $score = "*" x 40;
                }
                action_change_header("X-Spam-Score", "$hits ($score) $names");
                action_add_part($entity, "text/plain", "-suggest",
                                "$report\n",
                                "SpamAssassinReport.txt", "inline");
            } else {
                action_delete_header("X-Spam-Score");
            }
          }
        }
    }
}
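Example 5-9 extracts the threshold with /(\d+)/ and tests for /BLOCK/, so a mistyped policy such as BLOKC silently behaves like TAG. A stricter parser is one way to catch such entries. The following is a minimal sketch, not part of MIMEDefang: parse_policy( ) is a hypothetical helper, the domain names in the comment are placeholders, and accepting a decimal threshold is an extension beyond what Example 5-9 recognizes.

```perl
use strict;
use warnings;

# Sample /etc/mail/spampolicy lines (domain names are placeholders):
#   example.com     TAG
#   example.net     BLOCK 8
#   example.org     IGNORE

# Split a policy string into (ACTION, THRESHOLD).
# $default_req is the threshold to use when the policy gives no number,
# e.g. the $req value returned by spam_assassin_check( ).
# Returns an empty list for anything that is not a recognized policy.
# parse_policy() is a hypothetical helper, not part of MIMEDefang.
sub parse_policy {
    my ($policy, $default_req) = @_;
    return unless defined($policy)
        && $policy =~ /^\s*(TAG|BLOCK|IGNORE)(?:\s+(\d+(?:\.\d+)?))?\s*$/i;
    my ($action, $n) = (uc($1), $2);
    return ($action, defined($n) ? $n : $default_req);
}

# prints: BLOCK at 8
my ($action, $threshold) = parse_policy('block 8', 5);
print "$action at $threshold\n";
```

In filter_end( ), an empty return value could be logged and treated as the default policy, so that a typo in the policy file does not go unnoticed.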

You could similarly use stream_by_recipient( ) in an environment where you want to read SpamAssassin user preferences for each recipient from an SQL database. The Mail::SpamAssassin object used in mimedefang-filter is named $SASpamTester. A simple approach is to call the load_scoreonly_sql( ) method on that object, passing the recipient's email address as an argument, like this:

# @Recipients in mimedefang-filter is an array of recipient emails,
# but if you're using stream_by_recipient, there should only be a single
# recipient at this point.
my $recip = $Recipients[0];
# If your SQL database uses usernames rather than email addresses, uncomment:
# $recip =~ s/@.*//;
$SASpamTester->load_scoreonly_sql($recip);

This approach creates a new database connection for each mail message. A more complicated but more efficient approach would be to set up a database connection in filter_begin( ) and write SQL queries by hand in filter_end( ). On the other hand, using SpamAssassin's own functions, like load_scoreonly_sql( ), ensures that your code will be compatible with future SpamAssassin releases that might change the database format.
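Whichever approach you choose, the recipient used as the lookup key must match the usernames stored in your database. Assuming (this is not stated above) that envelope recipients may reach the filter wrapped in angle brackets or in mixed case, a small normalizing helper along these lines can keep lookups consistent. The name recipient_key( ) is hypothetical, and lowercasing the whole address assumes your usernames are case-insensitive.

```perl
use strict;
use warnings;

# Normalize an envelope recipient for use as a preferences lookup key.
# ASSUMPTION: addresses may look like "<User@Example.COM>".
# recipient_key() is a hypothetical helper, not part of MIMEDefang.
sub recipient_key {
    my ($addr, $strip_domain) = @_;
    $addr =~ s/^\s*<?//;                   # drop leading whitespace and "<"
    $addr =~ s/>?\s*$//;                   # drop trailing ">" and whitespace
    $addr = lc $addr;                      # fold to lowercase
    $addr =~ s/\@.*//s if $strip_domain;   # keep only the local part
    return $addr;
}

# In filter_end( ), after stream_by_recipient( ) has split the message:
#   $SASpamTester->load_scoreonly_sql(recipient_key($Recipients[0]));
```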

Although stream_by_recipient( ) and stream_by_domain( ) solve an important problem, they do so at a cost in performance. Messages that arrive for multiple recipients (or domains) will have to be split up and reinjected, considerably increasing the overall load on the mail server.

Building a Spam-Checking Gateway

By combining sendmail, MIMEDefang, and SpamAssassin, you can build a complete spam-checking gateway. Such systems are increasingly popular as external mail exchangers, receiving messages from the Internet and relaying them to internal mail servers that don't perform their own spam-checking (either for performance reasons or because they run operating systems that don't provide cost-effective antispam solutions). I assume that users relay outgoing mail through an internal mail server, rather than through the spam-checking gateway. Figure 5-2 illustrates this topology.

Figure 5-2. Spam-checking gateway topology

The example gateway in this section is based on actual gateways in operation on the Internet. Although I provide complete configuration files for the example, I discuss only those aspects of configuration directly relevant to spam-checking.

sendmail Configuration

In our scenario, the spam-checking gateway should accept messages for our domains, check them for spam, and relay them to an internal mail server. Accordingly, I include the following in our sendmail configuration:

  • The mailertable feature, which we may use to indicate the internal server to which we'll relay checked messages.
  • A one-hour timeout before sending a warning message about delayed delivery, and a seven-day timeout before bouncing messages. If the internal server should fail and need to be replaced, senders will quickly know that their messages have been delayed, but messages won't be bounced unless you can't replace the internal server within a week.
  • Several configuration options to limit sendmail's resource usage. We limit sendmail to 60 forked child processes and 10 connections per second. We limit messages to 10 MB and 500 recipients each.
  • An INPUT_MAIL_FILTER definition for MIMEDefang.

Example 5-10 is the sendmail.mc configuration file for the gateway and is used to generate /etc/mail/sendmail.cf.

Example 5-10. sendmail.mc file for a spam-checking gateway

divert(-1)
#
# Spam-checking gateway configuration
#

divert(0)dnl
VERSIONID(`Spam-checking gateway')
OSTYPE(linux)dnl
DOMAIN(generic)dnl
FEATURE(virtusertable)dnl
FEATURE(mailertable)dnl
FEATURE(access_db)dnl
FEATURE(always_add_domain)dnl
FEATURE(nouucp,`reject')dnl
FEATURE(`relay_based_on_MX')dnl
define(`confDEF_USER_ID',``8:12'')dnl
define(`confPRIVACY_FLAGS',`goaway,noreceipts,restrictmailq,restrictqrun,noetrn')dnl
dnl Since this is for a gateway MX, we keep the queue around for a long
dnl time without bouncing messages, but we warn about delivery delay
dnl rather quickly
define(`confTO_QUEUERETURN',`7d')dnl
define(`confTO_QUEUEWARN_NORMAL',`1h')dnl
dnl Options to prevent denial-of-service
define(`confMAX_DAEMON_CHILDREN',`60')dnl
define(`confMAX_MESSAGE_SIZE',`10000000')dnl
define(`confMAX_CONNECTION_RATE_THROTTLE',`10')dnl
define(`confMAX_RCPTS_PER_MESSAGE',`500')dnl
INPUT_MAIL_FILTER(`mimedefang', `S=unix:/var/spool/MIMEDefang/mimedefang.sock, T
MAILER(smtp)dnl
MAILER(local)dnl
MAILER(procmail)dnl

Because mail destined for the example.com domain should not be delivered locally on the external gateway, do not include example.com as one of the gateway's local hostnames in the /etc/mail/local-host-names (/etc/mail/sendmail.cw on some systems) file.

SpamAssassin Configuration

Store the SpamAssassin configuration for a gateway in /etc/mail/spamassassin/sa-mimedefang.cf. In addition to setting the typical options, it's a wise idea to use the trusted_networks (and, in SpamAssassin 3.0, internal_networks) directive to define the boundary between trusted and untrusted networks. Example 5-11 shows the sa-mimedefang.cf file on a system configured to use a sitewide Bayesian database.

Example 5-11. sa-mimedefang.cf file for a spam-checking gateway

required_hits 5

# These are hosts that we control
internal_networks 192.168.10/24

# This is a backup MX that's offsite
trusted_networks 111.222.333.444

bayes_path /var/spool/MD-Bayes/bayes

MIMEDefang Configuration

After installing MIMEDefang, set up three directories:

/var/spool/MIMEDefang
To contain MIMEDefang's working directories, and to hold the socket and pid files. Mount this directory on a RAM disk for increased performance.
/var/spool/MD-Quarantine
To contain quarantine directories.
/var/spool/MD-Bayes
To hold the Bayesian database files.

Each of these directories should be owned by the user under which MIMEDefang runs (typically, defang).

Edit mimedefang-filter to configure it for your gateway. Example 5-12 shows the first portion of a mimedefang-filter script corresponding to the example gateway I'm describing in this chapter. Each of the key variables in the file is defined.

Example 5-12. mimedefang-filter configuration for a spam-checking gateway

#***********************************************************************
# Set administrator's e-mail address here.  The administrator receives
# quarantine messages and is listed as the contact for site-wide
# MIMEDefang policy.  A good example would be 'defang-admin@mydomain.com'
#***********************************************************************
$AdminAddress = 'postmaster@example.com';
$AdminName = "Example.com Postmaster";

#***********************************************************************
# Set the e-mail address from which MIMEDefang quarantine warnings and
# user notifications appear to come.  A good example would be
# 'mimedefang@mydomain.com'.  Make sure to have an alias for this
# address if you want replies to it to work.
#***********************************************************************
$DaemonAddress = 'mimedefang@example.com';

# Allow SpamAssassin to use network-based tests
$SALocalTestsOnly = 0;
               

Routing Email

Mail from the Internet for example.com should be sent to the spam-checking gateway mail.example.com. To accomplish that, add a DNS mail exchanger (MX) record for the example.com domain that points to mail.example.com.

Once received by mail.example.com, messages will be spam-checked and should then be relayed to internal.example.com. You can accomplish that relaying in one of two ways:

Using DNS
Provide mail.example.com with an MX record for example.com pointing to internal.example.com and having a lower preference value (more preferred) than the mail.example.com MX record. This requires that you provide different results to DNS queries from Internet hosts versus queries from mail.example.com. Do so by running so-called split DNS, or by using BIND 9's view directives. Internet hosts should see only the mail.example.com MX record, but mail.example.com (and probably all internal hosts and clients) should see the internal.example.com MX record.

Using mailertable
Add FEATURE(`mailertable') to the sendmail.mc file, and create a /etc/mail/mailertable file that instructs sendmail where to forward messages destined for example.com:

example.com    esmtp:internal.example.com

or:

example.com    esmtp:[192.168.10.55]

After editing mailertable, be sure to use makemap to build the mailertable.db database from the mailertable file.


Internal Server Configuration

Once the external mail gateway is in place, you can configure the internal mail server to accept only SMTP connections from the gateway (for incoming Internet mail). If you don't have a separate server for outgoing mail, the internal mail server should also accept SMTP connections from hosts on the internal network. This restriction is usually enforced by limiting access to TCP port 25 using a host-based firewall or a packet-filtering router.

Testing

You should now have a complete spam-checking gateway. Test the gateway by sending spam and non-spam messages to user@example.com. Messages should arrive at internal.example.com with Received headers that show that they were first received by mail.example.com and then by internal.example.com, and X-Scanned-By headers that mention MIMEDefang. Spam messages should have X-Spam-Score headers added as well.
