Linux in a Windows World/Additional Server Programs/Configuring Mail Servers


Email is a particularly important part of most networks' functioning. Many businesses rely on email for both internal and external communications. Naturally, then, Linux can function as an email server—a computer that receives, stores, and forwards email for end users. As part of this function, a Linux mail server can filter out spam and worms from email, thus reducing both nuisance factors and security threats. Linux's advantages as a mail server over Windows include the low costs of the server and of add-on filters for spam and worms, as well as Linux's reliability and immunity to the Windows-based worms that are such a problem today. Even if you already run a Microsoft Exchange email server, Linux can be an excellent supplement to this server, providing filtering features that might require paying extra to obtain in Windows.

This chapter begins with a look at common email protocols and some common server software to implement them on Linux. Most of the chapter is devoted to basic configuration of the sendmail and Postfix servers, as well as to additional servers that can be used to deliver mail to clients. Filtering mail for spam, worms, and viruses requires its own coverage, as does a tool that can help users integrate mail delivered to outside ISPs into their own local mail systems.

Linux Mail Server Options

Before delving into the process of configuring mail servers, you should understand the role of mail servers on a network. The most basic issue is the distinction between push and pull protocols, which differ in whether the sender or the recipient initiates the transfer. Depending on your needs, you might want to configure a push server or a server that runs both push and pull protocols. You should also know what options are available for both push and pull server programs. On a Windows-dominated network, you may already have a Microsoft Exchange server, so knowing how to fit a Linux server into this existing configuration is important. Finally, running a mail server is not risk-free; mail servers can be abused in various ways, and understanding a bit about the threats will enable you to plan your installation to minimize the risks.

Push Mail Versus Pull Mail Protocols

The most common email protocol today is SMTP, which is an example of a push mail protocol—the sender initiates the data transfer. Typically, a user runs a mail client (also known as a mail user agent, or MUA) to send the mail to the SMTP server (which is also referred to as a mail transfer agent, or MTA). The SMTP server then delivers the message to other servers, which then send it on until it reaches its destination. This chain can run for an arbitrary length.
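To make the push model concrete, here is a minimal sketch of a program acting as a mail client (MUA) that initiates the transfer to an SMTP server (MTA). It uses Python's standard smtplib and email modules. The server name localhost, port 25, and the helper names build_message and push_to_server are assumptions made for illustration, not settings taken from this chapter.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a simple text message, as a mail client would."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def push_to_server(msg, host="localhost", port=25):
    """Open an SMTP connection and hand the message to the server.

    The sender initiates this connection, which is what makes SMTP
    a push protocol. The host and port are assumed values.
    """
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


# Example use (requires a reachable SMTP server, so it is left commented out):
# message = build_message("linnaeus@pangaea.edu", "ben@pangaea.edu",
#                         "Test", "Hello from a Python mail client.\n")
# push_to_server(message)
```

The server that accepts the message repeats the same kind of push toward the next server in the chain until the message reaches its destination.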

Traditionally, users have had login accounts on the mail server computer and have used mail readers on the computer itself. This configuration, though, requires either mail delivery using SMTP to users' desktop computers or login accounts for all users on a central mail server. Both options are a bit awkward, so a second class of mail server protocols exists: pull mail protocols. These protocols enable the client to retrieve (pull) the mail from the server. If an SMTP server that's the ultimate destination for a message runs a pull mail server, a user on a desktop computer can run a mail reader that supports the pull mail protocol to read mail directly from the desktop computer, as illustrated in Figure 13-1. Two pull mail protocols are common today: POP and IMAP. (The differences are described in more detail shortly.)

Figure 13-1. A pull mail server enables users to read mail using mail clients on their local computers


Of these two pull mail protocols, POP is the simpler one. It provides a single storage area for each user's messages; users typically download the messages and immediately delete them from the server. (Mail readers usually delete the messages automatically.) Users can then create local folders on their desktop computers using the email clients and store their messages that way. IMAP, on the other hand, supports mail folders on the mail server computer, as well as more sophisticated options for retrieving parts of messages. IMAP isn't quite as well supported as POP, but its ability to store messages in folders on the server helps when users frequently use multiple computers.

As a practical matter, most networks now use POP, IMAP, or a similar protocol (such as Microsoft's MAPI) for the final leg of email delivery. If you want to use Linux as a pull mail server, you certainly can; several POP and IMAP servers are available for Linux and are described in a later section. These servers can work with Linux, Windows, and other clients. If you're not sure whether to use POP or IMAP, you can install and use servers for both protocols; however, each user should probably use just one protocol. Mixing them can cause confusion; for example, messages disappear from an IMAP inbox after a POP client has been used.

Linux SMTP Server Options

Quite a few SMTP servers are available for Linux; however, four servers are the most popular and readily available. These servers differ in their design philosophies, mail storage formats (mbox or maildir, described in more detail shortly), ease of configuration, popularity, and other features:

Exim
This server, headquartered at http://www.exim.org, uses a monolithic design: one program does most of the work. It supports both mbox and maildir storage formats, with mbox being the default. This is the default server on Debian GNU/Linux and some of its derivatives.
Postfix
This server is a modular mail server, meaning that various subtasks of mail delivery are handled by separate programs. In theory, this makes it easier to write a server that's free of security-related bugs. Postfix supports mbox and maildir formats, with mbox being the default. Some distributions, including Mandrake and SuSE, now use Postfix as the default mail server. You can learn more at http://www.postfix.org.
qmail
This server uses an unusual license that's not quite open source: binary redistribution is prohibited unless certain conditions are met. Thus, qmail isn't the default mail server for any major Linux distribution. This server supports both mbox and maildir email storage formats, with maildir being the default. Like Postfix, qmail uses a modular design. Overall, it's the least compatible with sendmail, which makes it harder to replace sendmail with qmail than to replace sendmail with Postfix or Exim; but qmail has a devoted band of followers.
Sendmail
The most popular mail server for years has been sendmail (http://www.sendmail.org), which uses a monolithic design and supports the mbox mail storage format. In the 1990s, sendmail acquired a bad reputation for security problems, but such problems have become much rarer since the late 1990s. The main sendmail configuration file format is confusing at best, so most administrators use a metalanguage, known as m4, to create configuration files, but even the m4 configuration files aren't as easy to handle as the files for most other Linux mail servers.

Throughout the 1990s, sendmail ran on a majority of the mail servers in existence, according to most studies of the issue. More recently, though, sendmail has declined in popularity, while others (Exim, Postfix, qmail, and others, including Windows mail servers) have risen in popularity. Despite this decline, sendmail remains a very popular (perhaps still the single most popular) mail server program on large mail server computers. For this reason, sendmail configuration is described later in this chapter. Because it's becoming popular as the default server on many Linux distributions, this chapter also describes Postfix configuration. Although Exim and qmail are both perfectly good mail servers, they aren't described in this chapter, in order to keep the chapter's size manageable. If your Linux system is already running one of these servers, you can either try to find equivalent options to those described here or you can replace your current server with Postfix or sendmail. Most Linux distributions ship with at least two or three SMTP servers, or at least make them available in an online file repository. You can also check the mail servers' web sites for links to versions for your distribution.

The preceding descriptions referred to the local mail storage formats supported by each server. The mbox format uses a single file, to which email messages are appended. Each user has a mailbox, typically somewhere in the /var directory tree, to which the server adds messages as they arrive. The maildir format, on the other hand, stores messages as individual files in a directory. Users' incoming messages may be stored in subdirectories of users' home directories. Each format has its adherents, but your primary consideration should be compatibility. Local mail clients and pull mail servers must be able to read messages in the appropriate format. Some programs are limited in their capabilities, which can dictate your choice of options for the SMTP server, or even completely rule out an SMTP server. If you're building a mail system from scratch, you might want to assemble a list of software you want to use, based on features and recommendations from others, then pick the file format based on what your collection of software supports. If you wish to replace an existing SMTP server program, the simplest approach is to pick one that supports whatever format you're currently using.
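The difference between the two storage formats is easy to see with Python's standard mailbox module, which can write both. The following sketch stores the same test messages in an mbox file and in a maildir directory. The file name ben.mbox, the directory name Maildir, and the helper names are assumptions chosen for the example; real servers use whatever locations their configurations specify.

```python
import mailbox
import os
from email.message import EmailMessage


def make_message(n):
    """Build a small test message."""
    msg = EmailMessage()
    msg["From"] = "linnaeus@pangaea.edu"
    msg["To"] = "ben@pangaea.edu"
    msg["Subject"] = f"Test message {n}"
    msg.set_content(f"Body of message {n}\n")
    return msg


def store_both(base_dir, count=3):
    """Store `count` messages in an mbox file and in a maildir.

    Returns (messages in the mbox file, files in the maildir's new/ directory).
    """
    mbox_path = os.path.join(base_dir, "ben.mbox")     # one file for all messages
    maildir_path = os.path.join(base_dir, "Maildir")   # one file per message

    mbox = mailbox.mbox(mbox_path)
    maildir = mailbox.Maildir(maildir_path, create=True)
    for n in range(count):
        mbox.add(make_message(n))       # appended to the single mbox file
        maildir.add(make_message(n))    # written as a separate file under new/
    mbox.flush()
    mbox_count = len(mbox)
    mbox.close()
    maildir.close()

    new_files = os.listdir(os.path.join(maildir_path, "new"))
    return mbox_count, len(new_files)
```

After running store_both on an empty directory, you should find a single ben.mbox file holding every message, and a Maildir directory whose new subdirectory holds one file per message.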

Linux POP and IMAP Server Options

Just as with SMTP, several POP and IMAP servers are available for Linux. Some packages support only one protocol, but many support both. Many of the servers are limited to just one mailbox format, though, and some of the IMAP servers use their own format for folders other than the inbox. On the whole, you may need to hunt a bit to find the server that best suits your needs.

Courier
This server, located at http://www.courier-mta.org, is a complete mail server package, including support for SMTP, POP, IMAP, and other protocols. Although the full Courier package isn't one of the "big four" SMTP servers, the POP/IMAP component (available separately from http://www.courier-mta.org/imap/) is moderately popular. It provides access to mail stored in maildir format but not mbox.
Cyrus IMAP
Although IMAP is part of this popular server's name, it supports both POP and IMAP. Cyrus IMAP stores POP mail in mbox format but uses its own format for IMAP folders. This server provides more options than some and emphasizes encrypted authentication protocols using its own password database. You can learn more at http://asg.web.cmu.edu/cyrus/imapd/.
Dovecot
This server, headquartered at http://dovecot.org, is a fairly recent entry to the POP and IMAP server field, but it's rapidly becoming a popular server. It supports both protocols, as well as both mbox and maildir file formats. The Dovecot documentation indicates that it was written with security as a primary focus. This server provides more options than many POP and IMAP servers, so it's worth investigating if you have unusual needs.
nupop
This server was designed for environments hosting a large number of users; it aims to operate as efficiently as possible. It supports POP but not IMAP and maildir but not mbox. Check http://nupop.nuvox.net for more information.
popa3d
Security, reliability, standards compliance, and performance are the primary goals of popa3d, which is a POP server headquartered at http://www.openwall.com/popa3d/. It's designed to support mbox mail files, but a patch that provides maildir support is available on its web site.
qmail-pop3d
This program is part of the qmail package (http://www.qmail.org). As such, it's most often used with qmail and employs the maildir format that's the default for qmail. This is a POP-only server.
QPopper
Despite the Q in its name, this server is unrelated to qmail. Versions prior to 4.0 were commercial servers, but as of Version 4.0, the server is open source. It's a POP-only server that works with the mbox mail format. Check http://www.eudora.com/qpopper/ for more information.
UW IMAP
The University of Washington IMAP server (http://www.washington.edu/imap/) has long been the default POP and IMAP server in Linux. This server, which uses the mbox format, is easy to get running but provides few options to fine-tune its operation.

Most of these servers use normal Linux authentication mechanisms, such as Linux's PAM (described in more detail in Appendix A), although some provide options for using or must use some other authentication mechanism. Broadly speaking, UW IMAP is usually the simplest server to configure, and it's usually adequate for small sites, particularly if you want to use POP rather than IMAP. (IMAP use with UW IMAP may be complicated if users also have shell access, because the server hardcodes the location of IMAP folders in the user's home directory, which can be awkward. This isn't a problem if users don't have shell access to the server, though.) Dovecot has recently been gaining in popularity and is worth investigating if you find UW IMAP too limiting. Any of the other servers may also be good choices for particular uses—for example, if you only want to use POP or if you want to provide access to maildir-format mailboxes. This chapter describes running UW IMAP later, in Section 13.4.

Mail Security Concerns

Mail servers, like all servers, are potential security risks. In fact, mail servers, and particularly SMTP servers, can be more vulnerable than you might at first think, because they must perform some operations as root. For instance, when storing mail, the server needs to be able to write to arbitrary users' mailboxes, which are owned by their respective users. This means that at least part of a mail server must run as root, and the complexity of modern SMTP servers means that bugs in the code can give clever intruders access. This is why modular mail servers are, theoretically and all other things being equal, potentially safer than monolithic servers. By isolating the tasks that must run as root in separate programs, a modular design lets the other mail server programs run as non-root users, reducing the risk that a bug will lead to a system compromise.

That said, recent versions of the monolithic Exim and sendmail servers don't have bad security reputations. I can't promise that you won't encounter problems if you use one of them, but the risk isn't unmanageable for most sites.

In recent years, another email security concern has come to dominate the news: worms and viruses. (Most of these are technically worms by most definitions, but the term virus is frequently applied to them all.) An email worm is a piece of code sent via email that, when run, replicates and sends copies of itself to others, usually via email. Typically, such worms are sent as attachments that appear innocuous. They also might rummage through victims' address books to locate new addresses, so recipients may trust the worms because they know the apparent senders.

Worms have become a serious threat; outbreaks have become fairly frequent, and the sheer number of worms being sent requires extra storage capacity, faster CPUs, and better network connections on mail servers than would otherwise be required. When a new worm is released and spreads rapidly, the demands placed on all these resources spike, often beyond the capacity of the hardware to cope with the problem.

Another security issue with email is that SMTP servers, by default, send their mail without encryption. This doesn't pose a direct security threat to the mail server computer, but it does mean that email can be intercepted and read if any system between the source and the destination is compromised. For this reason, sensitive data such as passwords and credit-card numbers shouldn't be sent via email. One approach to fixing this problem is to equip mail clients with encryption tools such as the GNU Privacy Guard (GPG; http://www.gnupg.org). Two GPG-equipped systems can send encrypted messages to each other, although the SMTP protocol itself remains unencrypted.

Configuring Sendmail

Sendmail has long been the most common SMTP server. Although its popularity has dropped somewhat in recent years, it remains the dominant mail server on the Internet at large and is the standard mail server installed in many Linux distributions, including Fedora, Red Hat, and Slackware.

To configure sendmail, you must first know where to find its configuration files, understand their formats, and know how to create and modify these files. These tasks are trickier in sendmail than in most other mail servers, which is one of sendmail's big drawbacks compared to other popular Linux mail servers, especially for new mail administrators. This chapter looks at three particularly important areas of sendmail configuration: address options, relay options, and antispam options.

Tip

Sendmail is an extremely complex server, so this chapter can only begin to scratch its surface. If you need to do more with sendmail than is described here, you should consult its own documentation or a book on sendmail, such as O'Reilly's sendmail or sendmail Cookbook.

Sendmail Configuration Files

The main sendmail configuration file is called sendmail.cf, and it's usually located in /etc, /etc/mail, or some other subdirectory of /etc. Unfortunately, this file is very difficult to edit directly because the configuration options are numerous and have formats that are fairly obscure. For this reason, few people even attempt to edit this file directly. Instead, they use the m4 utility to create a sendmail.cf file from a file with a simpler format.

In order to use the m4 utility, though, it must be installed on your system. What's more, the utility relies on a series of support files, which may be installed from yet another package. In Fedora and Red Hat, for instance, you must install the sendmail-cf package. Look for the m4 package on your distribution medium, and also look for any likely sendmail m4 configuration packages. (They're likely to include sendmail in the package names.)

The m4 tool converts a file with a name that typically ends in .mc into sendmail's sendmail.cf. Unfortunately, the precise name used varies from one distribution to another. For instance, in Fedora and Red Hat, it's /etc/mail/sendmail.mc, whereas in Slackware it's /usr/share/sendmail/cf/cf/sendmail-slackware.mc. To perform the conversion, you use the m4 command, redirecting the .mc file to the command's standard input and redirecting its output to the desired file:

# m4 < /etc/mail/sendmail.mc > /etc/mail/sendmail.cf
               

Warning

This command overwrites the existing /etc/mail/sendmail.cf file. For safety, you should back up this file by copying it to another location before running this command.
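If you rebuild the file often, you might wrap the backup and the conversion in a small script. The following is a sketch only: the helper names backup_file and rebuild_cf, the .bak-timestamp and .new suffixes, and the default Fedora-style paths are assumptions, and the script must run as root with m4 and the sendmail m4 support files installed.

```python
import shutil
import subprocess
import time
from pathlib import Path


def backup_file(path):
    """Copy `path` to a timestamped sibling file and return the backup's path."""
    src = Path(path)
    dest = src.with_name(src.name + time.strftime(".bak-%Y%m%d%H%M%S"))
    shutil.copy2(src, dest)
    return dest


def rebuild_cf(mc_path="/etc/mail/sendmail.mc", cf_path="/etc/mail/sendmail.cf"):
    """Back up the old .cf file (if any), then run: m4 < mc_path > cf_path."""
    cf = Path(cf_path)
    backup = backup_file(cf) if cf.exists() else None
    # Write to a temporary file first so that a failed m4 run
    # doesn't leave a truncated sendmail.cf in place.
    tmp = cf.with_name(cf.name + ".new")
    with open(mc_path, "rb") as mc, open(tmp, "wb") as out:
        subprocess.run(["m4"], stdin=mc, stdout=out, check=True)
    tmp.replace(cf)
    return backup
```

Calling rebuild_cf() with no arguments is meant to have the same effect as the m4 command shown earlier, preceded by a backup of the existing file.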

Once you've rebuilt the configuration file, you must restart sendmail. In most cases, this can be done by passing a restart or reload argument to the sendmail SysV startup script:

# /etc/rc.d/init.d/sendmail restart
               

Alternatively, you can use kill to send a SIGHUP signal to the sendmail process. This procedure can be less disruptive than completely restarting sendmail, and so it may be preferable.

Before you do this, however, you must make changes to your sendmail .mc file. Compared to the .cf file, the .mc file is simple and comprehensible. Most options are set in parentheses using a define or FEATURE keyword:

define(`SMART_HOST',`smtp.pangaea.edu')
FEATURE(always_add_domain)

Additional option names exist, but these two account for many of the sendmail features. The parameters passed to these options are sometimes enclosed in single quotes, but unlike most configuration files, the opening and closing quote characters are different: The opening quote is actually a backtick (`), located to the left of the 1 key on most keyboards. The closing quote is an ordinary single quote character ('), located to the right of the semicolon (;) key on most keyboards.

The sendmail .mc file uses the string dnl to denote a comment. Many sample configurations include quite a few options that are commented out by placing this string at the start of the line. Sometimes a hash mark (#) also appears on the line, but this character isn't an actual comment character; it's just there for the benefit of users who are accustomed to seeing a hash mark used as a comment marker.
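To check which options are actually active in a long .mc file, you can skip the lines that begin with dnl and list what remains. The sketch below does this with two regular expressions. It is not a real m4 parser; it assumes one directive per line, and the function name active_directives is made up for the example.

```python
import re

# A line that starts with dnl is commented out.
COMMENT = re.compile(r"^\s*dnl\b")
# NAME(arguments), optionally followed by a trailing dnl.
DIRECTIVE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\((.*)\)\s*(?:dnl.*)?$")


def active_directives(mc_text):
    """Return (name, arguments) pairs for lines that aren't commented out."""
    found = []
    for line in mc_text.splitlines():
        if COMMENT.match(line):
            continue
        match = DIRECTIVE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

For the two sample lines shown earlier, this function reports a define directive and a FEATURE directive; a line such as dnl FEATURE(always_add_domain) would be ignored.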

In addition to the main sendmail .cf and .mc files, other files serve to hold ancillary data:

access.db
This binary file is created by the makemap utility from a plaintext file that often has the same name with a different or no filename extension. This file controls which computers may interact with the sendmail server and in what ways. This information is particularly critical for sendmail relay configurations, as described shortly.
aliases.db
This file is a binary file created by makemap or newaliases. (Passing the -bi option to sendmail also does the job.) This file defines aliases—that is, mappings of email addresses onto other email addresses. For instance, most distributions set up an alias of postmaster to root, so that root receives mail addressed to postmaster.

These files usually appear in /etc/mail or sometimes in /etc. If you examine the .mc configuration file, you'll probably find references to these files. Chances are you shouldn't modify these references, although you may want to adjust the files' contents, particularly if you need to adjust your relay configurations.

Sendmail Address Options

In a basic sendmail configuration, the most important settings relate to ports and addressing. Some distributions ship sendmail configured to bind only to the localhost (127.0.0.1) address. The result is that the server can be accessed only from the local computer. This can be a good configuration if you're running a desktop system that shouldn't accept outside SMTP connections, but for a mail server, you probably don't want this restriction. Check the sendmail .mc file for a line like this:

DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl

If this line is present, and you want the server to accept outside connections, add dnl to the start of the line to comment out this option. If you don't see a line like this, you don't need to make any changes.
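The same edit can be scripted. This sketch takes the text of an .mc file and comments out any DAEMON_OPTIONS line that binds to 127.0.0.1, leaving everything else untouched. The function name is hypothetical, and the pattern assumes the option appears on a single line, as in the example above.

```python
import re

# Matches an active (not commented-out) DAEMON_OPTIONS line bound to localhost.
LOOPBACK = re.compile(r"^\s*DAEMON_OPTIONS\(.*Addr=127\.0\.0\.1")


def listen_on_all_addresses(mc_text):
    """Prefix loopback-only DAEMON_OPTIONS lines with dnl to disable them."""
    out = []
    for line in mc_text.splitlines(keepends=True):
        if LOOPBACK.match(line):
            line = "dnl " + line
        out.append(line)
    return "".join(out)
```

Lines that are already commented out start with dnl rather than DAEMON_OPTIONS, so running the function a second time changes nothing. You still need to rebuild sendmail.cf and restart sendmail for the change to take effect.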

Another address-related option is to set the server's hostname. Frequently, the server has a specific hostname, such as smtp.pangaea.edu, but you want mail from your users to use your domain name only, such as linnaeus@pangaea.edu, rather than linnaeus@smtp.pangaea.edu. Frequently, mail clients can set this address; however, if you find that some of your outgoing mail sets an incorrect domain or includes a hostname in the address, you can have sendmail change this by including the following lines in the .mc file:

MASQUERADE_AS(`pangaea.edu')
FEATURE(masquerade_envelope)

Of course, you'd change pangaea.edu in the first line to your own domain name. The first line tells sendmail what should appear to the right of the at sign (@) in email addresses if users' mail clients don't specify an address. The FEATURE(masquerade_envelope) line takes this a step further, by also masquerading the envelope addresses used in the SMTP exchange itself, which are normally invisible to users. If you don't use these options, sendmail assumes that its hostname is as set on the computer (as determined by the gethostbyname() library call), and it won't adjust the addresses in outgoing email.

Tip

An important part of the email addressing scheme is setting the mail exchanger (MX) entry in your domain's DNS records. This record tells sending mail servers the name of your domain's mail server computer, so that mail addressed to linnaeus@pangaea.edu is sent to smtp.pangaea.edu. Chapter 15 describes DNS configuration, including setting the MX record.

Sendmail Relay Options

An important part of any SMTP server configuration is setting mail relay options. A mail server can function as a relay (that is, accept mail that's destined for another location) or use a relay (that is, send outgoing mail by way of a server other than the ultimate destination). Setting these options so that sendmail does what you need it to do without doing too much can be tricky sometimes.

Warning

If a mail server accepts relays from systems or users who shouldn't be able to use it for this purpose, the server is known as an open relay. Such mail servers are easily abused by spammers, so open relay configurations should be avoided at all costs.

Configuring sendmail to relay mail

Sendmail is frequently employed as a mail relay server for a network. That is, you configure mail clients to send all outgoing mail via the Linux sendmail server. Out of the box, though, recent versions of sendmail refuse such relay attempts as an antispam precaution. You can loosen this configuration using any of several options, specified within a FEATURE specification:

relay_entire_domain
This option tells sendmail to perform a DNS lookup on a sending computer's IP address and to accept relay attempts if the resulting hostname is within your domain. This is a quick and easy way to enable relaying, but it can be abused; spammers can modify their own networks' DNS servers to provide a reverse lookup in your domain, thus tricking your system into accepting undesirable relays.
relay_local_from
If you use this option, sendmail accepts any mail for relay so long as the From: address in the message is in sendmail's local domain. This address is very easily forged, though, and so is a poor option in most cases.
relay_based_on_MX
This option is another DNS-based rule. It tells sendmail to accept mail for relaying if the mail is destined for a domain that lists the sendmail server in its MX record.
relay_hosts_only
With this option, sendmail looks up the sending system in a database (described shortly); if the specific computer that's attempting to relay mail is listed in the database, the mail is accepted.
access_db
This option is similar to relay_hosts_only, but it employs a more flexible interpretation of data in the database, enabling you to list entire domains. Many distributions' sendmail configurations enable this option by default, albeit with an empty initial database.

Warning

Another relay option is promiscuous_relay, but this option should never be used. It tells sendmail to accept all relay attempts. This configuration is effectively an invitation to spammers to abuse your system.

As an example, suppose you want to use the access_db method. You might then include a line like the following in your sendmail .mc file:

FEATURE(`access_db')

Some configurations add more options within the parentheses—say, to specify the method of encoding data and the access database filename (normally /etc/mail/access.db). The access_db and relay_hosts_only options are the safest ways to configure mail relays, and they both use the same access.db configuration file. This file is a binary database file that's built from a text-mode file, typically called access. This text-mode file consists of lines that take the following format:

                     host.specification  CODE
                  

In addition to these lines, the file may contain additional modifiers, as well as comments that begin with hash marks (#). The host.specification takes the form of IP addresses, IP address groups (specified by incomplete IP addresses, as in 192.168.24 for the 192.168.24.0/24 network), hostnames, domain names, or email addresses. If you use relay_hosts_only, though, specifications must match individual computers, not groups of computers. The CODE tells sendmail what to do with mail from the specified computers:

OK
Sendmail should accept mail for local delivery from the specified host.
RELAY
Sendmail relays mail that originates from or is addressed to the specified host.
REJECT
The server should refuse any message from the specified host using a 5xx code. Many senders generate a bounce message in response to such a code.
DISCARD
The server should accept and then discard any message from the specified host; no bounce message is generated.

As an example of an access file, consider the following:

localhost.localdomain  RELAY
localhost              RELAY
127.0.0.1              RELAY
spammer@abigisp.net    DISCARD
iamspam.biz            REJECT
192.168.24             RELAY

The first three lines tell sendmail to relay mail that's generated locally (on the localhost address, using any of three common names for that system). Such lines are common in default sendmail configurations. The next line tells the system to quietly discard mail from spammer@abigisp.net; but this rule has no effect on mail from other users of abigisp.net. The fifth line rejects (refuses with a bounce message) mail from the iamspam.biz domain. The last line authorizes sendmail to relay mail that originates from the 192.168.24.0/24 address range, which is presumably the server's own local network.
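The way partial IP addresses and domain names match can be sketched in a few lines of Python. This is a simplified model written for illustration, not sendmail's actual lookup code: it ignores the additional modifiers mentioned earlier, and the function names parse_access and lookup are invented. It tries the most specific key first and then progressively broader ones.

```python
def parse_access(text):
    """Read access-file text into a dictionary of specification -> code."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        key, code = line.split(None, 1)
        table[key.lower()] = code.strip()
    return table


def lookup(table, item):
    """Return the code for an IP address, hostname, or email address, or None."""
    item = item.lower()
    keys = []
    if "@" in item:
        keys.append(item)                      # full email address first
        item = item.split("@", 1)[1]
    labels = item.split(".")
    if all(label.isdigit() for label in labels):
        # IP address: 192.168.24.17, then 192.168.24, then 192.168, then 192
        keys += [".".join(labels[:n]) for n in range(len(labels), 0, -1)]
    else:
        # Hostname: mail.iamspam.biz, then iamspam.biz, then biz
        keys += [".".join(labels[n:]) for n in range(len(labels))]
    for key in keys:
        if key in table:
            return table[key]
    return None


ACCESS = """\
localhost.localdomain  RELAY
localhost              RELAY
127.0.0.1              RELAY
spammer@abigisp.net    DISCARD
iamspam.biz            REJECT
192.168.24             RELAY
"""
```

With this table, lookup(parse_access(ACCESS), "192.168.24.17") returns RELAY because of the 192.168.24 entry, whereas an address at abigisp.net other than spammer matches nothing and returns None. Under relay_hosts_only, only the exact first key would be tried.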

Once you've created an access file, you must convert that file to binary form using the makemap command:

# makemap hash /etc/mail/access.db < /etc/mail/access
                  

Many distributions include an appropriate command as part of their sendmail startup scripts, so you may not need to explicitly enter this command.

Configuring sendmail to use a relay

Mail relaying involves at least three systems: the source, the destination, and the relay. The destination requires no special configuration, and the last section described the relay itself. On the source side, though, sendmail can require special configuration. Sometimes, the source computer doesn't run sendmail at all; a source might be a desktop system running a mail client. You can, though, use sendmail as a mail source. For instance, the source system might be a Linux computer that runs programs that assume the local computer is running sendmail and that therefore try to send mail using the server. Another configuration is to have a Linux computer serve as both a relay and a source for another relay. For instance, you might want a Linux server to handle mail for your local network but to relay it through an ISP's mail server. In either case, you must configure sendmail to use another computer as a relay.

Tip

By default, sendmail looks up the recipient's address via DNS and attempts to deliver the mail directly. If you configure sendmail to use a relay, as described here, it bypasses this attempt, and instead delivers the mail to the specified relay system.

Most distributions' default sendmail configurations don't use a relay. You can add one to the mix by adding one or more lines to your .mc configuration file:

define(`LOCAL_RELAY', `outgoing.mail.relay')
define(`MAIL_HUB', `outgoing.mail.relay')
define(`SMART_HOST', `outgoing.mail.relay')

The first line applies to outgoing mail that lacks a domain or machine name (for instance, mail addressed to ben); the second applies to mail addressed to users on the computer on which sendmail is running (for instance, ben@armonica.pangaea.edu, where sendmail is running on armonica.pangaea.edu); and the third applies to mail addressed to all other systems.
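The division of labor among the three settings can be summarized as a small decision function. This is a simplified illustration of the rules just described rather than sendmail's real address-rewriting logic, and the function name relay_setting is an assumption.

```python
def relay_setting(recipient, local_hostname):
    """Name the .mc setting that governs delivery to `recipient`."""
    if "@" not in recipient:
        return "LOCAL_RELAY"            # no domain or machine name, such as ben
    domain = recipient.rsplit("@", 1)[1].lower()
    if domain == local_hostname.lower():
        return "MAIL_HUB"               # addressed to a user on this computer
    return "SMART_HOST"                 # addressed to any other system
```

For a server running on armonica.pangaea.edu, relay_setting("ben", "armonica.pangaea.edu") yields LOCAL_RELAY, ben@armonica.pangaea.edu yields MAIL_HUB, and an address at any other host yields SMART_HOST.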

A somewhat simpler way to implement relaying is to use another line:

FEATURE(`nullclient', `outgoing.mail.relay')

This line, however, is intended for use in otherwise nearly empty configuration files. Only the FEATURE(`nocanonify') option should be used with it.

In all these cases, you must adjust the outgoing.mail.relay to point to the server you want to use as a relay.

Configuring sendmail to forward mail

Particularly when your domain has multiple mail servers or is connected to multiple networks, you may need to configure the system to forward mail in different ways depending on its source or destination. For instance, consider the "gatekeeper" Linux mail server in Figure 13-2. The intent of a configuration like this is to use Linux to provide useful preliminary processing on incoming mail, such as spam filtering and directing email to the correct internal mail server. This server can also pass mail between the two internal servers and filter outgoing mail.

Figure 13-2. Linux can serve as a gatekeeper for one or more other mail servers

Linux can serve as a gatekeeper for one or more other mail servers

Typically, the Linux SMTP server is listed as the domain's MX server, so external systems will deliver mail to it. Likewise, the internal mail servers, and perhaps individual client systems, can deliver outgoing mail to the Linux server. The trick is to configure the Linux server to deliver mail correctly without creating a mail loop; for instance, you don't want the server to attempt to deliver mail for your domain back to itself. One solution is to use a feature known as a mailer table. This can be activated with a line like this in the sendmail .mc file:

FEATURE(`mailertable')

This entry may include additional options, such as a pointer to the mailer table database file (typically /etc/mail/mailertable.db). Check your .mc file for the default entry, if it exists. As with many other sendmail files, this one relies on a text-mode file that's converted into a binary database file. The text-mode mailertable file contains entries like this:

.subnet1.pangaea.edu  smtp:exchange1.pangaea.edu
.subnet2.pangaea.edu  procmail:/etc/procmailrcs/exchange2

This configuration tells the server to deliver mail addressed to any computer in the subnet1.pangaea.edu subdomain to exchange1.pangaea.edu using SMTP and to deliver mail addressed to any computer in the subnet2.pangaea.edu subdomain using Procmail and the /etc/procmailrcs/exchange2 Procmail rule set. The first line results in a simple forwarding and so may not be extremely useful; you can just set up your DNS MX record to point directly to that computer. The second line, though, enables you to employ Procmail, which can be used as an interface to spam filters and other tools, on mail passed through the server. Procmail is described in more detail later in this chapter, in Section 13.5.4.

Configuring Postfix

Postfix is an alternative to sendmail that ships with most major Linux distributions, although many of them don't install it by default. If your distribution doesn't ship with Postfix but you want to try it, check the Postfix home page (http://www.postfix.org) for source code download links. You might be able to install a binary package intended for another distribution, but chances are you'll need to modify or replace the SysV startup scripts.

As with sendmail, configuring Postfix for your network requires understanding the main Postfix configuration files. You can then set the main Postfix options, including those relating to addressing, relaying, and spam control.

Tip

Postfix is a very complex server, so this chapter can present only the basics of its configuration. For more information, consult the documentation at the Postfix web site or a book on the subject, such as Postfix: The Definitive Guide (O'Reilly).

Postfix Configuration Files

Linux Postfix binary packages typically store configuration files in /etc/postfix. The main configuration file in this directory is main.cf, which controls the overall Postfix configuration. This file consists of comments, which are denoted by lines beginning with hash marks (#), and option lines of the form:

variable = value

The variable is typically a descriptive name, such as relayhost to set the hostname of another SMTP server that's to act as a mail relay. The value can be a hostname, IP address, filename, or other string. Sometimes a value can have multiple components, separated by commas. A value can also refer to an earlier variable by name: precede the earlier variable name by a dollar sign ($), as in myorigin = $mydomain to set the myorigin variable to be identical to mydomain.
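The following Python sketch shows how the variable = value format and the $variable references can be read. It is a simplified illustration rather than Postfix's own parser: the function names and sample values are assumptions, and real main.cf features such as continuation lines are ignored.

```python
import re

# Hypothetical sample settings in main.cf style.
SAMPLE = """\
# Lines that begin with a hash mark are comments
mydomain = pangaea.edu
myhostname = smtp.pangaea.edu
myorigin = $mydomain
mydestination = $myhostname, localhost.$mydomain
"""

def parse_main_cf(text):
    """Collect variable = value pairs, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, value = stripped.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings

def expand(value, settings, max_depth=10):
    """Replace each $name with that variable's value, repeating if needed."""
    for _ in range(max_depth):
        # Unknown names are left untouched rather than guessed at.
        expanded = re.sub(r"\$(\w+)",
                          lambda m: settings.get(m.group(1), m.group(0)),
                          value)
        if expanded == value:
            break
        value = expanded
    return value
```

Calling expand(settings["myorigin"], settings) on the parsed sample yields pangaea.edu, which is the effect of the myorigin = $mydomain line.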

The default main.cf file is extremely well commented, so you can learn a great deal about the configuration and how you can change it by reading the comments. Further information, including information on obscure options not mentioned in the default comments, can be found in the online Postfix documentation.

Tip

After making changes to main.cf, you should tell Postfix about those changes. The simplest way to do this is to type postfix reload .

In addition to the main.cf file, Postfix relies on several other configuration files. Most of these are binary database files with filenames ending in .db. These files are similar in purpose to sendmail's database files; they control username aliases, relay host mapping, and so on. Like the sendmail files, the Postfix database files are generated from plain-text files that typically take the same name as the database file but without the .db extension. Some of these files are described in upcoming sections.

Postfix Address Options

The Postfix address options begin with setting the server's name. As with sendmail, Postfix uses gethostbyname( ) to determine the computer's hostname and sets the hostname it reports to other systems appropriately. You can override this feature by setting myhostname:

myhostname = smtp.pangaea.edu

Two related options are mydomain and myorigin. The first of these sets the server's Internet domain; it defaults to the value of $myhostname minus its first component, as in pangaea.edu if $myhostname is smtp.pangaea.edu. The myorigin variable sets the hostname that Postfix appends to email addresses that don't specify a hostname. In the stock Postfix configuration, the default value is $myhostname, but you can change this to $mydomain or any other value, as appropriate.

If you want to force outgoing mail to have a particular return hostname, you can use the masquerade_domains option. You pass a domain name to this option, and hostnames within that domain are stripped down to the domain portion. For instance, if you set this option to pangaea.edu, and a user sends mail that has a return address of linnaeus@gingko.pangaea.edu, Postfix changes the outgoing address to linnaeus@pangaea.edu. This can be a handy option for coping with clients that insist on adding their own hostnames to outgoing mail. Mail with return addresses outside of the pangaea.edu domain is unaffected by this line, though. The masquerade_classes option determines which parts of the mail are rewritten. You can set this to one or more of envelope_sender (the sender in the mail envelope), header_sender (the sender in the mail header), and header_recipient (the recipient in the mail header, typically used to strip hostnames from incoming mail). Typically, one or both of the first two options is used.
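As a rough illustration of the rewriting that masquerade_domains performs, consider this Python sketch. The function name is hypothetical and the logic is a simplification; Postfix itself applies the rewrite only to the address classes named in masquerade_classes.

```python
def masquerade(address, masquerade_domains):
    """Strip the host part of an address down to a listed domain.

    Simplified model of the behavior described in the text.
    """
    local, sep, host = address.rpartition("@")
    if not sep:
        return address  # no hostname to strip
    for domain in masquerade_domains:
        # Only hosts *inside* the domain are rewritten.
        if host.lower().endswith("." + domain.lower()):
            return local + "@" + domain
    return address
```

Given the domain pangaea.edu, the sketch turns linnaeus@gingko.pangaea.edu into linnaeus@pangaea.edu and leaves addresses in other domains alone.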

Still more complete address rewriting can be accomplished with the help of a file called sender_canonical. You specify the use of this file with the sender_canonical_maps option in main.cf:

sender_canonical_maps = hash:/etc/postfix/sender_canonical

You then edit the sender_canonical file so that each line holds an original email address or address fragment followed by the address or matching fragment you want substituted:

FETCHMAIL-DAEMON@localhost postmaster@pangaea.edu
@mandragora.example.com @pangaea.edu

These lines tell Postfix to replace FETCHMAIL-DAEMON@localhost with postmaster@pangaea.edu and to change any address at mandragora.example.com to the matching address at pangaea.edu. Once you've edited this file, type postmap sender_canonical. This command creates a sender_canonical.db file from the text-mode sender_canonical file.
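The effect of these two entries can be modeled with a short Python sketch. It assumes that a full-address entry is tried before a domain-only entry and that matching is exact; the function name and the dictionary are illustrative stand-ins for the real sender_canonical.db lookup.

```python
# The two example entries from the sender_canonical file.
CANONICAL = {
    "FETCHMAIL-DAEMON@localhost": "postmaster@pangaea.edu",
    "@mandragora.example.com": "@pangaea.edu",
}

def rewrite_sender(address, canonical):
    """Apply a full-address entry first, then a domain-only entry."""
    if address in canonical:
        return canonical[address]
    local, sep, host = address.rpartition("@")
    domain_key = "@" + host
    if sep and domain_key in canonical:
        # A domain-only entry keeps the username and swaps the domain.
        return local + canonical[domain_key]
    return address
```

Under this model, ben@mandragora.example.com becomes ben@pangaea.edu, while an address that matches neither entry passes through unchanged.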

When receiving mail, Postfix uses the mydestination variable to determine what addresses it's to treat as local. Mail addressed to any user at any of the $mydestination addresses is passed to local users; mail addressed to other addresses is relayed to that address, assuming relaying is authorized. You can set multiple hostnames for mydestination by separating them with commas, as in:

mydestination = $myhostname, localhost.$mydomain, mail.pangaea.edu

Postfix Relay Options

Most default Postfix configurations relay mail from the local network and deliver mail directly to the destination server without using an outgoing relay. Thus, if you want to fine-tune your relay configuration or use an outgoing mail relay system, you must adjust your Postfix configuration. You may also want to make changes if you want Postfix to deliver incoming mail to other servers, such as to Microsoft Exchange servers, using Postfix as a spam filter, mail sorter, or in some role other than the final destination system.

Configuring Postfix to relay mail

The default Postfix configuration relays mail under certain limited circumstances:

  • The sender is on one of the $mynetworks networks. This defaults to the IP subnet on which the computer resides, but you can change it by setting mynetworks to a list of IP address ranges or by pointing to a file that holds this information. Alternatively, you can change mynetworks_style. This variable defaults to subnet, which sets the default behavior; however, you can set it to host, which causes Postfix to trust only the local machine. Setting mynetworks_style to class causes Postfix to trust the computers on the same class A, B, or C subnet on which it resides, which often (but not always) results in the same behavior as setting it to subnet.
  • The sender is in one of the domains specified by relay_domains. This variable defaults to $mydestination.
  • The sender is attempting to relay mail to a computer in $relay_domains or to a computer on the $mynetworks networks.

Overall, these defaults are laxer than those of sendmail. If you don't want your computer to relay mail at all, you should restrict these settings:

mynetworks = 127.0.0.0/8
relay_domains = smtp.pangaea.edu

The first line tells Postfix to relay only mail from the localhost address. The second sets the relay domain to the server's hostname (you should adjust it for your system, of course). A configuration that relays for some computers and networks, but not quite the default set, is also possible; for instance:

mynetworks = 127.0.0.0/8, 172.24.0.0/16, 192.168.24.0/24
relay_domains = $mydestination, pangaea.edu

This configuration tells Postfix to relay mail for two subnets by IP address, for the local domain ($mydestination), and for the pangaea.edu domain.
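The network half of this test is easy to picture in Python using the standard ipaddress module. This sketch covers only the mynetworks check, not relay_domains, and the function name is a hypothetical one chosen for the illustration.

```python
import ipaddress

# The mynetworks value from the example above.
MYNETWORKS = "127.0.0.0/8, 172.24.0.0/16, 192.168.24.0/24"

def client_in_mynetworks(client_ip, mynetworks):
    """Return True if the client address falls in any listed range."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network.strip())
        for network in mynetworks.split(",")
    )
```

A client at 172.24.5.9 passes this check, whereas one at 192.168.25.1 does not, because the third range covers only 192.168.24.0 through 192.168.24.255.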

Configuring Postfix to use a relay

If Postfix should send mail through another computer as a relay, you should use the relayhost option to do the job. This option accepts a hostname as its value; Postfix sends mail through that system. Alternatively, you can provide a domain name if that domain's MX record points to an appropriate server. For instance, suppose you want to relay mail through relay.pangaea.edu:

relayhost = relay.pangaea.edu

Tip

If you're in the same domain as the outgoing mail relay and if your domain's MX record points to the server you want to use, you can use $mydomain as the value of this option. Doing so has the advantage of adjusting automatically should you change your mail relay; Postfix can track the change using the MX record in your DNS server.

If your local DNS server is unreliable or if you use non-DNS methods of local name resolution, you may want to include the disable_dns_lookups = yes option. Ordinarily, Postfix uses DNS in preference to other name resolution methods; disabling this causes Postfix to use whatever name resolution methods are defined locally, such as your /etc/hosts file.

Configuring Postfix to forward mail

Postfix, like sendmail, can serve as a system that forwards incoming mail to its final destination. (Figure 13-2 illustrates this configuration.) The most basic method of configuring such a system is to use what Postfix refers to as a transport map . You point to a file containing this map with the transport_maps option:

transport_maps = hash:/etc/postfix/transport

Such a line may already be present in your default configuration, so check for it before adding it. As with other Postfix references to outside databases, this one uses a text-mode file (/etc/postfix/transport) that's used to create a binary database with a similar name (/etc/postfix/transport.db). The plaintext file has a format that's similar to sendmail's mailertable. For instance, you can have Postfix deliver messages addressed to users in the subnet1.pangaea.edu subdomain to exchange1.pangaea.edu and use Procmail with the /etc/procmailrcs/exchange2 configuration file for addresses in the subnet2.pangaea.edu subdomain with a configuration like the following:

.subnet1.pangaea.edu  smtp:exchange1.pangaea.edu
subnet1.pangaea.edu   smtp:exchange1.pangaea.edu
.subnet2.pangaea.edu  procmail:/etc/procmailrcs/exchange2
subnet2.pangaea.edu   procmail:/etc/procmailrcs/exchange2

This configuration actually includes two lines for each subdomain. The lines with names that begin with dots (.subnet1.pangaea.edu and .subnet2.pangaea.edu) handle mail explicitly addressed to systems within the subdomain. The lines with names that lack leading dots handle mail addressed to the subdomain itself (such as ben@subnet1.pangaea.edu).
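The difference between the dotted and undotted entries can be illustrated with a small Python lookup routine. This is a sketch of the matching idea only, under the assumption that an exact domain match is tried before parent-domain matches; the function name is hypothetical, and the real lookup is performed by Postfix against transport.db.

```python
# The four example entries from the transport file.
TRANSPORT = {
    ".subnet1.pangaea.edu": "smtp:exchange1.pangaea.edu",
    "subnet1.pangaea.edu": "smtp:exchange1.pangaea.edu",
    ".subnet2.pangaea.edu": "procmail:/etc/procmailrcs/exchange2",
    "subnet2.pangaea.edu": "procmail:/etc/procmailrcs/exchange2",
}

def lookup_transport(recipient, table):
    """Find the transport entry that applies to a recipient address."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    # Undotted entry: mail addressed to the subdomain itself.
    if domain in table:
        return table[domain]
    # Dotted entries: mail addressed to a system within the subdomain.
    parts = domain.split(".")
    for i in range(1, len(parts)):
        key = "." + ".".join(parts[i:])
        if key in table:
            return table[key]
    return None  # no entry; normal delivery rules apply
```

Here ben@subnet1.pangaea.edu matches the undotted entry, ben@mail.subnet1.pangaea.edu matches the dotted one, and ben@example.com matches nothing.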

Configuring POP and IMAP Servers

SMTP servers tend to attract a lot of attention; after all, Internet mail delivery runs mostly over SMTP. Still, pull mail protocols (POP and IMAP) are just as important in many situations. Typically, users configure their desktop computers' email clients to contact POP or IMAP servers in order to read their incoming mail. Knowing how to handle these servers' configurations is therefore quite important. In the simplest cases, this requires launching the servers and setting authentication options. More sophisticated servers provide additional options, though.

Launching POP and IMAP Servers

POP and IMAP servers vary in how they're launched. For the popular and simple UW IMAP, the typical method of launching and controlling the server is via a super server. (This method doesn't scale up very well, though, so for a busy server system, you might want to look into launching the server via a SysV startup script, or even running a server that uses this configuration by default.) On distributions that use xinetd as the super server, the UW IMAP package typically ships with one or more files in /etc/xinetd.d . Typically, each file starts the server to handle a single protocol (POP or IMAP, sometimes with variants for different protocol versions or to add encryption). A typical entry looks like this:

service imap
{
  socket_type = stream
  protocol    = tcp
  wait        = no
  user        = root
  server      = /usr/sbin/imapd
  disable     = yes
}

Most distributions disable most or all of the servers by default, by setting the disable = yes option. To enable the server, you must delete this line or change it to read disable = no. You must then restart xinetd, typically by typing /etc/init.d/xinetd restart or something similar. Thereafter, xinetd responds to incoming requests for the protocols you've enabled. If you want the server to respond to multiple protocols, you must be sure to enable them all.

If your distribution uses inetd as a super server, you may need to add one line to /etc/inetd.conf for each protocol you want to use. These lines set the same options that you'd set in the xinetd configuration:

imap stream tcp nowait root /usr/sbin/tcpd imapd

This example uses tcpd (that is, TCP Wrappers) to manage the server. You can therefore use the TCP Wrappers configuration files, /etc/hosts.allow and /etc/hosts.deny, to provide access restrictions based on IP addresses. (You can enable similar restrictions using xinetd's built-in features if you use it as your super server.)

Some POP and IMAP servers, such as Dovecot, are more commonly launched via their own SysV startup scripts. To launch such servers on a one-time basis, you typically pass the start option to their startup scripts:

# /etc/init.d/dovecot start

To configure the server to start automatically when you boot the computer, you must set up your SysV links appropriately. Many distributions provide tools to help with this task, such as chkconfig (used by Fedora, Mandrake, and SuSE, among others) or rc-update (used by Gentoo). Consult distribution-specific documentation for more information on these tools.

Setting Authentication Options

The UW IMAP server provides no command-line arguments or configuration file entries that affect authentication. (In fact, UW IMAP has no main configuration file.) UW IMAP, though, does use the Linux PAM system for authentication. As such, you can edit the IMAP PAM configuration files to change how IMAP authenticates users. Typical UW IMAP installations actually provide two PAM files, one for POP and one for IMAP: /etc/pam.d/pop and /etc/pam.d/imap. Thus, you can use different authentication options for POP than for IMAP. Some POP and IMAP servers call their PAM configuration files something else; for instance, Dovecot uses /etc/pam.d/dovecot, which controls both POP and IMAP access.

If your server uses the local Linux password database for POP and IMAP authentication, the default UW IMAP PAM configuration files should work fine. If you want to use another authentication tool, though, such as an NT domain controller or a Kerberos server, you need to modify the PAM configuration files. This topic is described in detail in Appendix A.

Tip

Using Kerberos, or any other encrypted network authentication tool, via a PAM configuration encrypts the authentication between the POP or IMAP server and the authentication database, but not between the POP or IMAP server and its client. If you want to encrypt the authentication between the POP or IMAP server and its client, you must either tunnel the protocol in some way (say, via SSH) or use a server and client that support an encrypted exchange natively. The encrypted versions of POP and IMAP are commonly referred to as POPS and IMAPS, respectively.

More sophisticated POP and IMAP servers, including Cyrus IMAP and Dovecot, support their own authentication tools instead of or in addition to PAM. These servers often include configuration options to enable the authentication methods you want to use, so consult their documentation for details.

Additional Options on Advanced Servers

UW IMAP is easy to set up and configure, but it's inflexible; you can't change features such as where it looks for IMAP folders except by editing the source code and recompiling it. Using more sophisticated servers is, of course, an option; however, doing so opens up many additional options, some of which can be tricky to configure. As an example, consider Dovecot (http://www.dovecot.org), which is rising rapidly in popularity. This server uses the /etc/dovecot.conf configuration file to hold options, which take the form:

option = value

For the most part, the default options work well; however, you might want to peruse the file or the Dovecot documentation to learn about its configurable features. As with many Linux configuration files, dovecot.conf uses hash marks (#) as comment characters, and the default file is well commented.

Dovecot provides options relating to protocol support (protocols, which takes one or more values such as imap and pop3), SSL options, an option to disable cleartext authentication (disable_plaintext_auth), the default mailbox format to use (default_mail_env), options to enable special authentication methods, and so on.

Scanning for Spam, Worms, and Viruses

Unwanted email is arguably the worst problem facing email administration today. Two types of unwanted email are common: spam and worms/viruses. Spam is unsolicited bulk email, usually commercial in nature. Most spam markets worthless body-enhancement products, questionable financial advice, and so on, but it is more of a nuisance than a threat (at least, if you ignore the substantial network bandwidth that spam consumes). Worms and viruses, on the other hand, are malicious computer code that, if executed on an unprotected computer, can spread and cause damage. Although spam is quite different from worms and viruses in its intent, the two classes of junk email can be combated in similar ways.

Tip

The distinction between worms and viruses is a tricky one to define and depends on who you ask. Thus, I don't try to distinguish the two types of menaces in this chapter, and hereafter I use the word worm to refer to both types of program. Sometimes I refer to "spam-fighting tools" or the like. Such tools can often be used to fight worms, as well, but such phrases omit this detail for brevity's sake.

Dealing with spam and worms requires first knowing a bit about the types of approaches to dealing with the problem. One of the tools that can be used to directly combat spam and worms is Procmail, so I describe it shortly. Procmail can also be used to invoke other spam-fighting tools. SpamAssassin and Bogofilter are two such antispam tools. Finally, as a site policy issue, you may want to place suspicious attachments in a special holding area until you can examine them.

An Antispam and Antivirus Tool Rundown

Spam and viruses are difficult to detect. This is particularly true of spam, because spam identification is somewhat subjective: one person's spam may be another person's desirable commercial communication. The line between worms and non-worms is clearer, but worms can also be difficult to distinguish from legitimate email attachments, particularly in some environments (for instance, if you have a legitimate business reason to send or receive executable files). For this reason, the number of spam-fighting tools available is quite large. Indeed, the number of approaches to fighting spam and worms is large. Here are some general methods:

Blackhole lists
This approach, described in the earlier sections on sendmail and Postfix, relies on central authorities maintaining databases of IP addresses from which messages shouldn't be accepted or should be accepted only with caution. Typically, these databases are updated frequently, based on spam reports from their users. This method is best implemented in receiving SMTP servers because they receive direct connections from the sending systems and therefore aren't easily tricked into believing the message originated from a false IP address. (Headers are easily forged, so the originating IP address can be obfuscated by clever spammers if another system does this check.) Note that this approach doesn't test the message's content; it's based solely on the IP address and so is susceptible to false alarms should an address send both spam and nonspam messages.
Distributed hashes
Some network databases work on more than the originating IP address; they store hashes of entire spam messages. When your server receives a message, it can hash the message (minus its headers) and query a network server for the presence of this hash. If it's present, it means that somebody else has received an identical message and entered it as spam in the hash database. This approach is a potentially powerful one, but it can be easily "poisoned" with respect to legitimate mailing lists; that is, individuals can classify mailing list messages as spam, which can then cause these legitimate messages to be misclassified as spam. You can work around this problem by creating a "white list" (see entry later in this list) of addresses that aren't tested against a distributed hash system.
Simple pattern matches
Examining the message's content is the most reliable way to identify spam. The simplest type of examination relies on simple pattern matches. For instance, you might decide that any message containing the word Viagra is spam, and discard it. This approach can be implemented in either the SMTP server or in add-on software, such as Procmail. It has the disadvantage of great potential for false alarms, particularly if your rules are too broad. For instance, if you discard all messages containing the word Viagra, you may catch a lot of spam, but you'll also discard legitimate email to people who are actually corresponding with others (perhaps their doctors) about this drug. Maintaining a good set of pattern match rules can also be quite time-consuming, although some packages, such as SpamAssassin, aim to minimize this problem by providing frequent updates to a general rule set.
White lists
A white list is a list of addresses or keywords that trigger automatic acceptance of a message. They're frequently used with simple pattern matches or other spam-catching tools in order to minimize the risk of discarding important messages. Typically, you add your regular correspondents to your white list, and their messages get through even if another rule would reject them. They're usually implemented using the same tools that can perform simple pattern match rejections.
Challenge-response tests
A challenge-response system is a variant on white lists. When a message arrives from a source other than one that's on the white list, the recipient automatically sends a challenge to the message source. This challenge is a message asking the sender to perform some action to prove that the message isn't spam, such as to respond with a keyword. Automated spamming systems can't cope with this request, but humans can. Once a response is received, the original message is delivered, and the sender is usually added to the white list. This method of spam fighting can be quite effective, but it generates extra traffic and places an extra burden on those who send mail, because they must respond to challenges. A poor implementation can also result in a continuous loop of challenges to challenges, should two sites both use challenge-response tools that don't exempt challenges to their own challenges.
Statistical tests
A spam-catching tool that emerged on the scene in 2002 involves statistical tests (often called Bayesian tests , after Bayes' Rule, a statistical principle they employ). These tests use a database of words, word pairs, and other message features. Typically, you feed the software a sample of spam and another sample of nonspam, and the software adds up the number of times a word appears in each category. For instance, Viagra might appear 50 times in spam and once in nonspam, whereas Linux might appear 50 times in nonspam and once in spam. If a message with the word Viagra is analyzed, then, a statistical filter will give it a high probability of being spam. The analysis is typically based on many words, though, so a single word isn't likely to "poison" an analysis, as can happen with simple pattern matches. One statistical spam filter, Bogofilter, is described in more detail later. Some tools, such as SpamAssassin, employ statistical tests as part of their overall operation.

These same tools can detect worms, although some worm-detection tools rely on an analysis of the binary file that's attached to the message rather than English words in the message body. (Some worms can also be reliably identified by their message texts.)

Some tools are hard to classify in just one way. For instance, Procmail directly implements pattern-matching tests but can call other tools that use other methods. The upcoming sections describe Procmail, SpamAssassin, and Bogofilter in more detail.
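To give a feel for the statistical approach described in the list above, here is a toy Python sketch that scores a message from per-word counts. It is not Bogofilter's or SpamAssassin's actual algorithm; the function names, the add-one smoothing, and the way word probabilities are combined are all simplifying assumptions made for illustration.

```python
from collections import Counter

def word_spam_probability(word, spam_counts, ham_counts):
    """Estimate how strongly one word suggests spam (0.5 is neutral)."""
    # Add-one smoothing keeps unseen words from producing 0 or 1.
    spam_seen = spam_counts[word] + 1
    ham_seen = ham_counts[word] + 1
    return spam_seen / (spam_seen + ham_seen)

def message_score(text, spam_counts, ham_counts):
    """Combine the per-word estimates into one spam probability."""
    p_spam = 1.0
    p_ham = 1.0
    for word in set(text.lower().split()):
        p = word_spam_probability(word, spam_counts, ham_counts)
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)

# Counts matching the example in the text: "viagra" appears 50 times in
# spam and once in nonspam; "linux" is the reverse.
SPAM_COUNTS = Counter({"viagra": 50, "linux": 1})
HAM_COUNTS = Counter({"viagra": 1, "linux": 50})
```

With these counts, a message containing viagra but not linux scores well above 0.9, whereas a message containing both words scores 0.5, because the two pieces of evidence cancel. This is the sense in which a single word is unlikely to decide the outcome by itself.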

Sendmail Antispam Options

One way to deal with spam and worms is to use SMTP server features. One of these features in sendmail has already been described: the access.db file, in conjunction with the FEATURE(`access_db') option in your sendmail .mc file. You can block mail from sites known to send nothing but spam using this technique. Unfortunately, the world of spam is a fast-changing one, so by the time you add a hostname or address to this list, chances are the spammer will have started using another. The sheer quantity of spam also makes this approach an awkward one. Nonetheless, you can use this method for some particularly persistent offenders.

Another spam-fighting approach is to use a blackhole list, which is a frequently updated list of sites that are known or suspected spam sources or that shouldn't be sending email directly. Blackhole lists work as services, much like DNS: your mail server queries the blackhole list with the IP address of a connecting server that's trying to initiate a connection, and the blackhole list server returns a value that indicates the sender's status. To use a blackhole list, you enter a line like the following in your sendmail .mc file:

FEATURE(`dnsbl', `relays.ordb.org', `"550 Email rejected due to sending server misconfiguration - see http://www.ordb.org/faq/\#why_rejected"')

This line tells sendmail to use the blackhole list at relays.ordb.org and to include a message with a URL in bounced emails. (This enables senders to check the messages, should nonspam messages be bounced.) Of course, this raises a question: how do you know which blackhole list to use? Many are available. You may want to peruse http://www.declude.com/Articles.asp?ID=97 or http://www.moensted.dk/spam/ for pointers to more than 100 blackhole databases with varying criteria for inclusion and other features. Some are free; others require you to pay for the privilege of using them. If you like, you can include multiple blackhole list definitions, each on its own line.
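Because blackhole lists are queried through DNS, the lookup boils down to building a special hostname from the connecting server's IP address. The Python sketch below shows the usual convention for IPv4 addresses, in which the octets are reversed and the list's domain is appended; the function name is hypothetical, and the sketch only builds the name rather than performing the DNS query.

```python
import ipaddress

def dnsbl_query_name(client_ip, zone):
    """Build the hostname a mail server looks up to test an IPv4 address."""
    # Validate the address, then reverse its four octets.
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

For a connection from 192.168.24.7 and the relays.ordb.org list, the resulting name is 7.24.168.192.relays.ordb.org. If that name resolves, the address is listed; if the lookup fails, it is not.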

More sophisticated spam-fighting techniques require additional software. In particular, you can add Procmail to the mix to filter on keywords or to call other programs to check your incoming email in various ways. This topic is covered in a later section. If the sendmail server is an intermediary system, you may want to call Procmail as part of the forwarding configuration, as described earlier, in Section 13.2.3.3.

Postfix Antispam Options

Postfix provides a number of antispam options, some of which are quite sophisticated. In addition, you can use Procmail as a delivery agent to call external programs or perform checks Postfix alone can't handle.

One of the simpler Postfix antispam configurations is to use a blackhole list. One main.cf option enables this feature:

smtpd_client_restrictions = reject_rbl_client relays.ordb.org

The smtpd_client_restrictions option tells Postfix when to reject mail. The reject_rbl_client value corresponds to a positive lookup in the blackhole list database specified after this value (relays.ordb.org in this example). Postfix can use the same blackhole lists as sendmail; consult http://www.declude.com/Articles.asp?ID=97 or http://www.moensted.dk/spam/ for pointers to more than 100 blackhole databases. Other values can be added to this line, separated by commas, to reject mail from systems that don't have matching DNS A records for their PTR records (reject_unknown_client), to check an external database for rejection rules (check_client_access type:table), and so on. Consult the Postfix documentation for details.

Tip

Prior to Version 2.0, Postfix used a pair of options to achieve the effect described here. Specifically, maps_rbl_domains contained a comma-separated list of blackhole list servers; these were used only if the reject_maps_rbl option was passed to smtpd_client_restrictions.

Spam and worms can often be identified by the presence of strings in message headers or bodies. For instance, you might know from experience that any message with a subject of earn $$$ is spam and can be discarded. Postfix includes several options that check message headers and bodies for such content:

header_checks
This option points to a file that contains checks that are applied to message headers—the parts of a message that contain the subject, the return address, etc. Typically, you'll check headers for suspicious email subjects, senders, and perhaps recipients.
mime_header_checks
Increasingly, email messages use Multipurpose Internet Mail Extensions (MIME) to encode special formatting and nontextual data. MIME is also loved by spammers and worm authors because it lets them deliver text that's harder to identify as spam or malicious computer code. You can use this option to point to a file that matches suspicious MIME headers. This option is available in Postfix 2.0 and later, and defaults to $header_checks.
nested_header_checks
Users and programs sometimes attach one email message to another. To search such attached messages' headers, you can use this option, which is available only in Postfix 2.0 and later, and defaults to $header_checks.
body_checks
This option searches email messages' bodies—the parts of the message that users read, as opposed to the headers. Scanning message bodies can be a good way to identify worms and spam. This option is available only in Postfix 2.0 and later.

All of these options take an external filename, preceded by a code for the file's format, as their value. This file is typically a plain-text file or a database file that's derived from a plain-text file. The resulting entry in main.cf looks something like this:

header_checks = pcre:/etc/postfix/header_checks

The pcre code stands for Perl-compatible regular expression. Alternatively, you can employ regexp to use non-Perl regular expressions. In either case, each line in the original text file consists of a regular expression enclosed in slashes, followed by one of the following action codes:

DISCARD optional text
Reports the message as accepted for delivery but quietly discards it. If optional text is present, it's entered in the mail logs; otherwise, a generic message is logged.
DUNNO
Moves on to the next input line. This option is synonymous with OK.
FILTER transport:destination
Passes the message through the external content filter, as specified by the transport method (smtp, procmail, and so on) and destination (a hostname or filename, typically). The filter receives the message only after Postfix has examined all the message's lines, so the message can be rejected before the filter is called.
HOLD
Places the message in the hold queue, which is a sort of limbo in which the message is neither delivered nor discarded. A system administrator can examine the hold queue using the postcat command and release messages from the queue or destroy them using postsuper.
IGNORE
Ignores the current line of input and moves to the next one.
PREPEND text
Inserts the specified text as a new line ahead of the input line. This can flag messages for further spam processing.
REDIRECT user@domain
Sends the message to the specified user rather than the recipient specified by the mail's envelope. This feature can be used to forward mail for users who have moved elsewhere, as an alternative method of forwarding mail to internal servers, and in other ways. However, many potential uses of this action are better achieved through other means.
REJECT optional text
Rejects delivery of the message. If you specify optional text, it's passed to the sender; if not, a generic error message is delivered to the sender.
WARN optional text
Logs a warning with the specified optional text in the mail log file. This action is intended primarily for testing new rules before implementing them.

Many of these action codes are available only in Postfix 2.0, 2.1, or later. As an example of their use, consider the following entries:

### Subject headers indicative of spam
/^Subject: ADV:/ REJECT
/^Subject: Accept Credit Cards/ DISCARD
### Additional header checks
/^(From|Received):.*iamspam\.biz/ REJECT
/^From: spammer@abigisp\.net/ FILTER procmail:/etc/procmailrcs/maybespam

This set of rules rejects mail with a Subject header that begins with ADV: or with From or Received headers that include the string iamspam.biz. It also discards mail with a Subject header that begins with Accept Credit Cards and passes mail from spammer@abigisp.net through a Procmail filter, /etc/procmailrcs/maybespam. This filter presumably performs additional checks that are too complex for Postfix to handle by itself.
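
To make the matching behavior concrete, the following Python sketch imitates how the four sample rules would be compared against individual header lines. It is only an illustration, not Postfix code: the RULES list and the check_header function are hypothetical names, and the sketch assumes Postfix's default case-insensitive matching for regexp and pcre tables. It reports which action a header would trigger and does not model what Postfix does with the message afterward.

```python
import re

# Hypothetical in-memory copy of the sample header_checks entries:
# (pattern, action) pairs, tried in the order they appear in the file.
RULES = [
    (r"^Subject: ADV:", "REJECT"),
    (r"^Subject: Accept Credit Cards", "DISCARD"),
    (r"^(From|Received):.*iamspam\.biz", "REJECT"),
    (r"^From: spammer@abigisp\.net",
     "FILTER procmail:/etc/procmailrcs/maybespam"),
]


def check_header(line):
    """Return the action of the first rule matching this header, or None."""
    for pattern, action in RULES:
        # Assumption: case-insensitive matching, as in Postfix's default
        # behavior for regexp and pcre lookup tables.
        if re.search(pattern, line, re.IGNORECASE):
            return action
    return None


if __name__ == "__main__":
    for header in ("Subject: ADV: cheap watches",
                   "Received: from mx1.iamspam.biz by mail.example.com",
                   "Subject: Lunch on Friday?"):
        print(header, "->", check_header(header))
```

Trying candidate patterns in a small harness like this before adding them to the real file is one way to catch a rule that matches more mail than intended.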

In addition to its own checks, Postfix can send mail through Procmail for processing. In fact, using Procmail is usually the default. If in doubt, check your main.cf file for a line like the following:

mailbox_command = /usr/bin/procmail

When called in this way, Procmail is used for final message delivery. You can call it in other ways, such as in a FILTER action in a header check. Broadly speaking, Procmail is a more powerful way of looking for suspicious patterns in email than Postfix's own rules. Procmail can also be customized on a user-by-user basis, which is harder with Postfix's rules. Thus, you may prefer to use Procmail alone, rather than use Postfix's pattern matching tools. The main advantage of Postfix's rules is that they can be used to reject messages before they're fully received. In particular, if a header check causes a message to be rejected, Postfix refuses delivery before many bytes are transferred. This feature can help conserve bandwidth, at least if you can devise rules that correctly identify large spams or worms from their headers alone. Procmail delivery rules, by contrast, operate only after the mail server has accepted the mail for delivery. Unfortunately, spammers and worm writers have become very good at disguising their unwanted emails' headers, so you may have no choice but to accept the entire email in order to properly identify it. The topic of spam and worm control is covered in more detail later in this chapter.

Using Procmail

Procmail is a very powerful mail processing tool. It does far more than spam filtering; it can redirect mail based on nonspam criteria, sort mail into folders, copy messages for archival purposes, pass mail through arbitrary external programs, and more. Still, one of Procmail's main applications is as a spam-fighting tool; you can use its native pattern-matching features to discard mail or shunt it into a suspected spam folder. You can also pass messages to external programs for tests that Procmail can't handle by itself.

Using Procmail requires calling it in some way. Typically, you do so by configuring your SMTP server to call Procmail as part of its mail delivery process. You can then move on to Procmail configuration. To configure Procmail you need to understand the Procmail configuration file format and be able to create Procmail recipes, which are the rules used to direct mail in Procmail.

Calling Procmail

The first step in Procmail use is to ensure that your mail system uses it. Most Linux SMTP server configurations use Procmail by default, so you may not need to change anything about your basic SMTP configuration to use Procmail. If you're in doubt, though, or if you want to fine-tune the configuration, you can check some settings:

Sendmail
You should set three options in the sendmail .mc file to use Procmail. The first of these is:
define(`PROCMAIL_MAILER_PATH', `/usr/bin/procmail')

This tells sendmail where to find the Procmail binary. (Some configurations put this option in another configuration file, but you can override it in your sendmail .mc file if you need to do so.) The remaining options are FEATURE(`local_procmail') and MAILER(procmail), which collectively tell sendmail to use Procmail for local deliveries. As described in the earlier Section 13.3.3.3, you can also call Procmail in other ways, such as in a forwarding configuration.

Postfix
To call Procmail as part of the Postfix delivery rules, you must tell Postfix to use the Procmail binary as part of its delivery system: mailbox_command = /usr/bin/procmail. As described in an earlier section, you can also tell Postfix to use Procmail in mail forwarding configurations.

The Procmail configuration file

Procmail can use one or more of several configuration files:

/etc/procmailrc
This file is the global Procmail configuration file. It's called as root to process all the mail that the SMTP server handles. For spam-control purposes, you use this file to apply rules you want to use on all the email that's delivered to your local users. Typically, this means you use it to apply rules that are very unlikely to result in false alarms.
~/.procmailrc
Individual users can create .procmailrc files in their home directories. These files have the same format as /etc/procmailrc, but they're applied only to email directed to specific users. This enables users to apply their own customized Procmail rules. Alternatively, you can provide some standard configuration files in specific locations and allow users to create symbolic links to those files to achieve preset effects.
Other configuration files
Some methods of calling Procmail, such as those that use Procmail as part of mail forwarding schemes, enable you to pass the name of a configuration file to Procmail. Sometimes these reside in a directory such as /etc/procmailrcs, but that location is arbitrary.

Warning

Procmail runs as the user who calls it, although when it's called as root, it can drop its privileges under some circumstances. A rule that works well in ~/.procmailrc (when Procmail is called as the end user) may not work well when placed in /etc/procmailrc (when Procmail is called as root), or vice versa. Typically, you must be more careful about file permissions when calling Procmail as root, because writing to or creating a file (such as a mail folder) as root can make that file inaccessible to ordinary users, such as the mail's intended recipient.

Whatever its name, a Procmail configuration file consists of three parts: comments (denoted by hash marks), environment variable assignments (similar to those in bash, such as MAILDIR = $HOME/Mail), and recipes (described next). The bulk of most Procmail configuration files consists of its recipes.

Creating Procmail recipes

Procmail recipes consist of three parts: the identification line, the conditions, and the action. The idea is that the action is initiated when the conditions are met. For instance, a condition might be that the string Viagra appear in the message body, and the action might be that the message is sent to /dev/null—that is, that the message be discarded. The form of the recipe is as follows:

:0 [flags] [:[lockfile]]
[conditions]
action

The identification line always begins with :0; that's just the convention. The flags are described shortly; they specify where Procmail looks for condition matches, how it matches, and so on. The lockfile is a file that controls access to a mail file. If a file is locked, Procmail defers operating on it. Normally, a single colon (:) is sufficient, but you can specify the filename, if necessary. The conditions are technically optional, but in practice, most recipes have at least one condition line. (A recipe with no condition lines matches all mail messages.) Including multiple conditions causes Procmail to require all of them to match before the action is carried out. Precisely one action is required for each recipe.

Procmail's default behavior is to match conditions against message headers in a case-insensitive way. Several flags are available to change how Procmail handles these matches, though. Here are the more common:

H
This flag matches against message headers, which is the default.
B
This flag matches against message bodies.
D
This flag performs a case-sensitive pattern match, as opposed to the normal case-insensitive match.
c
Ordinarily, if a recipe matches, the message is passed to the action, which may discard it, alter it, or otherwise make the original inaccessible. This flag causes the action to act on a "carbon copy" of the original message, which is useful if you want to, for example, send a duplicate copy of a message to another account or mail folder.
w
This value causes Procmail to wait for the action to complete. If the action fails, Procmail leaves the message in the queue for other recipes.
W
This option is similar to w, but it suppresses error messages.
f
This option pipes a message through another program, treating that program as a filter.

The Procmail recipe conditions can look like Greek to the uninitiated. Each begins with an asterisk (*), followed by a regular expression. At its simplest, a regular expression is simply a string that must match exactly. For instance, the regular expression Viagra matches the word Viagra in the input. Many characters have special meanings, though, such as:

^
A caret symbol indicates the start of a line; for instance, ^Viagra denotes the string Viagra, but only at the start of a line. Many conditions begin with a caret.
$
This character signifies the end of a line.
.
A period matches any single character except for an end-of-line character. For instance, h.t matches hat, hut, hot, or any other similar string.
x*
This string (where x is any single character) matches any number of x characters, including none. This is often combined with a dot (.), as in .*, to match any arbitrary group of characters.
x+
This expression works much like x*, but matches any occurrence of one or more x characters, rather than 0 or more.
x?
This string matches zero or one x characters.
(string1|string2)
This expression matches one of two strings by separating them by a vertical bar within parentheses. This principle can be extended to more than two strings, as well.
(string)*
This expression matches zero or more instances of the specified string.
[chars]
Placing characters within square brackets causes Procmail to match any one of the enclosed characters. For instance, [abcz] matches any one of the characters a, b, c, or z. You can specify a range of characters by using a dash, as in [c-j] to indicate any letter between c and j.
\
The backslash character removes the special meaning from the subsequent character. For instance, to match a dot, you enter \. in the conditions.
!
This character appears only at the start of a conditions line and reverses its meaning; that is, if the regular expression matches, the recipe does not match.
?
Like !, this character appears only at the start of a conditions line. It tells Procmail to run the specified program and to treat the condition as matched if that program exits with a code of 0 (success).

Regular expressions can be extremely complex, so you may need to consult the Procmail manpage or another source of information on regular expressions to learn more. The next section provides some examples.
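
As a rough way to experiment with the special characters listed above, the following Python sketch checks a handful of patterns against sample strings. Python's re module is not Procmail's regular expression engine, so treat this only as an approximation for these basic constructs; the EXAMPLES table and the matches function are hypothetical names chosen for the illustration.

```python
import re

# Each entry: (pattern, text that should match, text that should not).
EXAMPLES = [
    (r"^Viagra", "Viagra for sale", "Buy Viagra"),      # ^ anchors to line start
    (r"h.t", "hot", "ht"),                              # . is any one character
    (r"ab*c", "ac", "adc"),                             # b* is zero or more b
    (r"ab+c", "abbc", "ac"),                            # b+ is one or more b
    (r"colou?r", "color", "colouur"),                   # u? is zero or one u
    (r"(cat|dog)", "hotdog", "bird"),                   # alternation
    (r"[c-j]at", "hat", "bat"),                         # character range
    (r"example\.com", "example.com", "exampleXcom"),    # \. is a literal dot
]


def matches(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, good, bad in EXAMPLES:
    print(pattern, "|", good, "->", matches(pattern, good),
          "|", bad, "->", matches(pattern, bad))
```

The leading ! and ? characters are not part of the regular expression itself, so they are left out here; they change how Procmail interprets the whole condition line.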

Finally, each Procmail recipe ends with an action. Each action can take any of several forms:

A filename
An action that takes the form of a filename indicates that the message is to be stored in the specified file, which is treated as an mbox mail folder.
A subdirectory name
A filename that ends in a slash (/) is interpreted as a subdirectory name, in which case Procmail stores the message in this subdirectory in maildir format.
!
An exclamation mark denotes a list of email addresses to which the message should be forwarded. This can be useful for setting up individual mail forwarding to another system.
|
Procmail treats a vertical bar as a pipe character, much like bash. Its presence at the start of an action tells Procmail to pass the message to an external program for further processing.
{
You can nest multiple tests by using a left curly brace as the action line; subsequent lines, until a right curly brace (}), constitute one or more additional recipes that are used only if the initial recipe matches. You can use this feature to control whether or not to perform certain tests; for instance, to perform spam checks only if mail doesn't come from certain addresses (that is, to implement a white list).

Because Procmail supports just one action per recipe, you may need to create an external script if you want to perform some complex action. Be sure your external script reads the entire message. If it doesn't, Procmail may send the message through additional recipes, which can result in duplicate deliveries.

Examples of Procmail recipes

Example 13-1 shows a sample Procmail recipe file intended for use by individuals. (When used by the system, some file ownership issues can arise. This problem can be avoided by adding a DROPPRIVS = yes line to the start of the file.) This example illustrates several useful techniques:

Nesting
The first rule contains two nested subrules, the intent being to exclude mail from two regular correspondents from spam checks, which are nested. The nested rules are indented to set them off, but this indentation isn't required.
Spam checks
The two spam-check rules look for strings that are indicative of spam. The first searches message bodies for the string 301 followed by 0 or more characters, followed by S, 0 or more characters, and 1618. This string is found in some spams that reference a failed piece of U.S. legislation, S.1618, which dealt with spam. The legislation failed years ago, but spam still references it, as if to legitimize itself. The second spam check looks for a string in subject headers that identifies messages encoded using a system that's common for certain Asian languages. Most non-Asian users seldom or never receive nonspam mail with such subject headers, but a lot of spam uses them.
Flags
Several rules use flags to search the text of messages or to create carbon copies.
Mail sorting
The spam messages are "sorted" to /dev/null, which effectively discards the messages. The last rule saves mail from a mailing list (identified by a unique "to" header) into the genetics-list mbox mail folder in the $MAILDIR subdirectory, which is identified on the first line of the recipe file.

Example 13-1. Sample Procmail recipe file

MAILDIR = $HOME/Mail

# Do some spam checks, but exclude anything from good addresses
:0
*! ^From:.*(goodguy@pangaea\.edu|linnaeus@example\.com)
{
  :0 B
  * ^.*301.*S.*1618
  /dev/null

  :0
  * ^Subject:.*=\?big5\?*
  /dev/null
}

# Forward mail from goodguy@pangaea.edu with "peas" in the
# subject line to mendel@luna.edu
:0 c
* ^From:.*goodguy@pangaea\.edu
* ^Subject:.*peas
! mendel@luna.edu

# Shunt mail from a genetics mailing list into its own folder
:0:
* ^To:.*genetics@mailer\.example\.org
$MAILDIR/genetics-list
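
The control flow of Example 13-1 can be hard to follow at first, so here is a Python sketch that imitates its decisions. It is a simplified model rather than a Procmail replacement: the route function, its return values, and the "(default mailbox)" placeholder are hypothetical, and the sketch assumes that a message falling through every recipe ends up in the user's normal mailbox.

```python
import re

# Procmail matches case-insensitively by default; ^ anchors to line starts.
FLAGS = re.IGNORECASE | re.MULTILINE


def route(headers, body):
    """Return the list of destinations the sample recipes would choose."""
    destinations = []

    def header_has(pattern):
        return re.search(pattern, headers, FLAGS) is not None

    # Recipe 1: run the nested spam checks unless the sender is trusted.
    trusted = r"^From:.*(goodguy@pangaea\.edu|linnaeus@example\.com)"
    if not header_has(trusted):
        # The B flag makes this nested recipe search the body.
        if re.search(r"^.*301.*S.*1618", body, FLAGS):
            return ["/dev/null"]
        if header_has(r"^Subject:.*=\?big5\?*"):
            return ["/dev/null"]

    # Recipe 2: the c flag forwards a copy and keeps processing the original.
    if (header_has(r"^From:.*goodguy@pangaea\.edu")
            and header_has(r"^Subject:.*peas")):
        destinations.append("! mendel@luna.edu")

    # Recipe 3: mailing-list traffic is delivered to its own folder.
    if header_has(r"^To:.*genetics@mailer\.example\.org"):
        destinations.append("$MAILDIR/genetics-list")
        return destinations

    # Assumption: nothing delivered the message, so it goes to the default.
    destinations.append("(default mailbox)")
    return destinations


if __name__ == "__main__":
    print(route("From: goodguy@pangaea.edu\nSubject: peas and pods", "Hi"))
```

Note how a match on /dev/null ends processing at once, whereas the carbon-copy recipe adds a destination and lets the message continue to later recipes.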

One of the major problems with using Procmail alone as a spam-control tool is that creating and maintaining a set of Procmail rules can be quite labor-intensive. This is particularly true because spam and worms are constantly changing, so a good set of rules for today may be inadequate tomorrow. You may want to search for a ready-made set of Procmail recipes, such as SpamBouncer (http://www.spambouncer.org) or the Sample Procmail Recipes with Comments (http://handsonhowto.com/pmail102.html). The first of these is specifically intended as an antispam tool, whereas the second is a practical teaching tool. If you periodically check back with such pages and update your filters, you can keep a reasonably up-to-date Procmail antispam configuration. On the other hand, rules created by somebody else are more likely to miss spam or, worse, falsely identify nonspam as spam.

Tip

Before deploying a new Procmail recipe, or especially an extensive set of recipe changes, try testing it on a small scale. You can create a test account, place your new recipe in its .procmailrc file, and send test messages—both spam and nonspam—to that account.

Using SpamAssassin

SpamAssassin (http://spamassassin.apache.org) is an antispam tool based on a large number of tests. Each test changes the score of the message. SpamAssassin doesn't actually delete messages; instead, it adds headers identifying likely spam as such. The idea is that you'll call SpamAssassin from Procmail, a mail server, or a mail reader and then use that program to detect the SpamAssassin spam report and delete or redirect messages based on that report.

Tip

SpamAssassin has grown into quite a large tool. In fact, it's complex enough that it's spawned its very own book: SpamAssassin (O'Reilly). If you need to perform complex tasks or configure SpamAssassin as part of a mail server for a large site, it's worthwhile to read this or other SpamAssassin-specific documentation.

SpamAssassin basics

The SpamAssassin software comes with most major distributions, so installing it from your distribution medium is usually the simplest course of action. If you can't find SpamAssassin with your distribution, go to the main SpamAssassin site, and download it. SpamAssassin is actually a Perl script and relies on several Perl modules, so you may need to install additional packages that hold these modules.

Once SpamAssassin is installed, you should test its operation by manually feeding it a few spam and nonspam messages. You do this by redirecting a message in a file into the spamassassin command. Adding the -t option adds an extra report to the end of the output, which appears on the screen:

$ spamassassin -t < message.txt

The message.txt file should contain a complete message, including full headers. Most mail readers have an option to save messages to disk with full headers, so use that option to get your samples. The SpamAssassin output includes two additions to the message. The first addition appears at the end of the message headers, and constitutes SpamAssassin's report, as intended for subsequent mail processing tools, such as Procmail or an email reader. For a nonspam message, this addition is likely to resemble the following:

X-Spam-Checker-Version: SpamAssassin 3.0.0-g3.0.0 (2004-09-13) on 
mail.example.com
X-Spam-Level:
X-Spam-Status: No, score=0.1 required=5.0 tests=RCVD_IN_SORBS 
autolearn=unavailable version=3.0.0-g3.0.0

The first line simply identifies the version of SpamAssassin and the computer on which it's running. The second line holds the spam level, which is expressed as a number of asterisks (*). Because this is an innocuous nonspam message, no asterisks are displayed; however, some nonspam messages will have a small number of asterisks (five is the typical cutoff point for spam, although you can use something else if you like). The third line, which typically extends across multiple lines, summarizes the tests that raised alarms. In this case, the total spam score is 0.1 (score=0.1). That 0.1 value came from the RCVD_IN_SORBS test, which isn't explained at this point. The -t option to spamassassin, though, adds extra lines at the end of the message:

Content analysis details:   (0.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 RCVD_IN_SORBS          RBL: SORBS: sender is listed in SORBS
                            [172.24.98.102 listed in dnsbl.sorbs.net]

This text identifies the RCVD_IN_SORBS flag as meaning that the sender address is listed in the SORBS blackhole list. This information can help you understand what SpamAssassin is doing right (or wrong), but it's not provided in normal operation. You can, of course, consult the SpamAssassin documentation to learn more about specific tests.

When you test a spam message, the spam headers added to the message are likely to report more serious problems:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.0-g3.0.0 (2004-09-13) on 
mail.example.com
X-Spam-Level: ******
X-Spam-Status: Yes, score=6.8 required=5.0 tests=FORGED_MUA_OUTLOOK,
        FORGED_OUTLOOK_TAGS,HTML_40_50,HTML_FONTCOLOR_UNSAFE,HTML_MESSAGE,
        HTML_TAG_EXISTS_TBODY,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,
        RCVD_IN_SORBS autolearn=spam version=3.0.0-g3.0.0

This output includes one header line that's not present in the nonspam output: X-Spam-Flag: YES. You can search for this line using a Procmail recipe, as described shortly, to detect spam after messages have been processed with SpamAssassin. The X-Spam-Level header shows six stars, corresponding to the 6.8 hit rating reported in the X-Spam-Status line. This line also shows quite a few hits on individual spam tests. These are reported in greater detail at the end of the message if you use the -t option to spamassassin.
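
If you want to examine these headers from a script rather than by eye, the following Python sketch pulls the verdict, score, threshold, and test names out of an X-Spam-Status value. The parse_spam_status function is a hypothetical helper, not part of SpamAssassin, and it assumes the layout shown in the sample headers. It accepts either a score= or a hits= label for the total, because the label differs between SpamAssassin releases.

```python
import re


def parse_spam_status(value):
    """Parse the value of an X-Spam-Status header into a dictionary."""
    # Long headers are folded across lines; flatten the whitespace first.
    flat = " ".join(value.split())
    verdict = flat.split(",", 1)[0].strip()   # "Yes" or "No"
    score = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", flat)
    required = re.search(r"required=(-?\d+(?:\.\d+)?)", flat)
    tests = re.search(r"tests=(.*?)(?: autolearn=| version=|$)", flat)
    names = tests.group(1).replace(" ", "").split(",") if tests else []
    return {
        "is_spam": verdict.lower() == "yes",
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        "tests": [name for name in names if name and name != "none"],
    }


if __name__ == "__main__":
    sample = ("No, score=0.1 required=5.0 tests=RCVD_IN_SORBS\n"
              " autolearn=unavailable version=3.0.0-g3.0.0")
    print(parse_spam_status(sample))
```

A helper like this can be handy when you run a batch of saved messages through spamassassin and want to tabulate which tests fire most often.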

You should run several spam and several nonspam messages through SpamAssassin. You should verify that none of the nonspam messages are flagged as spam and that a significant number of spams are flagged. SpamAssassin might not detect all of your spams, though. You can take the time to fine-tune its operation by changing the points assigned to individual rules or by enabling its auto-learning feature, which enables it to update its rules on the fly. You can also combine SpamAssassin with other tools, such as your own custom Procmail filters.

Calling SpamAssassin from Procmail

You can call SpamAssassin in various ways. One is to use Procmail for local mail delivery. (Calling SpamAssassin as part of a mail gateway system is described next.) Add the following recipes to the start of your Procmail configuration file to call SpamAssassin and sort suspected spam into two folders, almost-certainly-spam and probably-spam:

:0fw
* < 256000
| spamassassin

:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
almost-certainly-spam

:0:
* ^X-Spam-Status: Yes
probably-spam

:0
* ^^rom[ ]
{
  LOG="*** Dropped F off From_ header! Fixing up. "

  :0 fhw
  | sed -e '1s/^/F/'
}

These rules are taken from the procmailrc.example file that ships with SpamAssassin. That file also includes several comments that describe its rules. In short, the first recipe passes messages that are smaller than 256,000 bytes through SpamAssassin, which adds its headers to the messages. (Larger messages are almost certainly not spam, although they can contain worms. SpamAssassin doesn't cope well with very large messages, hence this size limitation.) The second recipe dumps messages with a spam score of 15 or higher into the almost-certainly-spam folder, while the third recipe places messages that are flagged as spam but that weren't caught by the second recipe into the probably-spam folder. The final recipe fixes a Procmail bug that can cause the leading F in the From line at the start of the message to be dropped. (This bug has been fixed, but the recipe is included in case you're running an old version of Procmail.)
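
The sorting decisions made by the first three recipes can be summarized in a short Python sketch. This is a model of the logic only, not something Procmail runs: should_scan and choose_folder are hypothetical names, and the headers are assumed to arrive as a simple dictionary after SpamAssassin has added its own.

```python
import re

SIZE_LIMIT = 256000   # bytes; the first recipe skips larger messages
CERTAIN_STARS = 15    # asterisks required for the almost-certainly-spam folder


def should_scan(size_bytes):
    """First recipe's condition: scan only messages under the size limit."""
    return size_bytes < SIZE_LIMIT


def choose_folder(headers):
    """Pick a folder from SpamAssassin's headers (a dict of name to value)."""
    level = headers.get("X-Spam-Level", "")
    status = headers.get("X-Spam-Status", "")
    # Second recipe: at least 15 asterisks at the start of X-Spam-Level.
    if re.match(r"\*{%d}" % CERTAIN_STARS, level):
        return "almost-certainly-spam"
    # Third recipe: anything else that SpamAssassin flagged as spam.
    if status.startswith("Yes"):
        return "probably-spam"
    return None  # not flagged; normal delivery continues


if __name__ == "__main__":
    print(choose_folder({"X-Spam-Level": "******",
                         "X-Spam-Status": "Yes, score=6.8 required=5.0"}))
```

The order of the two checks matters: a message with 15 or more stars also has a status of Yes, so testing the star count first is what keeps the two folders distinct.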

Of course, you can change these rules if you like. For instance, you can send suspected spam to /dev/null, but doing so means that if any such messages really aren't spam, you won't be able to retrieve them. Placing suspected spam in folders means that you can open those folders and recover any misclassified messages.

Calling SpamAssassin from sendmail

Calling SpamAssassin from Procmail is fine for local mail delivery, but it doesn't work well for a mail server that should operate as a spam filter for another server, such as a Microsoft Exchange server. For this configuration, you need a way to call SpamAssassin more directly as part of the mail relay process; the MIMEDefang tool (http://www.mimedefang.org) can do so. Although a complete description of MIMEDefang and the sendmail features it uses is beyond the scope of this book, a brief description should get you started.

The key to the process is to use the sendmail INPUT_MAIL_FILTER configuration line to call MIMEDefang, which in turn is configured to pass incoming messages through SpamAssassin and take actions accordingly. A full sendmail .mc file that implements these features appears in Example 13-2.

Example 13-2. Sample sendmail configuration with SpamAssassin

divert(-1)
#
# Spam-checking gateway configuration
#
divert(0)dnl
VERSIONID(`Spam-checking gateway')
OSTYPE(linux)dnl
DOMAIN(generic)dnl
FEATURE(virtusertable)dnl
FEATURE(mailertable)dnl
FEATURE(access_db)dnl
FEATURE(always_add_domain)dnl
FEATURE(nouucp,`reject')dnl
FEATURE(`relay_based_on_MX')dnl
define(`confDEF_USER_ID',``8:12'')dnl
define(`confPRIVACY_FLAGS', \
  `goaway,noreceipts,restrictmailq,restrictqrun,noetrn')dnl
define(`confTO_QUEUERETURN',`7d')dnl
define(`confTO_QUEUEWARN_NORMAL',`1h')dnl
define(`confMAX_DAEMON_CHILDREN',`60')dnl
define(`confMAX_MESSAGE_SIZE',`10000000')dnl
define(`confMAX_CONNECTION_RATE_THROTTLE',`10')dnl
define(`confMAX_RCPTS_PER_MESSAGE',`500')dnl
INPUT_MAIL_FILTER(`mimedefang',`S=unix:/var/spool/MIMEDefang/mimedefang.sock, \
   F=T, T=S:60s;R:60s;E:5m')dnl
MAILER(smtp)dnl
MAILER(local)dnl
MAILER(procmail)dnl

Tip

A couple of lines in Example 13-2 are very long; they're denoted by trailing backslashes (\) at the end of the first line, but should be entered on single lines without the backslashes.

This configuration also requires you to set up the sendmail mailer table file (typically /etc/mail/mailertable) that was described earlier. It must include a line that points the system to an internal server that will receive the spam-filtered messages:

pangaea.edu   esmtp:internal.pangaea.edu

In addition to the sendmail configuration, you must configure MIMEDefang. This tool requires three directories, /var/spool/MIMEDefang, /var/spool/MD-Quarantine, and /var/spool/MD-Bayes. Assign ownership of these directories to the account used to run MIMEDefang (typically defang). Once this is done, edit mimedefang-filter (usually stored in /usr/local/etc/mimedefang). Set the $AdminAddress, $AdminName, and $DaemonAddress lines to point to your local postmaster's email address, the postmaster's name (often your domain's name and Postmaster), and the email address used in messages MIMEDefang generates. You should also set the $SALocalTestsOnly item to 1 or 0 to forbid or allow SpamAssassin to use network-based tests.

Configure the internal server computer (internal.pangaea.edu in this example) to accept mail only from the spam-filtering gateway or from this system and any local systems that should be able to relay outgoing mail. This server shouldn't accept mail directly from the outside. Certainly it shouldn't be listed as an MX server in your domain's DNS configuration; only the spam-filtering mail gateway should be listed in this capacity.

Using Bogofilter

Unlike SpamAssassin, which combines many different spam-fighting tools in one system, Bogofilter (http://bogofilter.sourceforge.net) takes a single approach to spam fighting. It's an implementation of a statistical spam filter. As such, it requires training on a corpus of both spam and nonspam messages before it can work. Thus, you may need to save your spam for a few days before you can effectively use Bogofilter.

Tip

SpamAssassin can use a statistical filter as part of its rule set. To do so, you must give it sample messages to train it, using its sa-learn command. Consult this command's manpage for details; the training process is similar to that for Bogofilter, although the command details differ.

Bogofilter can be installed like most other packages; check your distribution to see if a version is available with it. If not, go to the project's home page, and download a binary or source code version from there. Like SpamAssassin, Bogofilter is called from Procmail or a mail reader program. Before you do that, though, you must train Bogofilter.

The training procedure requires examples of both spam and nonspam messages—the more, the better. (A collection of several thousand messages is not excessive, but Bogofilter can do some good with just a few dozen.) Ideally, these messages should be typical of spam and nonspam messages that you receive; you want Bogofilter to learn to differentiate your spam from your nonspam. Although you can find spam collections on the Internet, using them for Bogofilter training can cause problems, because other people may receive different types of spam, or because you might not classify everything in such collections as spam. The simplest way to train Bogofilter is to place all your spam messages in one file and all your nonspam messages in another file, both of which should be in mbox format. In subsequent examples, I refer to these as spam.mbox and nonspam.mbox, respectively.

Conceptually, the simplest way to train Bogofilter is to pass the spam and nonspam messages through the bogofilter command using the -s and -n options, respectively:

$ bogofilter -s < spam.mbox
$ bogofilter -n < nonspam.mbox

These commands create a database file, ~/.bogofilter/wordlist.db, which holds every word found in the messages, along with counts of how often each word appears in spam and nonspam messages. When Bogofilter later encounters a new message, it can use these counts to estimate the probability that the message is spam.
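
The idea of a word list with per-class counts can be illustrated with a few lines of Python. This is a deliberately crude sketch and not Bogofilter's actual algorithm or file format: the WordList class, its tokenizer, and the simple averaging used to combine word scores are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into words; case is preserved, so Viagra != viagra."""
    return re.findall(r"[A-Za-z0-9$'-]+", text)


class WordList:
    """Hypothetical stand-in for a statistical filter's word database."""

    def __init__(self):
        self.spam = Counter()      # word -> number of spam messages seen in
        self.nonspam = Counter()   # word -> number of nonspam messages seen in

    def train(self, text, is_spam):
        counts = self.spam if is_spam else self.nonspam
        counts.update(set(tokenize(text)))

    def word_score(self, word):
        s, n = self.spam[word], self.nonspam[word]
        # Add-one smoothing: a word never seen before scores a neutral 0.5.
        return (s + 1) / (s + n + 2)

    def spamicity(self, text):
        words = set(tokenize(text))
        if not words:
            return 0.5
        # Crude combination: the plain average of the word scores.
        return sum(self.word_score(w) for w in words) / len(words)


if __name__ == "__main__":
    wl = WordList()
    wl.train("earn $$$ fast with cheap pills", True)
    wl.train("agenda for the genetics seminar", False)
    print(wl.spamicity("cheap pills"), wl.spamicity("seminar agenda"))
```

Even this toy version shows why training on your own mail matters: the scores depend entirely on which words appeared in the messages you supplied.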

Tip

Because the Bogofilter database file is stored in the user's home directory, you should create the Bogofilter database file by running the program as that user. This user can conceivably be root, but for security reasons, it's best if you find a way to run Bogofilter as a non-root user. If necessary, you can create an initial database, place the mail classification call to bogofilter in users' individual ~/.procmailrc files, and modify the global configuration to use the global word files in addition to individual users' word lists.

Another approach to Bogofilter training is to use a training script, such as bogominitrain.pl or randomtrain. These scripts might or might not be shipped with a distribution-provided Bogofilter package. If they're not on your system, consult the main Bogofilter site. These scripts perform more sophisticated training; namely, they use the bogofilter command to classify each message and perform training only if the message isn't classified correctly by Bogofilter. If necessary, this process is repeated until Bogofilter classifies every message correctly. The result tends to be smaller databases, and often more accurate results, but initial training takes longer. Consult the documentation that comes with the training script for details. Typically, you pass the script the names of both the spam and the nonspam files, and perhaps additional parameters:

$ bogominitrain.pl -fnv ~/.bogofilter nonspam.mbox spam.mbox '-o 0.9,0.3'
               

This example passes the location of the word list, the nonspam and spam files, and classification parameters (described in more detail shortly).

Whatever training method you use, you should also examine, and perhaps modify, the Bogofilter configuration file. By default, /etc/bogofilter.cf provides systemwide defaults, but individual users can override these by creating a configuration file called ~/.bogofilter.cf (this filename can be set in /etc/bogofilter.cf). Options in this file are well commented, so perusing it will give you some idea of what you can change. Some options you may want to modify include:

bogofilter_dir
This option points to the word list directory. Changing it is one way ordinary users can access a global word list; however, doing so may make it impossible for individuals to change that word list.
ignore_case
Ordinarily, Bogofilter pays attention to case; Viagra is distinct from viagra. You can set ignore_case=yes to have Bogofilter convert all words to lowercase, though. This can help overcome attempts to confuse antispam tools by mixing up case in words, but it can also reduce Bogofilter's sensitivity to strings for which case can be important.
algorithm
Bogofilter can use several different algorithms for determining the spamicity of a message (that is, the probability that a message is spam). These algorithms are graham, robinson, and fisher. The default is fisher, which generates a three-way classification: spam, nonspam, or unsure.
ham_cutoff
This option sets the maximum spamicity score (between 0.0 and 1.0) that's needed to classify a message as nonspam. A value of 0.10 is typical and usually works well.
spam_cutoff
This option sets the minimum spamicity score (between 0.0 and 1.0) that's required for a spam classification. A value of 0.95 is typical and usually works well.
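
The interaction between ham_cutoff and spam_cutoff can be sketched in a few lines of shell. This is only an illustration of the three-way split described above, not Bogofilter's own code; the function name classify is hypothetical, and the treatment of a score exactly equal to a cutoff is an assumption.

```shell
# Map a spamicity score to a three-way classification.
# Arguments: spamicity, ham_cutoff, spam_cutoff (all between 0.0 and 1.0).
# classify is a hypothetical helper; boundary handling is an assumption.
classify() {
    awk -v s="$1" -v ham="$2" -v spam="$3" 'BEGIN {
        if (s >= spam)     print "spam"
        else if (s <= ham) print "nonspam"
        else               print "unsure"
    }'
}

classify 0.500008 0.10 0.95
```

Raising ham_cutoff or lowering spam_cutoff shrinks the "unsure" band between the two values, at the cost of a greater risk of misclassification.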

Once you've set these values and trained Bogofilter, you should test its operation by passing spam and nonspam messages through the bogofilter command. Ideally, you should use messages that you held back from the training so that you can judge how Bogofilter handles messages it's never seen. Use the -v option to have the program generate a verbose report of the input messages, which you redirect as input:

$ bogofilter -v < message.txt

X-Bogosity: Yes, tests=bogofilter, spamicity=1.000000, version=0.16.4

This result shows a classification of the message as spam (X-Bogosity: Yes), with a very high spamicity score (1.000000). A nonspam message is likely to generate a much lower score:

$ bogofilter -v < message.txt

X-Bogosity: No, tests=bogofilter, spamicity=0.000000, version=0.16.4

Because of its three-way output, Bogofilter can also tell you that it's unsure of the status of the message:

$ bogofilter -v < message.txt

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500008, version=0.16.4
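
When testing many held-back messages, it may help to reduce each report to just the classification and the score. The following sketch assumes the X-Bogosity line has exactly the layout shown in the examples above; the function name parse_bogosity is hypothetical.

```shell
# Print "<classification> <spamicity>" for an X-Bogosity report line.
# Assumes the layout shown in the examples above; prints nothing if the
# line does not match. parse_bogosity is a hypothetical helper.
parse_bogosity() {
    printf '%s\n' "$1" |
        sed -n 's/^X-Bogosity: \([A-Za-z]*\),.*spamicity=\([0-9.]*\),.*$/\1 \2/p'
}

parse_bogosity 'X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500008, version=0.16.4'
```

The output of bogofilter -v for each test message could be passed to this function in a loop, making scores that sit close to a cutoff easier to spot.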

If you find that Bogofilter isn't classifying your messages correctly, you should revisit your training procedures. Perhaps you didn't classify enough messages or delivered them with the wrong parameters (confusing spam and nonspam messages, for instance). Note that a classification of "unsure" works like a nonspam classification in most respects, so you shouldn't be too concerned if some of your nonspam messages are classified in this way, unless the spamicity ratings are very close to the spam cutoff point. If you have classification problems, you might also consider fine-tuning the Bogofilter cutoff criteria (ham_cutoff and spam_cutoff). You can increase or decrease these values, but with certain risks; if you make either the nonspam or spam category too large, you'll risk misclassifying messages.

Tip

Numerically, the largest range of spamicity values is above the ham_cutoff value but below the spam_cutoff value. Thus, you might expect that most messages will end up classified as "unsure." In practice, though, most messages achieve very high (close to 1.0) or very low (close to 0.0) spamicity ratings.

With Bogofilter now correctly classifying at least most of your messages, it's time to integrate it into your mail delivery system. One way to do this is by calling Bogofilter in Procmail. The following Procmail recipe will do this:

:0HB:
* ? bogofilter -u -l
probably-spam

This recipe passes the message through the bogofilter command. The -u option tells Bogofilter to automatically add messages that it classifies as spam or nonspam to the appropriate word lists. This option is both potentially useful and potentially dangerous; it's useful because it can help keep your spam database updated, but it's dangerous because if Bogofilter misclassifies a message, that misclassification can lead to more misclassifications. (If a message is classified as "unsure," it won't be added to the database.) The -l option logs Bogofilter activity. This recipe stores spam messages in the probably-spam folder; nonspam messages go on for normal delivery.

If you use the -u option, and Bogofilter misclassifies a message, you should correct the problem. You can do this with the -N and -S options, which undo previous registrations of a message as nonspam and spam, respectively. You can combine these options with -s and -n to reregister the messages correctly. For instance, if Bogofilter has registered a message as nonspam but in fact it's spam, you can extract the message to a file (complete with its headers) and type the following command:

$ bogofilter -Ns < message.spam
               

To test that it's worked correctly, pass the message through bogofilter again, using -v rather than -Ns; Bogofilter should now classify the message as spam, or at least give it a much higher spamicity score. (Register it again with bogofilter -s to strengthen Bogofilter's tendency to classify the message as spam, if desired.) Use -Sn rather than -Ns to undo an incorrect classification of a nonspam message as spam.

Discarding or Quarantining Suspicious Attachments

The vast majority of email worms released over the years have been written for Windows systems. Any of the antispam tools described here can be used to locate and deal with worms. Using a Linux system for this task ensures that the mail server itself can't become infected, even through gross negligence. (At least, assuming Windows worms are in play; theoretically, Linux worms could be written to take advantage of flaws in Linux software.)

The threat of Windows worms is such that many sites have taken drastic measures to protect themselves: they reject all mail carrying certain types of attachments, or even all email attachments. The reasoning is that nobody has a valid reason to email, say, Windows .exe executables, so any such executable must be a worm. The validity of such reasoning is uncertain, but it may be so close to the truth for certain sites that discarding or quarantining messages with such attachments may be worthwhile. Example 13-3 shows a couple of Procmail recipes that discard certain suspicious messages.

Example 13-3. Procmail recipes to discard suspicious attachments

:0 B
* ^Content-Type: audio/x-(wav|midi);
/dev/null

:0
* ^Content-Type: multipart/(mixed|alternate|alternative|related)
{
  :0 B
  * ^.*name=.*\.(bat|com|exe|pif|scr|vbs|zip)
  /dev/null
}

The first of these rules discards everything with a Content-Type line of audio/x-wav or audio/x-midi. Theoretically, these lines identify certain types of audio files, which might be legitimate attachments in some environments; however, in practice, worms often try to masquerade as these file types. The second rule looks for any of several content types in the header and, if found, searches for a line that includes name= followed by any of several filename extensions. Some of these, such as .bat, .com, and .exe, identify Windows executables. Others don't, but again, Windows worms frequently try to masquerade as files of these types.
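
One way to get a feel for what the filename test in the second rule catches is to try the pattern against sample header lines before deploying it. The sketch below uses grep -E, which only approximates Procmail's regular expressions; the -i option mirrors Procmail's default of ignoring case, and the leading ^.* of the recipe is dropped because grep matches anywhere in the line. The function name matches_suspicious is hypothetical.

```shell
# Report whether a MIME header line would match the filename test in the
# second recipe of Example 13-3. grep -E approximates Procmail's regex;
# matches_suspicious is a hypothetical helper name.
matches_suspicious() {
    printf '%s\n' "$1" |
        grep -Eiq 'name=.*\.(bat|com|exe|pif|scr|vbs|zip)'
}

matches_suspicious 'Content-Type: application/octet-stream; name="photo.scr"' &&
    echo "would be discarded"
```

Because the pattern is not anchored to the end of the filename, a name such as report.computing.txt also matches (on .com), which is one more possible source of false alarms.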

Unfortunately, rules such as these are likely to produce false alarms. The second rule is particularly overzealous because it discards messages with attached Zip files. You can, of course, eliminate some of these filename extensions, but that reduces the effectiveness of the tests. Alternatively, you can enclose the test in a white-list test to enable trusted senders to deliver mail containing these attachments. Another option is to rename attachments rather than discard them; for instance, rename a .zip file to .zip.txt. This enables users to access the files, but makes it harder for worms that are named in this way to do harm automatically.

These rules, as shown in Example 13-3, are also potentially deleterious because they discard the messages by sending them to /dev/null. Placing the messages in a folder to hold suspected worms might be a good alternative. Users can then open the messages only with extreme caution. If you place these rules in a systemwide Procmail configuration file, you can even send the suspect messages to a mail folder that only root can read.

Supplementing a Microsoft Exchange Server

Linux can fit into a network's email picture in any of several ways. One obvious way is to function as your domain's primary mail server, handling SMTP and, if you desire, POP or IMAP. Used in this way, the Linux mail server will most likely communicate with Windows desktop systems as POP or IMAP clients. This configuration can work quite well, but many Windows networks already have a Microsoft Exchange mail server. At first glance, there seems to be little reason to deploy a Linux mail server if you already have a working Microsoft Exchange server. Sometimes, though, a Linux server can be used to help an Exchange server.

Tip

Microsoft Exchange provides features that are most readily used by Microsoft email clients, and that aren't fully replicated by non-Microsoft servers. Thus, depending on your needs, a Linux server might not be an adequate replacement for an Exchange server. Some projects are underway to change this. Specifically, the SuSE Linux Openexchange Server (SLOX; http://www.suse.de/en/business/products/openexchange/), Kroupware (http://kroupware.org), and the Open Source Exchange Replacement (OSER; http://www.thewybles.com/oser/) are projects intended to replace the Exchange server, while otlkcon (http://otlkcon.sourceforge.net) aims to provide Linux client features. Note that these projects aren't quite drop-in replacements or aren't yet finished. Thus, Linux can't yet replace an Exchange server, but Linux can supplement one.

A Linux mail server is commonly used as an additional link in the email chain, appearing just before the Microsoft Exchange server, as shown in Figure 13-3. Placed in this way, the Linux mail server functions as a filter, similar to a firewall. Using tools designed to detect and remove spam and worms (as described in Section 13.5), the Linux system can keep these unwanted messages from ever reaching the Exchange server. This can be preferable to filtering them out on the Exchange server because it reduces the load on the Exchange server, improving performance, particularly for entirely local actions. Another advantage of this configuration is that you can use strong packet-filter firewall rules on the Exchange server, protecting it from all outside access attempts. You can also use a Linux system to determine which of several internal servers should receive any given email; for instance, you can direct email according to the username to either of two or three servers, each of which handles only some of your site's local users.

Figure 13-3. A Linux mail server can fit into an existing Exchange network as an email filter system


Configuring a Linux mail server this way isn't greatly different from configuring it as a domain's only mail server. The main difference is that the system forwards all the mail it receives; it treats few or no messages as local. This is done by setting the server's mail relay options, as described in an earlier section.

Warning

A Linux mail server configured this way can protect you from spam and worms that originate outside your network. If you send your outgoing mail through the Linux mail server, it can also protect outside systems from worms that might get loose on your local network. Local mail that's handled exclusively by the Exchange server won't be examined, however, unless you configure Exchange to send even local mail via the Linux server, which increases the network load between those two systems. Thus, if a worm breaks loose on your local network, it can still spread quickly to other computers.

Using Fetchmail

A prototypical chain of mail delivery uses SMTP from the sender through to the recipient's mail server, and optionally uses POP or IMAP from the final mail server to the user's desktop system. Sometimes, though, it's desirable to use POP or IMAP earlier in the chain. For such situations, a program called Fetchmail comes to the rescue; this program enables you to pull mail from a POP or IMAP server and inject it into your local mail queue; from there it can be delivered to the same or another computer.

Before installing and using Fetchmail, you should understand precisely why it exists and how it can be used. Although it's a popular and useful tool, it's not for everybody, so attempting to use it unnecessarily can be a waste of time. If you're sure you want to use it, you must understand Fetchmail's configuration file format. Once it's configured, you can use it, which involves running it as a daemon, running it at scheduled times, or running it as part of a larger task.

The Role of Fetchmail

If you own or work for a small business, you might contract with an outside company to host your domain. This domain hosting ISP runs a server that houses your web pages and probably provides another server that can receive your domain's email. Typically, domain hosting ISPs allow you to connect to their email servers with POP or IMAP to retrieve your mail. You might be content to read your mail more or less directly like this, in which case you don't need to run any email server at all. On the other hand, you might want to perform additional processing, such as handling your own spam filtering, sorting mail for multiple users into different accounts, supporting IMAP when your domain hosting ISP provides only POP, integrating mail from multiple ISPs, or integrating mail from the Internet with your local network's mail. Individuals with small home networks often have similar needs, even if they don't have their own domains. In all these cases, what you need is a way to pull mail from the ISP's server using POP or IMAP and make it available via your own POP or IMAP server. You might even send the mail from one server to another via SMTP. This configuration is outlined in Figure 13-4. In the figure, your (example.com's) mail server uses Fetchmail to retrieve mail from the abigisp.net mail server using POP. Local computers can then retrieve the mail using IMAP.

Figure 13-4. Fetchmail enables you to use a pull mail protocol earlier in the chain than normal


Because pull mail protocols are initiated by the receiving end, Fetchmail has no way to know when mail is waiting for it to pick up. For this reason, Fetchmail typically polls the remote server; that is, Fetchmail checks for new mail at a regular interval. This can be done either by running Fetchmail as a daemon with a built-in polling interval or by calling Fetchmail in a regular process, such as in a cron job. Alternatively, you can call Fetchmail as part of a regular or irregular process. For instance, if you use a dial-up Internet connection, you can call Fetchmail as part of a connection script. This gives you access to all your accumulated mail as soon as you connect.

Configuring Fetchmail

The Fetchmail configuration file is located in the user's home directory and is called .fetchmailrc by default (there is no global Fetchmail configuration file). As with many files, this one uses hash marks (#) to denote comments. Aside from comments, the file begins with a number of set directives, which set various global options. Some of the more important of these options are summarized in Table 13-1.

Table 13-1. Common Fetchmail global directives

Directive name | Possible options | Description
postmaster | Local username | Username to which error messages are sent. This user may also receive failed deliveries as a last resort.
bouncemail | - | Tells Fetchmail to send bounce messages to the apparent sender of the message. This practice can be risky because spammers and worms usually forge the return addresses, sometimes to the addresses of legitimate but innocent individuals.
no bouncemail | - | Tells Fetchmail to send bounce messages to the address set with postmaster, rather than to the apparent sender.
syslog | - | Logs Fetchmail activities through the local syslog daemon.
logfile | Filename | Logs Fetchmail activities to the specified file.
daemon | Time in seconds | Causes Fetchmail to run in daemon mode, in which it loads but doesn't exit. Fetchmail then checks for new mail at the specified interval.


The global options are just the start of Fetchmail configuration, though. The heart of the configuration lies in the account specifications. Each begins with the keyword poll and defines everything Fetchmail needs to know about an account in order to retrieve mail from it and direct it to an appropriate local or remote address. Broadly speaking, the poll lines take the following form:

poll server.hostname server-options user-options
               

The server.hostname is, of course, the server's hostname. The server-options and user-options both consist of multiple options, which tell Fetchmail how to interact with the server and give Fetchmail information on the accounts (both the remote server's account and how Fetchmail is to deliver the mail locally). Tables 13-2 and 13-3 summarize the most common options for these two parts of the poll specifications.

Table 13-2. Common Fetchmail server options

Option name | Possible values | Description
proto or protocol | Protocol name | The name of the protocol Fetchmail should use to communicate with the server. Common values are POP3 and IMAP, but Fetchmail supports several other protocols as well.
interface | Interface name/IP address/netmask triplet | An interface that must be active before Fetchmail attempts to connect to a server. For instance, ppp0/192.168.99.0/24 means that the system must have a PPP connection on the 192.168.99.0/24 network before it attempts a connection. This is most useful for dial-up users.
monitor | Interface name | Fetchmail monitors the specified interface (such as ppp0 or eth1) and attempts a connection only if there's been activity on that interface since the last polling interval. This option works only in daemon mode. It's most useful to prevent activity that might unnecessarily activate a dial-on-demand connection.
interval | Integer | Causes checks to occur only at some polling intervals. For instance, setting interval 4 causes Fetchmail to check the site only every fourth polling period (as set by the global daemon value). This is useful if you want to poll multiple remote servers, but with different frequencies.


Table 13-3. Common Fetchmail user options

Option name | Possible values | Description
user or username | Username | A username on the remote server, unless the username is followed by here, in which case it's the local username to which fetched mail is delivered.
pass or password | Password | The password used to access the remote server.
ssl | - | Enables an SSL connection to the remote server. This option isn't universally supported, but if your server supports it, using SSL can improve security.
sslcert | Filename | The file in which an SSL certificate is stored.
sslkey | Filename | The file in which an SSL key is stored.
is or to | Username | Links the remote account information with the local account information.
here | - | This keyword follows a local username to identify it as local.
smtphost | Hostname | The hostname of the server to which Fetchmail sends mail it receives. The default is localhost, which is usually fine if you want to read mail or run your own pull mail server on the same computer.
keep | - | Tells Fetchmail to leave mail on the remote server after fetching it. The default is to delete fetched mail. This option is mostly useful when testing or debugging a new or changed configuration.
fetchall | - | Retrieves all messages on the remote server, even if Fetchmail has already fetched them. Used with keep, this can result in duplicate messages.
forcecr | - | Technically, email messages should have lines that end in carriage return/line feed (CR/LF) pairs; however, in practice, many messages have only the LF. Some mail servers, such as qmail, react badly to this deviation from the norm, and this option corrects this problem.
preconnect | Local command | A program that's run before each connection. This can bring up a network connection, run a program to delete spam from the remote server, or perform any other task you want done just before retrieving mail.


Tip

The poll specification can be quite long. Typically, it's split across two or more lines, with the second and subsequent lines indented. No line-continuation characters are required.

In addition to the options shown in Tables 13-2 and 13-3, Fetchmail accepts some more exotic options; consult its manpage for details. Certain keywords, such as and, has, options, wants, and with, are ignored by Fetchmail. These keywords can help you parse the meaning of a poll statement. Most option values can be enclosed in quote marks, but this isn't usually required unless the value contains an embedded space. Overall, although the Fetchmail poll options may seem confusing when listed in tables, in practice they're designed to be easy to parse. When strung together, they read almost like an English sentence, as shown in Example 13-4.

Example 13-4. Sample .fetchmailrc file

set postmaster "linnaeus"
set no bouncemail
set syslog

poll pop.abigisp.net with proto POP3
   user "mendel" there with password "p7Tu$ioP" is gregor here
   options fetchall forcecr preconnect "mailfilter"

poll mail.asmallisp.org with proto IMAP
   user "karl" there with password "QhI04a-23Ybz" is linnaeus here
   options forcecr smtphost mail.example.com

Warning

One of Fetchmail's weaknesses is that it requires you to store your remote email passwords in plain text in its configuration file. Be sure the configuration file has 0600 or 0400 (rw------- or r--------) permissions. If the file is readable to other users, Fetchmail refuses to act on the configuration file.
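
The permissions requirement can be met before any password is typed into the file. The following is a minimal sketch; the variable name FETCHMAILRC is hypothetical, and it defaults to a scratch file so that running the sketch never touches a real ~/.fetchmailrc.

```shell
# Make sure a Fetchmail configuration file exists and that only its
# owner can read or write it. FETCHMAILRC is a hypothetical variable;
# set it to "$HOME/.fetchmailrc" to act on the real file.
FETCHMAILRC=${FETCHMAILRC:-$(mktemp)}
( umask 077 && touch "$FETCHMAILRC" )   # new files get owner-only access
chmod 600 "$FETCHMAILRC"                # tighten a file that already existed
ls -l "$FETCHMAILRC"                    # mode column should read -rw-------
```

Creating the file under a restrictive umask means that it is never readable by other users, even briefly.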

The configuration shown in Example 13-4 retrieves mail from two sources: the mendel account on pop.abigisp.net and the karl account on mail.asmallisp.org. Mail from each account is directed to a different user. The second poll statement also directs mail to a specific server (mail.example.com), which might or might not be the same server on which Fetchmail is running.

If you want to fetch mail from multiple remote accounts or for multiple users, you have three options: use a single Fetchmail configuration, as shown in Example 13-4; call Fetchmail from multiple accounts, with one configuration per account; or create separate configurations and call them all from a single account, using the -f option to point Fetchmail at a nonstandard configuration file for all but one of them. The account used to run Fetchmail doesn't need to be related to those that receive the local mail; for instance, linnaeus can run Fetchmail, which might deliver mail to the gregor account.

Although Fetchmail relies on a text-mode configuration file, you can use a GUI tool to help you configure Fetchmail. Type fetchmailconf in an xterm or other command-line window to run this program, which guides you through setting the Fetchmail options. This configuration tool is often installed separately from Fetchmail, though, so you may need to locate it on your distribution's installation media.

Running Fetchmail

The simplest way to run Fetchmail is to call it by name from the command line:

$ fetchmail -k
               

If all goes well, Fetchmail retrieves mail and inserts it into your local mail queue (or delivers it to another system, if you've so configured it). For testing purposes, include the -k option, as in this example; it has the same effect as the keep user option. This way, if your configuration is incorrect, and Fetchmail loses your mail, you can recover it from the remote server. Once the configuration works, omit -k so that fetched mail is deleted from the remote server.

For ordinary use, you should probably run Fetchmail constantly (in daemon mode) or run it periodically. To run Fetchmail in daemon mode, ensure that your .fetchmailrc file has a set daemon interval line. You can then run Fetchmail at system startup via a SysV or local startup script. Typically, you'll want to run the program as a non-root user, which you can do via the su command in your startup script:

su -c '/usr/bin/fetchmail -f /home/karl/.fetchmailrc' karl

This command runs Fetchmail as karl, when typed as root or entered into a startup script that's run as root. This command also illustrates the use of -f, which enables you to specify a configuration file.

If you want to run Fetchmail as part of a network connection procedure, such as that used to initiate a PPP connection, you can place a similar command in your network connection script. If you initiate the connection as an ordinary user, though, you might not need to use su; just call fetchmail as an ordinary user.

Another way to run Fetchmail is via a cron job. On most Linux systems, cron is a daemon that launches programs on a periodic basis. These cron jobs are defined in a crontab, a file that's registered with the cron daemon. Example 13-5 shows a sample crontab that runs Fetchmail at regular intervals.

Example 13-5. Sample crontab file for running Fetchmail

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=karl
HOME=/home/karl

16,36,56 7-20 * * * /usr/bin/fetchmail > /dev/null

The first few lines of the crontab file set environment variables, much as they're set in bash scripts. The final line in Example 13-5 tells cron to run the /usr/bin/fetchmail > /dev/null command at specific times. The time format is five space-separated fields: the minute, the hour, the day of the month, the month, and the day of the week. An asterisk (*) sets a field to match any value. You can separate multiple values with commas or use a dash (-) to specify a range of values. Thus, Example 13-5 tells cron to run Fetchmail at the 16th, 36th, and 56th minutes of every hour from 7:16 A.M. through 20:56 (that is, 8:56 P.M.) on every day of every month. The program's output is redirected to /dev/null; if it weren't, the user who registers this cron job would receive an email with Fetchmail's output every time it runs.
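
The field syntax just described can be illustrated with a small shell function. This is a sketch of the matching rules only, not how cron itself is implemented; the name field_matches is hypothetical, and step values and names (such as */5 or mon) are deliberately not handled.

```shell
# Succeed if VALUE matches a crontab field SPEC made up of "*",
# comma-separated values, and dash ranges. field_matches is a
# hypothetical helper; steps and month/day names are not handled.
field_matches() {
    fm_value=$1
    fm_spec=$2
    [ "$fm_spec" = "*" ] && return 0
    fm_old_ifs=$IFS
    IFS=,
    set -- $fm_spec            # split the spec on commas
    IFS=$fm_old_ifs
    for fm_part in "$@"; do
        case $fm_part in
            *-*)
                fm_lo=${fm_part%-*}
                fm_hi=${fm_part#*-}
                [ "$fm_value" -ge "$fm_lo" ] && [ "$fm_value" -le "$fm_hi" ] && return 0
                ;;
            *)
                [ "$fm_value" -eq "$fm_part" ] && return 0
                ;;
        esac
    done
    return 1
}

# The minute and hour fields from Example 13-5:
if field_matches 36 "16,36,56" && field_matches 8 "7-20"; then
    echo "08:36 is a scheduled run"
fi
```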

Warning

Be sure that the .fetchmailrc file doesn't contain a set daemon line if you call Fetchmail via a cron job. If it does, the first time Fetchmail is run, it daemonizes and prevents subsequent runs from succeeding.
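
A quick check for a stray set daemon line might look like the following sketch. The function name has_daemon_line is hypothetical, and the pattern assumes the directive is written at the start of a line (optionally indented), as in the examples in this chapter.

```shell
# Succeed if the given Fetchmail configuration file contains an
# uncommented "set daemon" directive. has_daemon_line is a hypothetical
# helper name.
has_daemon_line() {
    grep -Eq '^[[:space:]]*set[[:space:]]+daemon([[:space:]]|$)' "$1"
}

# Example use before registering a cron job:
#   has_daemon_line "$HOME/.fetchmailrc" && echo "remove the set daemon line"
```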

To register the crontab file, you must use the crontab command. In the simplest case, you can log in as the user who you want to run Fetchmail and issue the following command:

$ crontab crontab
               

This assumes you've called the crontab file crontab; if you've called it something else, you'll need to change the filename passed to the crontab command.

Warning

If the user who's to run Fetchmail already has a crontab file, you should modify it to add the call to fetchmail. If you type crontab crontab, the new crontab file replaces the old one.
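
One cautious way to add the Fetchmail entry without losing existing jobs is to build the new crontab from the old one. The sketch below assumes the current crontab has first been saved to a file with crontab -l; the function name merge_crontab and the filenames crontab.old and crontab.new are hypothetical.

```shell
# Print the contents of an existing crontab file followed by one new
# entry, unless an identical line is already present. merge_crontab is a
# hypothetical helper; it only writes to standard output.
merge_crontab() {
    cat "$1"
    grep -qxF -e "$2" "$1" || printf '%s\n' "$2"
}

# Typical use (not run here, since it would change a real crontab):
#   crontab -l > crontab.old
#   merge_crontab crontab.old \
#       '16,36,56 7-20 * * * /usr/bin/fetchmail > /dev/null' > crontab.new
#   crontab crontab.new
```

Keeping crontab.old around also provides a simple way to restore the previous schedule if something goes wrong.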

If you create a new non-login account to run Fetchmail, you can use the root account to enter a crontab file for this user. Call the crontab file something distinctive, and use the -u option to crontab to tell the program what user's crontab you're entering:

# crontab -u fmail crontab-fmail
               

This command enters the crontab-fmail file as the crontab for the fmail user. The result is that Fetchmail will run as this user, which can be a very low-privilege user. Be sure the user exists and has a home directory, or at least can read a configuration file you specify with the -f option to fetchmail in the crontab-fmail file.

Summary

Email is extremely important for most individuals and businesses today, and Linux can function as part of your network's email system. You can use a Linux SMTP server, such as sendmail or Postfix, to handle incoming mail instead of or in addition to a Microsoft Exchange server, and you can use a Linux POP or IMAP server to deliver mail to Windows, Mac OS, Linux, and other clients. One of the ways you can employ a Linux mail server is as a screening system for spam and worms. You can do this whether Linux is your sole mail server or it's just part of a larger mail solution. Finally, a tool called Fetchmail enables you to retrieve mail from a remote pull mail server and deliver it using your own pull mail server or deliver it via SMTP to another server.
