SpamAssassin/SpamAssassin Rules

Revision as of 10:53, 7 March 2008 by Docbook2Wiki (Talk)

SpamAssassin performs its spam-checking by applying a series of tests to an email message. Most tests examine the message headers or body for patterns that are suggestive of spam; others perform Internet lookups against network-based blacklists of IP addresses or checksums of spam messages. Each positive test yields a score, and the sum of the scores is the total spam score of the message.

This chapter describes the SpamAssassin pattern-based and network-based tests: how they are written and scored, and how you can modify the score of a built-in test or write your own custom tests. This chapter also covers whitelist and blacklist rules, which can override SpamAssassin's usual determination of whether or not a message is spam.

The tests described in this chapter are all static tests—they don't change over time as SpamAssassin analyzes messages. Chapter 4 explains learning tests, which use information from messages seen in the past to improve decisions in the future.


The Anatomy of a Test

Most SpamAssassin tests consist of the same basic components:

  • A test name, consisting of up to 22 uppercase letters, numbers, or underscores. Names that begin with T_ refer to rules in testing.
  • A more verbose description of the test, which is used in the reports generated by SpamAssassin. Typically, descriptions are up to 50 characters long.
  • An indication of where to look. Tests can be applied to the message headers only, the message body only, uniform resource identifiers (URIs) in the message body, or the complete message. When testing the message body, the body can be analyzed in its raw state, after MIME-decoding the text, or after MIME-decoding, stripping of HTML, and removal of all line breaks.
  • A description of what to look for. Tests can specify a header to check for existence, a Perl regular expression pattern to match, a DNS-based blacklist to query, or a SpamAssassin function to evaluate.
  • Optional test flags that control the conditions under which the test is applied or other exceptional features.
  • A score or scores for the test. Tests can have a single score that is always used, or they can have separate scores for messages that test positive under each of four conditions:
    • When the Bayesian classifier and network tests are not in use
    • When the Bayesian classifier is not in use, but network tests are
    • When the Bayesian classifier is in use, but network tests are not
    • When the Bayesian classifier and network tests are both in use

Example 3-1 shows the complete definition of a test that matches when a message's From address begins with two digits. This test is defined in the file /usr/share/spamassassin/20_head_tests.cf (although its score appears in the 50_scores.cf file).

Example 3-1. A test definition and score

header FROM_STARTS_WITH_NUMS    From =~ /^\d\d/
describe FROM_STARTS_WITH_NUMS  From: starts with nums

score FROM_STARTS_WITH_NUMS     0.390 1.574 1.044 0.579

How does this test work? The header directive defines it as a test that will be applied to the message headers and gives the test name (FROM_STARTS_WITH_NUMS) and the test itself, a match of the From header against the regular expression /^\d\d/. That regular expression denotes a string that begins with two digits.

Tip

For information about how to read and write regular expressions, see the Perl manual page perlre, or Jeffrey Friedl's book Mastering Regular Expressions (O'Reilly).

The describe directive provides a human-readable description of the test that SpamAssassin will insert in reports when the test matches. The score directive determines how many points SpamAssassin will add to the spam score of a message if the test matches. Higher scores mean that a message that matches the test is more likely to be spam. In this example, SpamAssassin will add 0.39 points to the spam score of a matching message if network and Bayesian tests are not in use, 1.574 points if network tests are in use but Bayesian tests are not, 1.044 points if Bayesian tests are in use but network tests are not, and 0.579 points if both network and Bayesian tests are in use.
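The way a four-value score directive is resolved can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not SpamAssassin's own code; the function name select_score and its arguments are hypothetical.

```python
def select_score(scores, use_bayes, use_net):
    """Pick the applicable score from a score directive's value list.

    `scores` holds either one value (always used) or four values in the
    order given in the text: neither, network only, Bayes only, both.
    """
    if len(scores) == 1:
        return scores[0]
    if len(scores) != 4:
        raise ValueError("a score directive takes one or four values")
    index = (2 if use_bayes else 0) + (1 if use_net else 0)
    return scores[index]


from_starts_with_nums = [0.390, 1.574, 1.044, 0.579]
print(select_score(from_starts_with_nums, use_bayes=False, use_net=True))  # 1.574
print(select_score(from_starts_with_nums, use_bayes=True, use_net=True))   # 0.579
```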

The tests distributed with SpamAssassin are typically stored in files in /usr/share/spamassassin. Tests are stored in a set of ruleset files based on the type of test being performed, and scores for all tests are stored together in one file. These tests are discussed in detail later in this chapter. Following are some other examples of test definitions from the distributed tests, along with their scores.

Testing for a To, From, or Cc header that mentions friend@public.com (this test is distributed disabled):

header FRIEND_PUBLIC       ALL =~ /^(?:to|cc|from):.*friend\@public\.com/im
describe FRIEND_PUBLIC     sent from or to friend@public.com
score FRIEND_PUBLIC        0

Testing for the existence of the X-PMFLAGS header:

header X_PMFLAGS_PRESENT        exists:X-PMFLAGS
describe X_PMFLAGS_PRESENT      Message has X-PMFLAGS header
score X_PMFLAGS_PRESENT         2.900 2.800 2.800 2.700

Testing for long lines of hexadecimal code in the message body:

body LARGE_HEX                  /[0-9a-fA-F]{70,}/
describe LARGE_HEX              Contains a large block of hexadecimal code
score LARGE_HEX                 0.633 1.595 1.193 1.160

Testing for a Subject header in all capital letters, by evaluating a SpamAssassin function:

header SUBJ_ALL_CAPS            eval:subject_is_all_caps( )
describe SUBJ_ALL_CAPS          Subject is all capitals
score SUBJ_ALL_CAPS             0.550 0.567 0 0

Testing for a message that includes HTML to open a new window with JavaScript (disabled by default):

body HTML_WIN_OPEN              eval:html_test('window_open')
describe HTML_WIN_OPEN          Javascript to open a new window
score HTML_WIN_OPEN             0

Testing for an HTTP (Hypertext Transfer Protocol) URI anywhere in the message that uses a numeric IP address (e.g., http://3502894884):

uri NUMERIC_HTTP_ADDR           /^https?\:\/\/\d{7,}/is
describe NUMERIC_HTTP_ADDR      Uses a numeric IP address in URL
score NUMERIC_HTTP_ADDR         2.899 2.800 2.696 0.989
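To experiment with patterns like these outside SpamAssassin, you can try them in any regular expression engine. The following Python sketch copies four of the patterns above and runs them against made-up sample strings; it assumes that Perl's /i, /m, and /s modifiers map to Python's re.I, re.M, and re.S flags, and it does not reproduce how SpamAssassin extracts headers or URIs from a real message.

```python
import ipaddress
import re

# Patterns copied from the rules above.
FROM_STARTS_WITH_NUMS = re.compile(r"^\d\d")
FRIEND_PUBLIC = re.compile(r"^(?:to|cc|from):.*friend\@public\.com", re.I | re.M)
LARGE_HEX = re.compile(r"[0-9a-fA-F]{70,}")
NUMERIC_HTTP_ADDR = re.compile(r"^https?\:\/\/\d{7,}", re.I | re.S)

# Hypothetical sample data, not taken from a real message.
all_headers = "From: someone@example.net\nTo: friend@public.com\nSubject: hi\n"

print(bool(FROM_STARTS_WITH_NUMS.search("12345@example.com")))    # True
print(bool(FRIEND_PUBLIC.search(all_headers)))                    # True
print(bool(LARGE_HEX.search("ab" * 35)))                          # True
print(bool(NUMERIC_HTTP_ADDR.search("http://3502894884")))        # True

# A single large decimal number is just another spelling of an IPv4 address.
print(ipaddress.ip_address(3502894884))                           # 208.201.239.36
```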

Modifying the Score of a Test

You may find some tests more indicative of spam than SpamAssassin does by default. If SpamAssassin already provides a test that you value but doesn't assign it a high enough score (higher scores are more indicative of spam), you can easily modify the score of the test. Similarly, if one of SpamAssassin's tests is giving you too many false positives, you can reduce its score or disable the test entirely by setting its score to 0. SpamAssassin will not attempt to run a test with a score of 0.

Modifying Scores Systemwide

Make systemwide score adjustments in the systemwide configuration file, typically /etc/mail/spamassassin/local.cf. To modify the score of a test, you must first determine its test name, either by reading the ruleset files or by examining the spam report from a message. To get a spam report on a message that doesn't score high enough for SpamAssassin to generate a report, you can use spamassassin --test-mode, as described in Chapter 2.

To change the score of a test, simply add a new score directive to the configuration file, like this:

score HTML_WIN_OPEN 2

This will enable the HTML_WIN_OPEN test and add two points to the score of messages that test positive on this test.
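The effect of adding a score directive can be modeled roughly as follows. The Python sketch assumes that directives are read in order and that a later score directive for the same test replaces an earlier one; the helper names are hypothetical, and treating a test as enabled when any of its scores is nonzero is a simplification.

```python
def apply_score_directives(lines):
    """Collect score directives; a later directive replaces an earlier one."""
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score":
            scores[parts[1]] = [float(value) for value in parts[2:]]
    return scores


def enabled_tests(scores):
    """Names of tests that have at least one nonzero score (simplified)."""
    return sorted(name for name, values in scores.items() if any(values))


config = [
    "score HTML_WIN_OPEN 0",                    # distributed default: disabled
    "score LARGE_HEX 0.633 1.595 1.193 1.160",
    "score HTML_WIN_OPEN 2",                    # local override
]
scores = apply_score_directives(config)
print(scores["HTML_WIN_OPEN"])  # [2.0]
print(enabled_tests(scores))    # ['HTML_WIN_OPEN', 'LARGE_HEX']
```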

You can use the same approach to modify the descriptions of tests by adding new describe directives. For example, the default description for the HOT_NASTY test is "Possible porn - Hot, Nasty, Wild, Young". To shorten that to "Possible porn", add this directive to the configuration file:

describe HOT_NASTY Possible porn

Modifying Scores on a Per-User Basis

Users can use the score directive in per-user preference files to change the scoring of a test for an individual user. To do so, a user edits the .spamassassin/user_prefs file in her home directory and adds score directives. This approach to customizing scores is the simplest, but it requires users to have accounts on the system and access to files in their accounts.

Storing Scores in an SQL Database

When users do not have accounts or shell access (e.g., on a system that is an IMAP or webmail server), per-user scores can be stored in an SQL database and spamd can be configured to look up scores in the database. To store scores in SQL, you must install the DBI Perl module and an appropriate driver module for your SQL database server. Common choices are DBD-mysql (for the MySQL server), DBD-Pg (for the PostgreSQL server), and DBD-ODBC (for connection to an ODBC-compliant server).[1]

You should create a database and a user with privileges to access it. You must then create a table in the database to store the user scores. The SpamAssassin source code includes a schema for a MySQL table in the sql subdirectory, which is shown in Example 3-2. SpamAssassin 3.0 also includes a schema for a PostgreSQL table.

Example 3-2. A MySQL table for user scores

CREATE TABLE userpref (
  username varchar(100) NOT NULL,
  preference varchar(30) NOT NULL,
  value varchar(100) NOT NULL,
  prefid int(11) NOT NULL auto_increment,
  PRIMARY KEY (prefid),
  INDEX (username)
) TYPE=MyISAM;

You can use a different name for the table. The name given in Example 3-2 is the default, however, and using it will require the least amount of SpamAssassin configuration effort.

Each row in this table specifies the score for a single test for an individual user. SpamAssassin expects the columns to contain the following information:

username
Gives the username or email address of the user (the latter is more useful in virtual hosting environments). The special username @GLOBAL can be used to define global values in SQL that will be applied to all users.
preference
Gives the name of the test to modify the score of. The column can also be used with other directives (e.g., required_hits, auto_report_threshold, and the whitelisting and blacklisting directives described later in this chapter) but cannot define new rules or modify administrative settings.
value
Gives the new score for the test or a new value for one of the other directives (e.g., number of hits required to call a message spam or an email address to add to the whitelist). SpamAssassin does not provide any tools for adding data to these tables.

The prefid column and the PRIMARY KEY and INDEX clauses are useful but not necessary. prefid defines a primary key for the table, and an index is built on the username column to speed up queries.

To configure SQL support for user scores, set the following configuration parameters in your systemwide configuration file (local.cf):


user_scores_dsn DSN
This directive defines the data source name (DSN) for the SQL database. It tells spamd how it will connect to the database server. A typical DSN, for the Perl DBI module, is written like this:

DBI:databasetype:databasename:hostname:port
                     

For example, to use a MySQL database named sascores running on a database server on the SpamAssassin host, the DSN would read:

DBI:mysql:sascores:localhost:3306

If the server were running PostgreSQL, the DSN would read:

dbi:Pg:dbname=sascores;host=localhost;port=5432;

user_scores_sql_username username

This directive defines the username that will be used to connect to the database server. This user must have permission to issue SELECT queries against the table but need not be permitted to modify the data or database structure.
user_scores_sql_password password
This directive defines the password associated with the username that will be used to connect to the server.
user_scores_sql_table tablename
This directive defines the name of the table that contains user preferences. The default tablename is "userpref".

user_scores_sql_custom_query query (SpamAssassin 3.0)
This directive specifies the SQL query that SpamAssassin will use to look up user preferences. The query must be specified on a single (long) line in the configuration file. The default query is:

SELECT preference, value FROM _TABLE_
WHERE username = _USERNAME_ OR username = '@GLOBAL'
ORDER BY username ASC

This is read as "return the preference and value fields from the configured table (_TABLE_) for those rows with the specified username (_USERNAME_) or with the @GLOBAL username, in ascending lexicographic order." Because SpamAssassin will use the value of each matching preference it encounters in order, and because @GLOBAL sorts before all usernames, user-specific preferences will effectively override global preferences.
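The override behavior of the default query is easy to try out. The Python sketch below uses an in-memory SQLite database as a stand-in for MySQL or PostgreSQL, a simplified three-column userpref table, and a hypothetical effective_prefs helper that applies rows in the order the query returns them; none of this is SpamAssassin code.

```python
import sqlite3

# The default lookup from the text, with _TABLE_ filled in and _USERNAME_
# bound as a query parameter.
DEFAULT_QUERY = (
    "SELECT preference, value FROM userpref "
    "WHERE username = ? OR username = '@GLOBAL' "
    "ORDER BY username ASC"
)


def effective_prefs(conn, username):
    """Apply rows in query order so that later rows replace earlier ones."""
    prefs = {}
    for preference, value in conn.execute(DEFAULT_QUERY, (username,)):
        prefs[preference] = value
    return prefs


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE userpref ("
    "username TEXT NOT NULL, preference TEXT NOT NULL, value TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO userpref VALUES (?, ?, ?)",
    [
        ("alice", "required_hits", "7.0"),
        ("@GLOBAL", "required_hits", "5.0"),
    ],
)

print(effective_prefs(conn, "alice"))  # {'required_hits': '7.0'}
print(effective_prefs(conn, "bob"))    # {'required_hits': '5.0'}
```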

You can use this directive to construct your own custom queries. Custom queries must also return the preference and value columns (in that order). Queries may use the special symbols _TABLE_ (replaced by the name of the table where user preferences are stored), _USERNAME_ (replaced by the user's username), _MAILBOX_ (replaced by the portion of the username before an at sign [@] or the whole username if there is no at sign), and _DOMAIN_ (replaced by the portion of the username after an at sign or a null value if there is none). The manpage for Mail::SpamAssassin::Conf provides a few interesting example queries. To support individual, domain, and global settings, add rows to the table with usernames of the form @~domain (which will sort after @GLOBAL but before real usernames) and use this query:

SELECT preference, value FROM _TABLE_
WHERE username = _USERNAME_ OR username = '@GLOBAL' 
OR username = '@~'||_DOMAIN_
ORDER BY username ASC

If you prefer to have some global preferences that cannot be overridden by users and others that can, you can add rows to the table for the unchangeable preferences with username ~GLOBAL (which will sort after all usernames) and rows for the changeable preferences with username @GLOBAL and use this query:

SELECT preference, value FROM _TABLE_
WHERE username = _USERNAME_ OR username = '@GLOBAL' 
OR username = '~GLOBAL'
ORDER BY username ASC
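Both custom queries depend on how the special usernames sort. The Python sketch below checks that ordering under plain code-point comparison and shows how a username divides into the parts that _MAILBOX_ and _DOMAIN_ stand for. The split_username helper is hypothetical, and your database's collation may order punctuation differently, so verify the ordering on your own server.

```python
def split_username(username):
    """Return (mailbox, domain); domain is None when there is no at sign."""
    mailbox, at_sign, domain = username.partition("@")
    return mailbox, (domain if at_sign else None)


rows = ["alice@example.com", "~GLOBAL", "@~example.com", "@GLOBAL"]

# Later rows win, so the order reads: global defaults, domain settings,
# the user's own settings, then the settings users cannot override.
print(sorted(rows))
# ['@GLOBAL', '@~example.com', 'alice@example.com', '~GLOBAL']

print(split_username("alice@example.com"))  # ('alice', 'example.com')
print(split_username("bob"))                # ('bob', None)
```

Under this ordering a username that begins with a digit would sort before @GLOBAL, because digits precede the at sign in ASCII.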

Finally, you'll need to start spamd with the --nouser-config command-line option and either the --sql-config or --setuid-with-sql option to enable SQL-based configuration (and disable the use of ~/.spamassassin/user_prefs files, which cannot be used by spamd together with SQL). If spamd runs as a non-root user, or if your users don't have home directories, use --sql-config; if spamd runs as root and users have home directories, using --setuid-with-sql will enable spamd's usual practice of changing uid to the user running spamc so that it can access the user's autowhitelist files.

Storing Scores in an LDAP Database

Another way to store per-user preferences in SpamAssassin 3.0 is in an LDAP (Lightweight Directory Access Protocol) database. This approach may appeal particularly to sites that already store their user account configuration in LDAP. To store scores in LDAP, you must install the Net::LDAP and URI Perl modules.

LDAP objects (like those that represent users) and their attributes (such as username, password, email address, etc.) are defined by one or more LDAP schemas. To add SpamAssassin preferences to your users, extend the objectClass that represents a user to allow an additional, optional spamassassin attribute, which you should define like this:

  # spamassassin
  # see http://SpamAssassin.org/ .
  attributetype ( 2.16.840.1.113730.3.1.217
          NAME 'spamassassin'
          DESC 'SpamAssassin user preferences settings'
          EQUALITY caseExactMatch
          SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )

The attribute SYNTAX must be multivalued (as in the example, which specifies the DirectoryString syntax with object identifier (OID) 1.3.6.1.4.1.1466.115.121.1.15), because a user object will have multiple spamassassin attributes, one for each preference setting.

The attributes themselves should be stored in the database. A spamassassin LDAP attribute should be set to the name of a SpamAssassin configuration directive followed by the value for the directive, separated by a space. SpamAssassin 3.0 includes an example of what such user definitions might look like in LDIF (LDAP Data Interchange Format). In the entry below, the spamassassin attribute added to this user is the line that begins with spamassassin:

dn: cn=Curley Anderson,ou=MemberGroupB,o=stooges
ou: MemberGroupB
o: stooges
cn: Curley Anderson
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: CAnderson@isp.com
givenname: Curley
sn: Anderson
uid: curley
initials: Joe
homePostalAddress: 14 Cherry Ln.$Plano TX 78888
postalAddress: 15 Fitzhugh Ave.
spamassassin: add_header all Foo LDAP read
l: Dallas
st: TX
postalcode: 76888
pager: 800-555-1319
homePhone: 800-555-1313
telephoneNumber: (800)555-1214
mobile: 800-555-1318
title: Developemnt Engineer
facsimileTelephoneNumber: 800-555-3318
userPassword: curleysecret
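Because each spamassassin attribute value is a directive name, a space, and the directive's value, turning a user's attribute values back into settings is straightforward. The following Python sketch illustrates the idea; the function name is hypothetical, and the second sample value is made up for the example.

```python
def parse_spamassassin_attrs(values):
    """Split each attribute value into (directive, value) at the first space."""
    prefs = []
    for value in values:
        directive, _, setting = value.strip().partition(" ")
        prefs.append((directive, setting.strip()))
    return prefs


attrs = ["add_header all Foo LDAP read", "required_hits 7"]
print(parse_spamassassin_attrs(attrs))
# [('add_header', 'all Foo LDAP read'), ('required_hits', '7')]
```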

To configure LDAP support for user scores, set the following configuration parameters in your systemwide configuration file (local.cf):


user_scores_dsn DSN
Defines the data source name for the LDAP database. It tells spamd how it will connect to the LDAP server. LDAP DSNs are specified as URLs according to RFC 2255, like this:

ldap://host:port/basedn?attr?scope?filter

For example, to use the LDAP server on the SpamAssassin host to search for objects under the base DN of dc=example,dc=com and to return the spamassassin attributes for those in which the uid attribute matches the username that SpamAssassin is running for, the DSN would be:

ldap://localhost:389/dc=example,dc=com?spamassassin?sub?uid=__USERNAME__

user_scores_ldap_username bind_dn

Provides the DN that SpamAssassin should use to bind to the LDAP server. This DN must have sufficient privileges to perform the query defined in the DSN.
user_scores_ldap_password password
Provides the password that SpamAssassin should use to authenticate itself when binding to the LDAP server with the specified bind_dn.
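To see how the pieces of an LDAP DSN fit together, it can help to take one apart. The Python sketch below splits a DSN of the form ldap://host:port/basedn?attr?scope?filter into its components and substitutes a username for the __USERNAME__ placeholder. The parse_ldap_dsn function is hypothetical and handles only DSNs that contain all four query fields.

```python
from urllib.parse import urlsplit


def parse_ldap_dsn(dsn, username):
    """Break an LDAP URL into the fields named in the DSN template."""
    parts = urlsplit(dsn)
    attr, scope, ldap_filter = parts.query.split("?", 2)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "base_dn": parts.path.lstrip("/"),
        "attr": attr,
        "scope": scope,
        "filter": ldap_filter.replace("__USERNAME__", username),
    }


dsn = "ldap://localhost:389/dc=example,dc=com?spamassassin?sub?uid=__USERNAME__"
settings = parse_ldap_dsn(dsn, "curley")
print(settings["base_dn"])  # dc=example,dc=com
print(settings["filter"])   # uid=curley
```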

Finally, you'll need to start spamd with the --nouser-config command-line option and either the --ldap-config or --setuid-with-ldap option to enable LDAP-based configuration (and disable the use of ~/.spamassassin/user_prefs files, which cannot be used by spamd together with LDAP). If spamd runs as a non-root user, or if your users don't have home directories, use --ldap-config; if spamd runs as root and users have home directories, using --setuid-with-ldap will enable spamd's usual practice of changing uid to the user running spamc so that it can access the user's autowhitelist files.

Writing Your Own Tests

When none of the existing tests does what you'd like, you can write a custom test of your own. Custom tests are just like the distributed tests, except that you install them in the systemwide configuration file or in a per-user preference file.

Tip

Users can write their own tests in their per-user preference files, but for security reasons these tests will not be used when spamd is performing spam-checking, unless the allow_user_rules option is set to 1 in the systemwide configuration. However, setting this option is dangerous because spamd runs as root and a malicious or inexperienced user can construct a custom test that causes the system to hang or to invoke an arbitrary command as nobody or as spamd's uid. Users who want their own tests on a system that uses spamd should reinvoke the spamassassin script on their incoming mail (probably in their .procmailrc). Chapter 2 illustrates this approach.

The first step in writing a custom test is to choose a symbolic test name and write a meaningful test description with the describe directive. For now, do not begin any of your names with a double underscore (__). Test names that begin with two underscores are not listed in test hit reports, nor are they added to the spam score on their own; such names are used for creating sets of subtests that should be applied in combination. SpamAssassin calls these combinations meta tests, and they are discussed later in this section.

Second, determine what part of the message you wish to test. Table 3-1 summarizes the directives used to test different portions of a message. Each is covered in greater detail in the following sections.

Table 3-1. Message portions and associated test directives

Message part | Directive | Possible tests
Headers | header TESTNAME | Match a regexp; don't match a regexp; exists; evaluate Perl code; check Received headers against DNSBL
Message subject and text of message body, decoding all textual MIME parts, with HTML tags and line breaks removed | body TESTNAME | Match a regexp; evaluate Perl code
Text of message body, decoding all textual MIME parts, with HTML tags and line breaks retained | rawbody TESTNAME | Match a regexp; evaluate Perl code
Undecoded message body including all MIME parts | full TESTNAME | Match a regexp; evaluate Perl code
URIs in the message body | uri TESTNAME | Match a regexp
URIs in the message body | uridnsbl TESTNAME (SpamAssassin 3.0) | Check for address in a DNS-based blacklist


Third, decide if your test requires any special test flags. Test flags are used to inform SpamAssassin that your test may apply only under certain conditions or may do something unusual. Use the tflags TESTNAME flaglist directive to indicate test flags. The flaglist is a space-separated list of flags. Table 3-2 lists the available flags in SpamAssassin and their effects.

Table 3-2. Test flags

Flag | Meaning
net | A network-based test that will not be run when SpamAssassin is directed to run local tests only
learn | A test that requires training before use (e.g., the Bayesian tests)
userconf | A test that requires user configuration before use (e.g., a test that expects the user to provide a list of addresses)
nice | A test that will be given a negative score
noautolearn (SpamAssassin 3.0) | A test that will not be applied in the spam score when determining whether the message should be automatically learned as spam or non-spam


For example, the RCVD_IN_BL_SPAMCOP_NET test, which checks the message's Received headers against the DNS-based blacklist at bl.spamcop.net, is defined in 20_dnsbl_tests.cf like this:[2]

header   RCVD_IN_BL_SPAMCOP_NET eval:check_rbl_txt('spamcop', 'bl.spamcop.net.')
describe RCVD_IN_BL_SPAMCOP_NET Received via a relay in bl.spamcop.net
tflags   RCVD_IN_BL_SPAMCOP_NET net

Finally, after adding or modifying a test, you should run spamassassin --lint to check your new rules for correct syntax. This command will attempt to parse all of the rules and configuration files in the ruleset directory and systemwide configuration directory. It exits quietly if no errors are found.

Header Tests

Use the header directive to define a header test. Header tests can test for the existence of a header or check to see if a header matches (or fails to match) a regular expression.

To check for the existence of a header, use the following syntax:

header TESTNAME exists:headername
            

Regular expression tests can be applied to any single header in a message, both the To and Cc headers, all Message-Id headers, or all headers. Use the following form to match a header to a Perl regular expression:

header TESTNAME headername =~ /regexp/modifiers
            

Use this next syntax to test whether a header does not match a regular expression:

header TESTNAME headername !~ /regexp/modifiers
            

In these tests, the headername can be the name of a single header, or can be ToCc (to match in the To or Cc header), MESSAGEID (to match in any Message-Id header), or ALL (to match in any header). SpamAssassin 3.0 also supports headername EnvelopeFrom to match against the address supplied in the SMTP MAIL FROM command if the MTA provides this information to SpamAssassin.

A header that does not exist will not match any regular expression. To handle the possibility of a nonexistent header, you can add an optional [if-unset: STRING] after the regular expression and modifiers, and STRING will be tested against the regular expression if the header does not exist. For example, to look for a Reply-To header that either contains @localhost or is missing, you could use this rule:

header LOCAL_OR_NO_REPLY reply-to =~ /@localhost/ [if-unset: @localhost]
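The if-unset logic can be pictured with a small Python model. In this sketch, headers are a dictionary keyed by lowercase header name and header_test is a hypothetical helper; it shows only the fallback behavior described above, not SpamAssassin's header parsing.

```python
import re


def header_test(headers, name, pattern, if_unset=None, flags=0):
    """Match `pattern` against a header, or against `if_unset` if it is absent."""
    value = headers.get(name.lower())
    if value is None:
        if if_unset is None:
            return False  # a missing header matches nothing
        value = if_unset
    return re.search(pattern, value, flags) is not None


# LOCAL_OR_NO_REPLY: Reply-To contains @localhost, or there is no Reply-To.
print(header_test({}, "Reply-To", r"@localhost", if_unset="@localhost"))  # True
print(header_test({"reply-to": "root@localhost"}, "Reply-To", r"@localhost",
                  if_unset="@localhost"))                                 # True
print(header_test({"reply-to": "bob@example.com"}, "Reply-To", r"@localhost",
                  if_unset="@localhost"))                                 # False
```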

Many of the methods available in the Mail::SpamAssassin::EvalTests module test headers. This module is not documented, but you can learn about its methods by reading the rules distributed with SpamAssassin. For example, the subject_is_all_caps( ) method matches when the Subject header contains all capital letters. This test is the basis of the SUBJ_ALL_CAPS rule distributed with SpamAssassin:

header SUBJ_ALL_CAPS      eval:subject_is_all_caps( )

Configurable header tests (SpamAssassin 3.0)

Some of the header tests in SpamAssassin 3.0 that use Mail::SpamAssassin::EvalTests methods have configurable parameters that control their operation. These parameters should be defined in sitewide or user configuration files.

The check_for_from_dns( ) method performs a DNS lookup on the address in the message's Reply-To or From header to ensure that an MX record listing a host willing to receive mail for the message sender's host exists. Because DNS lookups can be slow, two configuration file options, check_mx_attempts and check_mx_delay are provided so you can adjust these lookups. Set check_mx_attempts to the number of lookup attempts you are willing to have SpamAssassin make (the default is 2). Set check_mx_delay to the number of seconds to wait between attempts in case the domain name server is temporarily down (the default is 5).

The check_hashcash_value( ) and check_hashcash_double_spend( ) methods implement Hashcash verification (http://www.hashcash.org). If a message includes an X-Hashcash header, SpamAssassin can quickly verify that the sender spent the required processing time to produce a valid header and reduces the message's spam score in proportion to how difficult it was for the sender to produce the header. To control SpamAssassin's use of Hashcash, define the following configuration variables:

use_hashcash
If this variable is set to 1 (the default), Hashcash headers in messages will be checked. To disable Hashcash-checking, set this variable to 0.

hashcash_accept address(es)
In order for SpamAssassin to perform a Hashcash check, it must know all of the valid addresses that could receive mail with Hashcash headers. Set this variable to provide those addresses.

You can use multiple hashcash_accept directives or multiple addresses in a single directive to list several addresses. You can also use an asterisk (*) as a wildcard for zero or more characters and the question mark (?) as a wildcard for zero or one character, much as you would to specify filename patterns in a shell. Finally, you can use %u to represent the current user's username in a sitewide configuration file. For example, a sitewide configuration file for users at example.com might include:

hashcash_accept %u@example.com %u@*.example.com

hashcash_doublespend_path /path/to/file

Set this variable to the path at which SpamAssassin will create and maintain a (Berkeley DB format) database of previously seen Hashcash headers to prevent a sender from reusing a header. The default file is ~/.spamassassin/hashcash_seen. For a shared sitewide database, the user SpamAssassin runs as must have permission to write to this file and its directory.
hashcash_doublespend_file_mode mode
The file mode, in octal, for the Hashcash double-spend database. The default file mode is 0700. The file mode should include execute bits so that SpamAssassin can create directories, if necessary; i.e., use 0700 rather than 0600.
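The wildcard rules for hashcash_accept can be illustrated by translating a pattern into a regular expression. The Python sketch below follows the description given for that directive: an asterisk matches zero or more characters, a question mark matches zero or one character, and %u is replaced by the username. The helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def accept_regex(pattern, username):
    """Translate one hashcash_accept pattern into a compiled regex."""
    expanded = pattern.replace("%u", username)
    regex = "".join(
        ".*" if ch == "*" else ".?" if ch == "?" else re.escape(ch)
        for ch in expanded
    )
    return re.compile(regex)


def address_accepted(address, patterns, username):
    """True if the whole address matches any of the patterns."""
    return any(accept_regex(p, username).fullmatch(address) for p in patterns)


patterns = "%u@example.com %u@*.example.com".split()
print(address_accepted("alice@example.com", patterns, "alice"))       # True
print(address_accepted("alice@mail.example.com", patterns, "alice"))  # True
print(address_accepted("bob@example.com", patterns, "alice"))         # False
```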

check_rbl( )

A set of methods that can be the basis for new tests are the check_rbl( ), check_rbl_txt( ), and check_rbl_sub( ) methods. These methods extract IP addresses from a message's Received headers, discard those that are known to be reserved addresses or on trusted networks, and query a DNS-based blacklist for each address. If any of the addresses are listed in the blacklist, the test matches. Rules using these methods are written like other eval rules:

header A_NEW_BLACKLIST    eval:check_rbl('nasties','new.blacklist.zone')

Call check_rbl( ) with two arguments. The first argument is the zone ID, a string that's used to identify the blacklist. It's primarily useful when you're querying a blacklist that's composed of many different lists, and you later want to evaluate the query result by which sublists the addresses were on (this topic is discussed later in this chapter).

If you append -notfirsthop to the name of the zone ID, the originating IP address will be excluded from RBL lookups unless it is the only IP address. This is useful when querying blacklists of dialup or DSL (Digital Subscriber Line) hosts that are expected to relay all their email through an ISP's mail server. If new.blacklist.zone was this kind of blacklist, you might have written the test like this:

header A_NEW_BLACKLIST    eval:check_rbl('nasties-notfirsthop','new.blacklist.zone')

Similarly, you can append -firsttrusted to check the IP address that appears in the Received header that was added by the most remote trusted server (IP addresses in Received headers added by more remote relays cannot be trusted). This is useful for querying a DNS-based whitelist to determine whether the server that first relayed the email to a trusted server appears on the whitelist. By appending -untrusted, you will check only the untrusted IP addresses (those more remote than the most remote trusted server). Here's a definition for a test of a DNS-based whitelist:

header A_NEW_WHITELIST    eval:check_rbl('friends-firsttrusted','new.whitelist.zone')
tflags A_NEW_WHITELIST    nice

(Remember, as Table 3-2 points out, when defining a test that will lower the spam score, you must set the nice test flag.)

The second argument is the DNS zone for the blacklist. SpamAssassin checks the blacklist by performing a DNS query for a hostname in this zone. SpamAssassin determines the hostname by reversing the IP address that it's trying to check (e.g., 128.0.10.0 becomes 0.10.0.128) and prepending it to the zone name (e.g., creating 0.10.0.128.new.blacklist.zone). It then issues a query for a DNS A record associated with that hostname. Typically, if an address is blacklisted, the DNS query will be successful—it will return an IP address (usually 127.0.0.1). If the address is not on the blacklist, the DNS query will fail (returning an NXDOMAIN response).
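The construction of the query hostname can be sketched in Python as follows. The function names are hypothetical, and is_listed uses the system resolver through socket.gethostbyname as a rough stand-in for SpamAssassin's own DNS code; it treats any resolver failure as "not listed", which is a simplification.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the octets of an IPv4 address and prepend them to the zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def is_listed(ip, zone):
    """Return True if an A record exists for the address in the zone."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


print(dnsbl_query_name("128.0.10.0", "new.blacklist.zone"))
# 0.10.0.128.new.blacklist.zone
```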

check_rbl_txt( )

Some blacklists are based on DNS TXT records instead of DNS A records. (Blacklist operators should indicate which kind of lookup is appropriate for their blacklist.) Use the check_rbl_txt( ) method to perform lookups using a blacklist based on TXT records. check_rbl_txt( ) accepts the same arguments as check_rbl( ) and works analogously. SpamAssassin reverses the IP address that it's trying to check (e.g., 128.0.10.0 becomes 0.10.0.128) and prepends it to the zone name (e.g., creating 0.10.0.128.new.blacklist.zone). It then issues a query for a DNS TXT record associated with that hostname. If the address is blacklisted, the TXT query will return a string explaining why the address is blacklisted. If the address is not on the blacklist, the DNS query will fail (returning an NXDOMAIN response).

check_rbl_sub( )

Some DNSBLs are aggregations of many different blacklists. These DNSBLs typically return different IP addresses in response to a successful A lookup to indicate on which sublist(s) the blacklisted address appears (e.g., the query returns 127.0.0.1 for addresses on sublist 1, 127.0.0.2 for addresses on sublist 2, etc.).

Use the check_rbl_sub( ) method to query a combined DNSBL and determine if the IP address is on a specific sublist. This method also takes two arguments: the first is a zone ID, and the second indicates which response is associated with the desired sublist. For example, if the new.blacklist.zone blacklist is composed of sublists that return 127.0.0.1 and 127.0.0.2, you could check IP addresses against only the second sublist:

header A_NEW_BLACKLIST    eval:check_rbl('nasties','new.blacklist.zone')
header NEW_BLACKLIST_2    eval:check_rbl_sub('nasties','127.0.0.2')

Less commonly, composite lists may return a single A record whose IP address is to be interpreted as a bitmask of matching sublists. To check a sublist in this case, provide a bitmask (as a positive decimal number) as the second argument to check_rbl_sub( ).

Note that you must have a rule that uses check_rbl( ) or check_rbl_txt( ) to associate a zone ID string with the blacklist in order to check the result against a sublist.
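The two ways of selecting a sublist can be illustrated with a short Python sketch. It is a simplified model rather than SpamAssassin's implementation: the function name on_sublist is hypothetical, and treating a decimal selector as a mask over the 32-bit value of the returned address is an assumption of this sketch.

```python
import ipaddress

def on_sublist(answer_ip, subtest):
    """Check a DNSBL A-record answer against a sublist selector.

    A dotted-quad selector must equal the answer exactly. A decimal
    selector is treated as a bitmask over the answer's 32-bit value
    (an assumption of this sketch).
    """
    if "." in subtest:
        return answer_ip == subtest
    return (int(ipaddress.IPv4Address(answer_ip)) & int(subtest)) != 0

print(on_sublist("127.0.0.2", "127.0.0.2"))  # exact-match sublist
print(on_sublist("127.0.0.6", "4"))          # bit value 4 is set in the answer
print(on_sublist("127.0.0.6", "1"))          # bit value 1 is not set
```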

Body Tests

The body, rawbody, and full directives define tests on the body of an email message. Two basic kinds of tests are provided. Message bodies can be tested against a regular expression pattern, and message bodies can be submitted to an eval test defined in Mail::SpamAssassin::Evaltests.

The body directive defines a test to be applied to the text of a message, as it would be likely to appear to a person reading the message in a text-based mail client. The Subject header is considered to be the first paragraph of the message body. All textual MIME components of the message are decoded, and HTML tags are removed. The message is reformatted into paragraphs (text separated by multiple newlines), and newlines within paragraphs are removed. The test is then applied to each message paragraph. Here's an example of a body test distributed with SpamAssassin that matches if the word "remove" appears in quotes in the body:

body REMOVE_IN_QUOTES           /\"remove\"/i
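To make the paragraph handling concrete, here is a rough Python approximation of what a body test sees. It skips MIME decoding and HTML removal entirely, the helper name body_paragraphs is hypothetical, and joining wrapped lines with a single space is an assumption of this sketch.

```python
import re

def body_paragraphs(subject, text):
    """Approximate a body rule's input: the Subject first, then each
    paragraph with its internal line breaks removed."""
    paragraphs = [subject]
    for chunk in re.split(r"\n{2,}", text.strip()):
        paragraphs.append(" ".join(chunk.split()))
    return [p for p in paragraphs if p]

# Python equivalent of the REMOVE_IN_QUOTES pattern, case-insensitive.
REMOVE_IN_QUOTES = re.compile(r'"remove"', re.IGNORECASE)

message = 'Great offer inside\n\nTo stop, reply with\n"REMOVE" in the subject.\n\nThanks!'
hits = [p for p in body_paragraphs("Hello", message) if REMOVE_IN_QUOTES.search(p)]
print(hits)
```

The pattern is tried once per paragraph, so a phrase that wraps across lines inside a paragraph can still match, while one that spans two paragraphs cannot.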

The rawbody directive defines a test to be applied to the text of a message, as it would be likely to appear to a person reading the message in an HTML-based mail client. The Subject header is not included. All textual MIME components of the message are decoded, but HTML tags are left in place, and the message is split into lines based on the line breaks in the message. The test is then applied to each message line. Here's an example of a rawbody test distributed with SpamAssassin that's designed to find a JavaScript statement that's common in spam:

rawbody HIDE_WIN_STATUS          /<[^>]+onMouseOver=[^>]+window\.status=/i

Note that this test could not be written as a body test because this JavaScript appears inside an HTML tag.
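The difference between the two views of the message can be demonstrated with a small Python sketch. The strip_tags helper is a crude, hypothetical stand-in for SpamAssassin's HTML handling, and the pattern is compiled case-insensitively on the assumption that the rule is meant to ignore case.

```python
import re

HIDE_WIN_STATUS = re.compile(r"<[^>]+onMouseOver=[^>]+window\.status=", re.IGNORECASE)

def strip_tags(html):
    """Crude tag removal, standing in for the HTML stripping done for body tests."""
    return re.sub(r"<[^>]+>", "", html)

line = '<a href="http://example.com/" onMouseOver="window.status=\'\'; return true">Click</a>'

print(bool(HIDE_WIN_STATUS.search(line)))              # the raw line still contains the tag
print(bool(HIDE_WIN_STATUS.search(strip_tags(line))))  # the tag is gone, so nothing matches
```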

The full directive defines a test to be applied to the full text of a message. All headers are included, along with all textual MIME components of the message body, but no decoding is performed. The message is split into lines based on the line breaks in the message, and the test is then applied to each header and message line. SpamAssassin does not distribute any full tests that match regular expressions; it reserves full for eval tests that must submit the raw message to external spam clearinghouses (which are discussed later in this chapter).

Warning

Body tests are powerful but slow. Be especially careful when defining regular expressions to test message bodies, as these expressions will be applied to large amounts of text. Consult Jeffrey Friedl's book Mastering Regular Expressions (O'Reilly) for important tips on optimizing regular expression processing.

URI Tests

The uri directive defines a test on all URIs that appear in an email message. SpamAssassin creates a list of http, https, ftp, mailto, javascript, and file URIs and transforms bare hostnames starting with www or ftp into appropriate URIs. The test is applied to each URI in the message.

URIs can be matched against a regular expression pattern. Here's an example of a distributed URI test that checks for a mailto URI with the string "remove" in the address portion:

uri MAILTO_TO_REMOVE            /^mailto:.*?remove/is
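As a quick illustration, the same pattern can be tried against a hand-written list of URIs in Python. The list stands in for whatever SpamAssassin extracts from a message; the addresses are made up, and the i and s flags are mapped to re.IGNORECASE and re.DOTALL.

```python
import re

MAILTO_TO_REMOVE = re.compile(r"^mailto:.*?remove", re.IGNORECASE | re.DOTALL)

# Hypothetical URIs, one entry per URI found in a message.
uris = [
    "http://www.example.com/",
    "mailto:remove-me@example.com",
    "mailto:sales@example.com",
]

# The test is applied to each URI separately.
matches = [u for u in uris if MAILTO_TO_REMOVE.search(u)]
print(matches)
```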

SpamAssassin 3.0 includes a plug-in called Mail::SpamAssassin::Plugin::URIDNSBL. When loaded, this plug-in enables the uridnsbl directive, which takes each URI in the message, extracts the name of the host in the URI, looks up its IP address in DNS, and then checks the IP address against a specified DNSBL. These tests catch spam that is relayed through innocent (or temporary) mail servers but that advertises web sites on spammer servers. Here's a portion of SpamAssassin 3.0's 25_uribl.cf file that defines a uridnsbl test called URIBL_SBLXBL:

loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
...
uridnsbl  URIBL_SBLXBL    sbl-xbl.spamhaus.org.   TXT
header    URIBL_SBLXBL    eval:check_uridnsbl('URIBL_SBLXBL')
describe  URIBL_SBLXBL    Contains a URL listed in the SBL/XBL blocklist

Meta Tests

A meta test is a test that combines the results of several other tests using Boolean logic. For example, a meta test might be positive if either of two subtests is positive, or might specify that both subtests must be positive. A meta test can combine several tests using Boolean operators for and (&&), or (||), and not (!), along with parentheses to modify the precedence in the expression.

When using meta tests, you will often want some or all of the subtests to contribute only to the meta test and not to be separately scored. To achieve this effect, give the subtests names that begin with two underscores. This prevents SpamAssassin from scoring them separately. You can then assign a single score to the meta test. Because non-scoring subtests will never be listed in a SpamAssassin report, you need not include a describe directive for these tests.

Example 3-3 shows the CLICK_BELOW meta test in SpamAssassin.

Example 3-3. A meta test and its subtests

body CLICK_BELOW_CAPS      /CLICK\s.{0,30}(?:HERE|BELOW)/s
describe CLICK_BELOW_CAPS  Asks you to click below (in capital letters)

body __CLICK_BELOW         /click\s.{0,30}(?:here|below)/is

meta CLICK_BELOW           (__CLICK_BELOW && !CLICK_BELOW_CAPS)
describe CLICK_BELOW       Asks you to click below

The CLICK_BELOW_CAPS test is a standard body test that is positive if the words "CLICK BELOW" or "CLICK HERE" appear in the message in uppercase. Although it is a standard test that is used and scored on its own, SpamAssassin also uses it as a subtest in a meta test. The __CLICK_BELOW test is a nonscoring subtest that is positive if the same phrases appear in any combination of upper- and lowercase letters. The CLICK_BELOW meta test is positive when __CLICK_BELOW is positive and CLICK_BELOW_CAPS is not positive; that is, when the phrase appears in anything except all uppercase. Typically, a mixed or lowercase occurrence is assigned a lower score than the uppercase version.
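The Boolean combination in Example 3-3 can be mimicked in Python as follows. This is a sketch of the logic only: the two patterns are copied from the example, while the function name click_below and the sample sentences are made up for illustration.

```python
import re

# The s flag becomes re.DOTALL; the i flag becomes re.IGNORECASE.
CLICK_BELOW_CAPS = re.compile(r"CLICK\s.{0,30}(?:HERE|BELOW)", re.DOTALL)
CLICK_BELOW_ANY = re.compile(r"click\s.{0,30}(?:here|below)", re.IGNORECASE | re.DOTALL)

def click_below(text):
    """Positive when the mixed-case subtest hits and the all-caps test does not."""
    caps = bool(CLICK_BELOW_CAPS.search(text))
    any_case = bool(CLICK_BELOW_ANY.search(text))
    return any_case and not caps

print(click_below("Please click the link below"))
print(click_below("CLICK THE LINK BELOW"))
```

In the second call the all-caps test is positive, so the meta expression is false and only CLICK_BELOW_CAPS would contribute a score.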

In addition to using Boolean logic operators, it's also possible to use arithmetic operators (+, -, *, /) and comparisons (>, >=, <, <=, !=, ==). When you combine tests with arithmetic operators, the values of subtests are 1 if they are positive and 0 if they are negative. One such meta test in SpamAssassin is MULTI_FORGED, which counts the number of positive tests for different kinds of Received header forgery and is positive when two or more forgeries appear in the same message. This test is shown in Example 3-4.

Example 3-4. The MULTI_FORGED meta test

meta MULTI_FORGED     ((FORGED_AOL_RCVD + FORGED_HOTMAIL_RCVD + FORGED_EUDORAMAIL_RCVD + FORGED_YAHOO_RCVD + FORGED_JUNO_RCVD + FORGED_GW05_RCVD) > 1)
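The arithmetic in Example 3-4 amounts to counting positive subtests. A minimal Python sketch, assuming the subtest results are available as a dictionary of booleans (the function name and the dictionary form are hypothetical):

```python
def multi_forged(results):
    """Positive when more than one forgery subtest is positive."""
    subtests = [
        "FORGED_AOL_RCVD", "FORGED_HOTMAIL_RCVD", "FORGED_EUDORAMAIL_RCVD",
        "FORGED_YAHOO_RCVD", "FORGED_JUNO_RCVD", "FORGED_GW05_RCVD",
    ]
    # Each subtest contributes 1 if positive and 0 otherwise.
    return sum(1 if results.get(name) else 0 for name in subtests) > 1

print(multi_forged({"FORGED_AOL_RCVD": True}))
print(multi_forged({"FORGED_AOL_RCVD": True, "FORGED_JUNO_RCVD": True}))
```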

The Built-in Tests

SpamAssassin is distributed with over 700 test rules defined for English-language spam. SpamAssassin 2.63 includes another 2,900 rules for spam in other languages. (Language support in SpamAssassin 3.0 is currently available only for French and German, but language support is likely to increase as SpamAssassin gets into wider release.) Reading the rules distributed with SpamAssassin is an excellent way to learn to write your own rules.

SpamAssassin's rules are defined in a set of files typically installed in /usr/share/spamassassin:

10_misc.cf
The 10_misc.cf file defines templates for the spam report that SpamAssassin attaches to spam messages, definitions of headers that SpamAssassin adds to messages, and default settings for the most common configuration options. This file is described in more detail later in this chapter.
10_plugins.cf (SpamAssassin 3.0)
This file provides a convenient place to load SpamAssassin plug-in modules with the loadplugin directive. Plug-ins extend SpamAssassin's features.
20_fake_helo_tests.cf
This file defines a set of rules used to test for forged HELO hostnames. This file is also described in more detail later in this chapter.
20_body_tests.cf
This file defines most tests against message bodies, spam clearinghouses, message languages, and message locales. It's described in more detail later.
20_dnsbl_tests.cf
This file defines tests against many different DNS blacklists, using the check_rbl( ), check_rbl_sub( ), and check_rbl_txt( ) eval tests described earlier in this chapter. These blacklists include NJABL (http://www.dnsbl.njabl.org/), SORBS (http://www.dnsbl.sorbs.net/), OPM (http://opm.blitzed.org/), Spamhaus (http://www.spamhaus.org/sbl/), DSBL (http://dsbl.org), Spamcop (http://www.spamcop.net/bl.shtml), MAPS (http://www.mail-abuse.org), and several others.
20_ratware.cf and 20_anti_ratware.cf
The 20_ratware.cf file contains tests that look for tell-tale signs of specialized mail programs known to be used by spammers (ratware or spamware). Most of them are tests of message headers. The 20_anti_ratware.cf file is designed to contain tests that look for signs of non-spam mail programs that might be mistaken for spamware, but it doesn't contain any active tests as of SpamAssassin 3.0.
20_head_tests.cf
This file contains most of the tests that SpamAssassin performs against message headers. This includes tests for blacklisted and whitelisted addresses in the From and To headers (discussed in greater detail in Chapter 4).
20_porn.cf (all SpamAssassin versions) and 20_drugs.cf (SpamAssassin 3.0)
These files contain body tests that look for common indicators of pornographic spam and online pharmacy spam, respectively.
20_phrases.cf
This file contains body tests that look for common phrases that appear in spam. Most of them are either instructions for how you can be removed from the mailing list or claims that the message conforms to a bill that putatively regulates unsolicited email.
20_uri_tests.cf
This file contains most of the tests that SpamAssassin performs against URIs that appear in messages.
20_compensate.cf
Tests in this file are intended to compensate for common false positives in header tests and are "nice" tests (with negative spam scores).
20_html_tests.cf
This file contains body tests that target messages that contain HTML markup. Certain types of markup are very commonly seen in spam, and several of these tests make for interesting reading.
20_meta_tests.cf
This file contains meta tests. Meta tests are tests that combine other tests, and are described earlier in this chapter.
23_bayes.cf
This file contains tests that act on the results of the Bayesian classifier. The Bayesian system and these tests are described in greater detail in Chapter 5.
25_head_tests_es.cf, 25_body_tests_es.cf, 25_head_tests_pl.cf, 25_body_tests_pl.cf (SpamAssassin 2.6x)
These files contain header and body tests for Spanish (es) and Polish (pl) messages.
25_uribl.cf (SpamAssassin 3.0)
This file loads the URIDNSBL plug-in and defines URI tests against DNS blacklists.
30_text_*.cf (de,es,fr,it,pl,sk)
These files don't define any new tests but provide translations of test descriptions and report templates into different languages, such as German (de), Spanish (es), French (fr), Italian (it), Polish (pl), and Slovak (sk). SpamAssassin 3.0 includes only German and French tests at the time of this writing.
50_scores.cf
This file defines the scores associated with all of the tests defined in the other files. The scores are separated into a single file because they are generated by an algorithm that applies each test to a large corpus of spam and non-spam messages and adjusts the scores to minimize false positives and false negatives.
60_whitelist.cf
The rules in this file set up default whitelists for several large well-known addresses and companies, such as Amazon.com.

Because these files are overwritten whenever SpamAssassin is upgraded, they should not be changed. All local rules or changes to the scoring of distributed rules should be performed in the systemwide configuration file (or in per-user preference files) rather than in these files. Reading these files, however, provides the most information about how SpamAssassin rules are designed.

The following sections describe some of the more important rule files in greater detail.

10_misc.cf

The 10_misc.cf file defines special rules that are not spam tests. These include templates for the spam report that SpamAssassin attaches to spam messages, definitions of headers that SpamAssassin adds to messages, and default settings for the most common configuration options (such as those described in Chapter 2).

Templates are defined with the report, unsafe_report, and spamtrap directives, and the corresponding utility directives clear_report_template, clear_unsafe_report_template, and clear_spamtrap_template. Use the report template to design the report that SpamAssassin attaches to spam messages. Use the unsafe_report template to design the report that SpamAssassin attaches to messages that contain potentially executable code. Use the spamtrap template to design the message that SpamAssassin sends back to senders who email a spam trap address that calls the spamassassin script with the --report and --warning-from options (spam-reporting is discussed in Chapter 2).

Each time it encounters a template directive, SpamAssassin appends new text to the template. Accordingly, to ensure that you're starting with a clean slate when you define a new template, you must first clear the template and then add your desired text. Here's how the spam report might be defined in SpamAssassin:

clear_report_template
report Spam detection software, running on the system "_HOSTNAME_", has
report identified this email as possible spam. The original message
report is attached to this so you can view it (if it isn't spam) or block
report similar future email.  If you have any questions, see
report _CONTACTADDRESS_ for details.
report 
report Content preview:  _PREVIEW_
report 
report Content analysis details:   (_HITS_ points, _REQD_ required)
report
report " pts rule name              description"
report  ---- ---------------------- ------------------------------------
report _SUMMARY_

_HOSTNAME_, _CONTACTADDRESS_, _PREVIEW_, _HITS_, _REQD_, and _SUMMARY_ are variables that are replaced by their values when the template is generated for each message. The complete list of variables, which appears in the Mail::SpamAssassin::Conf manpage, is given in Table 3-3.

Table 3-3. Variables for use in report and header templates

Variable Value
Variables that depend on the message
_YESNOCAPS_ "YES" if message is spam; "NO" if message is not spam.
_YESNO_ "Yes" if message is spam; "No" if message is not spam.
_HITS_ Spam score for message.
_BAYES_ Bayesian classifier score.
_AUTOLEARN_ "spam" if message was auto-learned as spam by the Bayesian classifier; "ham" if auto-learned as non-spam; "NO" if the message was not auto-learned.
_AWL_ Autowhitelist score modifier.
_DATE_ Date and time of SpamAssassin scan in RFC 2822 format.
_STARS_ A string containing one asterisk for each point of spam score (up to 50).
_STARS(character)_ A string containing one of character for each point of spam score (up to 50).
_RELAYSTRUSTED_ List of relays found in the message and deemed to be trusted. The list includes the IP address, reverse DNS lookup, and HELO address for each relay.
_RELAYSUNTRUSTED_ List of relay IP addresses found in the message and deemed to be untrusted.
_TESTS_, _TESTSSCORES_ Comma-separated list of tests matched, or tests matched and their associated scores.
_TESTS(character)_, _TESTSSCORES(character)_ As in _TESTS_, _TESTSSCORES_ but separated by character instead of comma.
_LANGUAGES_ List of languages that SpamAssassin thinks a message is written in.
_PREVIEW_ Preview of message content.
_SUMMARY_ Multiline list of tests matched and their scores and descriptions.
_REPORT_ One line list of tests matched.
_RBL_ Results of positive DNSBL queries.
_DCCB_, _DCCR_ Checking host and results of DCC check of message.
_PYZOR_ Results of Pyzor check of message.
Variables that don't depend on the message
_REQD_ SpamAssassin's threshold score for calling a message spam.
_VERSION_, _SUBVERSION_ Version and subversion of SpamAssassin.
_HOSTNAME_ Hostname of SpamAssassin host.
_CONTACTADDRESS_ The value of the report_contact directive (typically, the email address of the postmaster).


The variables in Table 3-3 can also be added to customized message headers for messages processed by SpamAssassin by using the add_header directive, which takes the following form:

add_header messagetype headername string
            

The messagetype can be spam, ham (non-spam), or all and determines which kind of messages will have the header added. The new header will be named X-Spam-headername, and string, which should be enclosed in double quotes, will be the value of the header. For example, the following directive, which appears in the distributed 10_misc.cf file, adds an X-Spam-Status header to all messages, spam or not, that shows whether or not each message is spam, the spam score, the spam threshold score, the tests that were matched, whether the message is being automatically learned (see Chapter 5), and the version of SpamAssassin:

add_header all Status "_YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_"
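Variable substitution of this kind can be pictured with a short Python sketch. It is not how SpamAssassin implements templates: the render function is hypothetical, the values are invented for a single non-spam message, and parameterized forms such as _STARS(character)_ are not handled.

```python
import re

def render(template, values):
    """Replace each _NAME_ token that has a known value; leave others untouched."""
    def substitute(match):
        return str(values.get(match.group(1), match.group(0)))
    return re.sub(r"_([A-Z]+)_", substitute, template)

status = "_YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_"

# Made-up values for illustration only.
values = {
    "YESNO": "No", "HITS": "-2.6", "REQD": "5.0",
    "TESTS": "BAYES_00", "AUTOLEARN": "ham", "VERSION": "3.0.0",
}

print("X-Spam-Status: " + render(status, values))
```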

If you want to change or remove a default header, you can use the remove_header directive:

remove_header messagetype headername
            

You can remove all headers with the clear_headers directive.

20_fake_helo_tests.cf

This file defines a set of rules that use the eval test check_for_rdns_helo_mismatch( ). This test takes two arguments: a regular expression pattern to match against the reverse DNS lookup of the connecting client's IP address, and a regular expression pattern to match against the hostname provided by the client in the SMTP HELO command. Spammers often use mail programs that forge the HELO hostname, and these tests look for such forgeries when the clients have hostnames that match those of major commercial ISPs. Here's an example of a test from this file:

header FAKE_HELO_AOL  eval:check_for_rdns_helo_mismatch("aol\.com","aol\.com")
describe FAKE_HELO_AOL  Host HELO did not match rDNS: aol.com

This test matches if the client connects from an IP address that reverse-resolves to an aol.com hostname but claims in the HELO to have a hostname that does not match "aol.com". These tests are applied to all of the Received headers from untrusted relays.

You can use this eval test to reject messages that claim, in their HELO, to be from your own host. If your hostname is myhost.example.com, and you know that your IP address reverse-resolves to the same hostname, you could add a rule like this (to the systemwide configuration file):

header FAKE_MY_HELO eval:check_for_rdns_helo_mismatch("(?!myhost\.example\.com).{18}$","myhost\.example\.com")
describe FAKE_MY_HELO Host HELO faked my hostname
score FAKE_MY_HELO 5.0

The regular expression (?!myhost\.example\.com).{18}$ matches any hostname containing at least 18 characters that does not end in myhost.example.com, which should match the reverse DNS lookup of any untrusted relay host other than your own. If any such host claims in their HELO to be myhost.example.com, it is forging your hostname.
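The behavior of this negative-lookahead pattern is easy to check outside SpamAssassin. The Python sketch below tests the pattern on its own against a few made-up hostnames; it does not reproduce the eval test itself.

```python
import re

# myhost.example.com is exactly 18 characters long, hence the {18}.
NOT_MY_HOST = re.compile(r"(?!myhost\.example\.com).{18}$")

for rdns in ("mail.spammer.example.net", "myhost.example.com",
             "foo.myhost.example.com", "short.example"):
    print(rdns, bool(NOT_MY_HOST.search(rdns)))
```

Only the first hostname matches. The last one shows a limitation of the approach: a reverse-DNS name shorter than 18 characters never matches the pattern, so a rule built this way would not fire for such hosts.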

20_body_tests.cf

This file contains most of the tests that SpamAssassin performs against message bodies. In addition to tests for regular expressions in the body, this file defines tests against spam clearinghouses and tests of message language and locale.

A spam clearinghouse is a server that maintains a database of checksums of messages reported as spam and allows clients to test a message against the checksum database. SpamAssassin supports three spam clearinghouses: Vipul's Razor (http://razor.sf.net/), Pyzor (http://pyzor.sf.net), and the Distributed Checksum Clearinghouse, or DCC (http://rhyolite.com/anti-spam/dcc/). Special client software must be installed on the system in order for SpamAssassin to use these tests. The spamassassin --report command can be used to report confirmed spam to these clearinghouses as well.

In SpamAssassin 3.0, the pyzor_options configuration directive can be set to a string of additional options to be passed to the Pyzor client on the command line when SpamAssassin invokes it. Similarly, the dcc_options directive can be set to provide additional options to the DCC client.

Whitelists and Blacklists

Although SpamAssassin generally does a good job of avoiding false positives, you may find that some mail that you want to receive contains enough spamlike characteristics that SpamAssassin regularly tags it as spam. You may want to be sure that SpamAssassin will never mistake email from an important user, client, vendor, or other sender for spam. You may even have users who don't like spam-filtering. SpamAssassin allows you to set up systemwide or user-specific lists of senders whose mail should not be considered spam, and (systemwide) lists of users who don't want their email filtered. Such lists are called whitelists.

On the other hand, you may regularly receive unwanted mail from a particular sender that doesn't get tagged reliably by SpamAssassin. You may know ahead of time that you don't want to receive mail from certain organizations or senders. SpamAssassin also allows you to set up system-wide or user-specific lists of senders whose mail should be tagged as spam. Such lists are called blacklists.

This section discusses how to set up whitelists and blacklists. It begins by examining the SpamAssassin directives for systemwide whitelisting and blacklisting, and then explores two different ways to manage user-specific lists. A related feature, autowhitelists, is covered in Chapter 4.

Systemwide Whitelists

SpamAssassin whitelists reduce the spam scores of messages when the sender or recipient appears on the whitelist. Whitelists are most commonly used to ensure that messages from important senders are not marked as spam, but they can also be used to change the spam threshold for recipients or enable recipients to effectively opt out of spam-tagging.

Whitelisting senders

Use the whitelist_from directive to whitelist a sender's address. The sender's address is the address that appears in the Resent-From header, if that header exists, or in any of the headers: From, Envelope-Sender, Resent-Sender, or X-Envelope-From. If a sender's address matches a whitelist_from address, the spam score of the message is reduced by 100 points, which makes it nearly impossible for the message to be tagged as spam.

For example, if you receive important messages from boss@mybigclient.com, you can ensure that they won't be tagged as spam by using this line in the systemwide configuration file:

whitelist_from boss@mybigclient.com

You can use multiple whitelist_from directives or multiple addresses in a single directive to whitelist several addresses. You can also use an asterisk (*) as a wildcard for zero or more characters and a question mark (?) as a wildcard for exactly one character, much as you would to specify filename patterns in a shell. For example, you could whitelist all mail from mybigclient.com and from all hosts in the example.com domain with these lines:

whitelist_from *@mybigclient.com
whitelist_from example.com *.example.com
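The wildcard matching can be sketched in Python by translating each pattern into an anchored regular expression. This is an illustration under shell-style assumptions (an asterisk matches any run of characters, a question mark matches a single character); the function name addr_pattern is hypothetical and the sketch is not taken from SpamAssassin's source.

```python
import re

def addr_pattern(glob):
    """Translate a whitelist-style wildcard pattern into an anchored regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # assumed: one character, as in a shell
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")

pattern = addr_pattern("*@mybigclient.com")
print(bool(pattern.match("boss@mybigclient.com")))
print(bool(pattern.match("boss@mybigclient.com.example.net")))
```

Anchoring the pattern at both ends is what keeps the second address, which merely contains the whitelisted domain, from matching.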

A whitelist entry can be removed with the unwhitelist_from directive. Because SpamAssassin is distributed with several default whitelist entries (in the 60_whitelist.cf file), you may find that you want to remove some of them. The unwhitelist_from directive is also useful in per-user configuration files, to remove one of the systemwide whitelist entries. To remove a whitelist entry, the address in the unwhitelist_from directive must exactly match the one given to whitelist_from.

Whitelisting senders by relay

Sometimes whitelisting by the sender's address alone isn't sufficient. The sender's address might be one that's easily guessed or likely to be spoofed by spammers. For example, a spammer might try to ensure that you read his message by forging the sender's address to hostmaster@internic.net or billing@amazon.com.

SpamAssassin offers more control over whitelisted senders with the whitelist_from_rcvd directive. This directive associates a sender's email address with the hostname or domain name of the last trusted relay. SpamAssassin uses DNS to do a reverse-lookup of the IP address of the last trusted relay; the reverse-lookup yields one or more hostnames associated with the IP address. Here's how you would whitelist boss@mybigclient.com only if the last trusted relay reverse-resolves to a hostname in the mybigclient.com domain:

whitelist_from_rcvd boss@mybigclient.com mybigclient.com

Messages that match a whitelist_from_rcvd directive have their spam scores lowered by 100.

Tip

In order for SpamAssassin to distinguish trusted and untrusted relays, you may need to set the trusted_networks option, which was described earlier. If your mail topology is relatively simple—you or your ISP control all of the IP addresses in the class B network that includes your mail server's public IP address—SpamAssassin can usually make a reasonable guess.

SpamAssassin is distributed with several default relay-based whitelist entries in the 60_whitelist.cf file. These entries are defined with the def_whitelist_from_rcvd directive, which works just like whitelist_from_rcvd but lowers the spam score by only 15 when a message matches.

As you might expect, whitelist entries based on relays can be removed with the unwhitelist_from_rcvd address directive. The address must exactly match the address defined in a whitelist_from_rcvd or def_whitelist_from_rcvd directive. If the whitelist_from_rcvd directive uses wildcards, the unwhitelist_from_rcvd directive must specify those same wildcards.

Whitelisting recipients

SpamAssassin provides three levels of whitelisting for message recipients. Whitelisting a recipient lowers the spam score on all messages addressed to the recipient. Use recipient-whitelisting to prevent any spam-checking from being performed on behalf of a recipient. You can also use recipient-whitelisting as a crude mechanism for increasing the spam threshold—lowering the false positive rate at the cost of more false negatives—for a recipient.

A recipient's address may appear in several headers. If Resent-To and/or Resent-Cc headers are present, the address is checked against only those headers. Otherwise, the address may be matched in the last three Received headers or the headers To, Apparently-To, Delivered-To, Envelope-Recipients, Apparently-Resent-To, X-Envelope-To, Envelope-To, X-Delivered-To, X-Original-To, X-Rcpt-To, X-Real-To, or Cc.

The three levels of recipient-whitelisting are configured with the directives whitelist_to (lower spam score by 6), more_spam_to (lower spam score by 20), and all_spam_to (lower spam score by 100). For example, to ensure that no messages to root or postmaster are tagged as spam, you could use the following lines:

all_spam_to root@*
all_spam_to postmaster@*

No unwhitelist_to directive is provided because whitelisting by recipient is really useful only in systemwide configuration. Individual users can just change their required_hits setting in their .spamassassin/user_prefs file instead.

Systemwide Blacklists

SpamAssassin has only two blacklist directives (and two directives to unblacklist addresses). You can blacklist sender addresses or recipient addresses.

The blacklist_from directive is used to specify a sender's address to blacklist. The sender's address is the address that appears in the Resent-From header, if that header exists, or in any of the headers From, Envelope-Sender, Resent-Sender, or X-Envelope-From. If the sender's address matches a blacklist_from address, the spam score of the message is increased by 100 points, which makes it almost certain that the message will be tagged as spam.

For example, a spammer might send messages from support@microsofts.com in the hope that you'll think it's an important message from an operating system vendor. If you never expect to receive legitimate messages from support@microsofts.com, you can ensure that any message from that address will be tagged as spam by using this line in the systemwide configuration file:

blacklist_from support@microsofts.com

You can use multiple blacklist_from directives or multiple addresses in a single directive to blacklist several addresses. You can also use an asterisk (*) as a wildcard for zero or more characters and a question mark (?) as a wildcard for exactly one character, much as you would to specify filename patterns in a shell. For example, you could blacklist all mail from public.com and from all hosts in the example.com domain with these lines:

blacklist_from *@public.com
blacklist_from example.com *.example.com

You can remove a blacklist entry with the unblacklist_from directive. To remove a blacklist entry, the address in the unblacklist_from directive must exactly match the one given to blacklist_from.

The blacklist_to directive performs blacklisting based on recipient address. As with whitelisting, a recipient's address may appear in several headers. If Resent-To and/or Resent-Cc headers are present, the address is checked only against those headers. Otherwise, the address may be matched in the last three Received headers or the headers To, Apparently-To, Delivered-To, Envelope-Recipients, Apparently-Resent-To, X-Envelope-To, Envelope-To, X-Delivered-To, X-Original-To, X-Rcpt-To, X-Real-To, or Cc. If a recipient address matches a blacklist_to entry, the spam score of the message is increased by 10 points.

Blacklisting by recipient is most useful when spammers use software that sends mail with recognizably forged To headers (specifying the real recipient in the SMTP transaction, of course). For example, it used to be popular to send spam with a To header of friend@public.com. Although SpamAssassin already includes a special test for this address in headers, you could also use the blacklist_to configuration directive to increase the spam score for such messages by 10 points:

blacklist_to friend@public.com

No unblacklist_to directive is provided. Simply don't blacklist a recipient who should continue to receive mail.

Tip

It's possible, but silly, for the same address to be both blacklisted and whitelisted. In this case, both lists are applied: the blacklist adds 100 to the spam score, the whitelist subtracts 100, and the two adjustments cancel each other out.
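The score adjustments quoted in this section can be collected into a small Python sketch that shows how they combine. The table of adjustments comes from the text above; the function names, the base score, and the default threshold of 5.0 are assumptions made for illustration.

```python
# Score adjustments as described in this section.
ADJUSTMENTS = {
    "whitelist_from": -100.0,
    "whitelist_from_rcvd": -100.0,
    "def_whitelist_from_rcvd": -15.0,
    "whitelist_to": -6.0,
    "more_spam_to": -20.0,
    "all_spam_to": -100.0,
    "blacklist_from": 100.0,
    "blacklist_to": 10.0,
}

def adjusted_score(base_score, matched_directives):
    """Add the adjustment for every list directive that matched the message."""
    return base_score + sum(ADJUSTMENTS[name] for name in matched_directives)

def is_spam(score, required_hits=5.0):
    """Assumed rule: a message is spam once its score reaches the threshold."""
    return score >= required_hits

# The same sender is both whitelisted and blacklisted: the adjustments cancel.
score = adjusted_score(7.5, ["whitelist_from", "blacklist_from"])
print(score, is_spam(score))
```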

Per-User Whitelists and Blacklists

Email from a given address may be welcomed by one user and shunned by another. Although systemwide whitelists and blacklists are useful antispam tools, in many cases, each user will want her own individual whitelist and blacklist entries.

SpamAssassin provides two mechanisms for per-user whitelists and blacklists. The first mechanism is the simplest: add the appropriate configuration directives to the per-user configuration file for the user's account (typically ~/.spamassassin/user_prefs). The disadvantage of this approach is that it requires users to have accounts and access to their home directories. The second mechanism is to configure spamd to look up per-user test scores and whitelist and blacklist entries in an SQL or LDAP database, as described earlier in this chapter.

If users want to remove systemwide whitelist or blacklist entries, they can use the unwhitelist_from or unblacklist_from directives described earlier in this chapter.

Notes

  1. "ODBC" stands for Open Database Connectivity.
  2. Section 3.3.1 explains the details of how DNS-based blacklist-checking is performed.