Test Precisely and Concretely

Although it is important to test for desired, essential behavior rather than incidental behavior of an implementation, this should not be construed or mistaken as an excuse for vague tests. Tests need to be both accurate and precise.

Something of a tried and tested (and testing) classic, sorting routines provide an illustrative example. Implementing a sorting algorithm is not necessarily an everyday task for a programmer, but uses for sorting are sufficiently commonplace that the expectations for a sorting routine are familiar. This familiarity, however, often brings with it a false sense of knowledge.

When programmers are asked what they would test for if they were to implement a sorting routine, the answer most commonly given is that the resulting sequence of elements is sorted, i.e., the elements are in non-descending order. While this is not wrong, it is also not completely correct. When prompted for a more precise condition, many programmers add that the resulting sequence should be the same length as the original. Although correct, this is still insufficient. For example, given the sequence of values `[3, 1, 4, 1, 5, 9]`, the sequence `[3, 3, 3, 3, 3, 3]` satisfies a postcondition of being sorted in non-descending order and having the same length as the original sequence. It also illustrates an actual error taken from production code (fortunately caught before it was released), where a simple slip of a keystroke or momentary lapse of reason led to an elaborate mechanism for populating a whole array with the first element of the passed array.
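To make the gap concrete, here is a minimal Python sketch. The function `buggy_sort` is a hypothetical reconstruction of the kind of slip described, not the actual production code, and `weak_postcondition_holds` checks only the two conditions mentioned so far: non-descending order and unchanged length.

```python
def buggy_sort(values):
    # Hypothetical reconstruction of the slip: every slot of the result
    # is filled with the first element of the input.
    result = list(values)
    for i in range(len(result)):
        result[i] = values[0]
    return result


def is_non_descending(seq):
    # Each element is no greater than the one after it.
    return all(a <= b for a, b in zip(seq, seq[1:]))


def weak_postcondition_holds(original, result):
    # Checks only "sorted" and "same length": not enough.
    return is_non_descending(result) and len(result) == len(original)


values = [3, 1, 4, 1, 5, 9]
print(buggy_sort(values))                                # [3, 3, 3, 3, 3, 3]
print(weak_postcondition_holds(values, buggy_sort(values)))  # True
```

The faulty result passes both checks, which is exactly why those checks are insufficient.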

The full postcondition is that the result is sorted and that it holds a permutation of the passed values. This appropriately constrains the required behavior. The fact that the result length is the same as the input length comes out in the wash and doesn't need restating.
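A sketch of the full postcondition in Python, assuming the elements are hashable so that `collections.Counter` can compare them as multisets. The helper names are illustrative. Equal counts for every value means the result is a permutation of the input, which already implies equal length, so no separate length check appears.

```python
from collections import Counter


def is_non_descending(seq):
    # Each element is no greater than the one after it.
    return all(a <= b for a, b in zip(seq, seq[1:]))


def is_permutation_of(result, original):
    # Same multiset: every value occurs equally often in both sequences.
    # Assumes the elements are hashable.
    return Counter(result) == Counter(original)


def full_postcondition_holds(original, result):
    return is_non_descending(result) and is_permutation_of(result, original)


values = [3, 1, 4, 1, 5, 9]
print(full_postcondition_holds(values, [3, 3, 3, 3, 3, 3]))  # False
print(full_postcondition_holds(values, sorted(values)))      # True
```

Even this small version needs two helpers of its own, each of which would deserve testing in turn.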

Even stating the postcondition in the way described is not enough to give you a good test. A good test should be readable. It should be comprehensible and simple enough that you can see readily that it is correct (or not). Unless you already have code lying around for checking that a sequence is sorted and that one sequence contains a permutation of values in another, it is quite likely that the test code will be more complex than the code under test. As Tony Hoare noted:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other is to make it so complicated that there are no obvious deficiencies.

Using concrete examples eliminates this accidental complexity and opportunity for accident. For example, the result of sorting `[3, 1, 4, 1, 5, 9]` is `[1, 1, 3, 4, 5, 9]`. No other answer will do.
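By contrast, a concrete, example-based test is a single comparison. This sketch uses the plain-function style that pytest collects (functions whose names start with `test_`); `sort_under_test` is a hypothetical stand-in that simply delegates to the built-in `sorted`.

```python
def sort_under_test(values):
    # Hypothetical stand-in for the routine being tested; it delegates to
    # Python's built-in sorted(). Replace it with your own implementation.
    return sorted(values)


def test_sorting_a_concrete_sequence():
    # One input, one exact expected output: no other answer will do.
    assert sort_under_test([3, 1, 4, 1, 5, 9]) == [1, 1, 3, 4, 5, 9]
```

The faulty output `[3, 3, 3, 3, 3, 3]` fails this assertion immediately, and a reader can see why without any helper code.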

Concrete examples help to illustrate general behavior in an accessible and unambiguous way. The result of adding an item to an empty collection is not simply that it is not empty; it is that the collection now has a single item, and that the single item held is the item added. Two or more items qualify as not empty. And also as wrong. A single item of a different value is also wrong. The result of adding a row to a table is not simply that the table is one row bigger; it also entails that the row's key can be used to recover the row added. And so on. In specifying behavior, tests should not simply be accurate; they also need to be precise.
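The same precision can be sketched for the collection and table examples. A plain Python list stands in for the collection, and `Table` is a hypothetical minimal class whose method names (`add_row`, `row_count`, `find`) are illustrative, not any real library's API.

```python
class Table:
    """Hypothetical minimal table: rows stored and retrieved by key."""

    def __init__(self):
        self._rows = {}

    def add_row(self, key, row):
        self._rows[key] = row

    def row_count(self):
        return len(self._rows)

    def find(self, key):
        # Returns None when no row has the given key.
        return self._rows.get(key)


def test_adding_an_item_to_an_empty_collection():
    items = []
    items.append("apple")
    # Not merely "not empty": exactly one item, and it is the item added.
    assert items == ["apple"]


def test_adding_a_row_to_a_table():
    table = Table()
    table.add_row("ada", ("Ada", "Lovelace"))
    rows_before = table.row_count()
    table.add_row("alan", ("Alan", "Turing"))
    assert table.row_count() == rows_before + 1        # one row bigger...
    assert table.find("alan") == ("Alan", "Turing")   # ...and the key recovers it
```

Comparing the whole list against `["apple"]` checks the count and the value in one assertion, so both two items and one item of the wrong value fail.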