Collection of Collections Is a Code Smell

Revision as of 18:13, 19 April 2009

Every once in a while I run into a design problem to which the answer is a HashMap of HashMaps, a HashMap of ArrayLists, or some other combination of collection of collections. While some things, such as matrices, are naturally represented this way, more often than not answering a design problem with a collection of collections isn't a natural representation. Instead it is an indication that you are missing a key abstraction. Translate this idea to code and you end up with a code smell.

The term code smell, coined by Kent Beck, is used to describe code that is awkward or otherwise doesn't look quite right. Awkwardness in code tends to point to some deeper underlying problem, either with the implementation or with the design. In the case of a collection of collections, the smell often indicates that an abstraction is needed to represent the relationship being expressed by the nesting. Quite often this missing abstraction translates to a new class in your domain that represents some missing vocabulary. If we take the example illustrated in listing 1, we see that there is a HashMap of HashMaps. The outer map is keyed on a person's last name and the inner map is keyed on a person's first name.

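import java.util.HashMap;

public class AllPersons {

    // Outer map is keyed on last name; each inner map is keyed on first name.
    private HashMap<String,HashMap<String,Person>> allPersons = new HashMap<String,HashMap<String,Person>>();

    public void addPerson(Person person) {
        HashMap<String,Person> persons = this.allPersons.get(person.getLastName());
        if (persons == null) {
            // First person seen with this last name: create the inner map.
            persons = new HashMap<String,Person>();
            this.allPersons.put(person.getLastName(), persons);
        }
        persons.put(person.getFirstName(), person);
    }
}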

Listing 1. AllPersons, an implied collection.

The code smell in the example is that we are keying on the last name and then on the first name. The question is: what is the missing design element, if there is one?
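One way to feel the awkwardness is to look at what a lookup against the nested structure might involve. The following is only a sketch: the class `NestedLookup` and the method `find` are hypothetical names that do not appear in the original listing, and plain `String` values stand in for `Person` objects to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original listing.
class NestedLookup {

    // Returns the value stored under (lastName, firstName), or null if absent.
    static String find(Map<String, Map<String, String>> byLastName,
                       String lastName, String firstName) {
        // Step 1: find the inner map for the last name.
        Map<String, String> byFirstName = byLastName.get(lastName);
        if (byFirstName == null) {
            return null; // nobody with this last name
        }
        // Step 2: find the entry for the first name in the inner map.
        return byFirstName.get(firstName);
    }
}
```

Every caller that reads or writes the structure has to repeat this two-step dance, including the null check on the inner map.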

One of the roles that a collection can play is to implicitly create an index over a set of objects, much in the same way we'd create an index on a database table. If we were to create an index on a single column, that would be a simple key. If we combine two or more simple keys to create another index, we have created a compound key. And this is exactly what we are doing in this example: creating an index based on two fields. From this we can conclude that the missing design element is a compound key. Listing 2 demonstrates the code that incorporates a class that implements our newly discovered abstraction.

import java.util.HashMap;

public class AllPersons {

    // A single map, indexed by the compound (first name, last name) key.
    private HashMap<CompoundKey,Person> allPersons = new HashMap<CompoundKey,Person>();

    public void addPerson(Person person) {
        this.allPersons.put(new CompoundKey(person.getFirstName(), person.getLastName()), person);
    }
}

Listing 2. AllPersons, an implied collection using a CompoundKey.
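The article names CompoundKey but does not show it. A minimal sketch of what such a class might look like follows; the field names and the use of `java.util.Objects` are assumptions, not the author's implementation. The essential point is that a class used as a `HashMap` key needs `equals` and `hashCode` defined over the same fields, and is safest when immutable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the CompoundKey used in listing 2.
final class CompoundKey {
    private final String firstName;
    private final String lastName;

    CompoundKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Two keys are equal when both name parts are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CompoundKey)) return false;
        CompoundKey that = (CompoundKey) other;
        return Objects.equals(firstName, that.firstName)
            && Objects.equals(lastName, that.lastName);
    }

    // Must be consistent with equals so that HashMap lookups work.
    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName);
    }
}

// Hypothetical demo; String values stand in for Person objects.
class CompoundKeyDemo {
    static String lookup() {
        Map<CompoundKey, String> index = new HashMap<>();
        index.put(new CompoundKey("Kent", "Beck"), "person record");
        // A freshly built, equal key finds the same entry in one step.
        return index.get(new CompoundKey("Kent", "Beck"));
    }
}
```

With the key in place, a lookup becomes a single `get` call with no intermediate map and no null check on an inner collection.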

If you found listing 2 to be more readable than the code in listing 1, then you've already experienced the biggest benefit: our added abstraction's effect on readability. Another benefit is that the added abstraction often results in less code. Unfortunately this point isn't that clear when presented in a short example such as this; the benefits are realized only after repeated use of the abstraction. The end result, however, is that not only do we have less code to read, the code is more readable to begin with. In this case the abstraction also gives us a better memory utilization profile.

The next time you come across a collection of collections, think code smell. And then ask: what is the missing abstraction? If you pay attention to these code smells, you'll quickly start to notice an improvement in your code base.

By Kirk Pepperdine

This work is licensed under a Creative Commons Attribution 3

