Collection of Collections Is a Code Smell

From WikiContent

(Difference between revisions)
Jump to: navigation, search
Current revision (10:13, 6 July 2009) (edit) (undo)
 
(3 intermediate revisions not shown.)
Line 1: Line 1:
-
Every once in a while I run into a design problem to which the answer is a <code>HashMap</code> of <code>HashMaps</code> or <code>HashMap</code> of <code>ArrayList</code> or some other combination of collection of collections. While some things, such a matrices, are naturally represented this way, more often than not using a collection of collections is a code smell that indicates that you are missing an abstraction that is key to your domain.
+
Every once in a while I run into a design problem to which the answer is combination of a collection of collections. While some things, such a matrices, are naturally represented this way, more often than not using a collection of collections is a code smell that indicates that you are missing an abstraction that is key to your domain.
-
The term ''code smell'', first coined by Kent Beck, is used to describe code that is awkward or otherwise doesn't, well, ''smell'' quite right. Awkwardness in code tends to point to some deeper underlying problem with either the underlying implementation or the overall design. In the case of a collection of collections, we often see an indication that one needs an abstraction to represent the relationship that is being expressed in the collection of collections. Quite often this missing abstraction is translated to a new class in your domain that represents some vocabulary that is missing from your model. In the following code the key in the outer map is keyed on a persons last name and the inner collection is keyed on a persons first name:
+
The term "code smell" [http://en.wikipedia.org/wiki/Code_smell Kent Beck], is used to describe code that is awkward or otherwise doesn't, well, ''smell'' quite right. Awkwardness in code tends to point to some deeper underlying problem with either the underlying implementation or the overall design. A collection of collections is often used when the abstraction representing the relationship between the collections is missing from the model. By introducing the missing abstraction we can often eliminate improve the situation. For example, a catalog is a list of items. Quite often the item are organized into sections. Hence we could represent this by creating a list of lists
 +
 
 +
public class Catalog {
 +
 
 +
private HashMap<String, ArrayList<Item>> sections = new HashMap<String, ArrayList<Item>>();
 +
 
 +
public void addSection( String name) {
 +
sections.put( name, new ArrayList<item>());
 +
}
 +
}
 +
 
 +
As you can see there are several problems with this code. First we need to externalize the means of identifying each section. Also it's not clear that ArrayList<Item> actually is a section of the catalog. While this may not be a big deal in this small example, imagine tracking this
 +
 
 +
public class Catalog {
 +
 
 +
private ArrayList<Section> sections = new ArrayList<Section>();
 +
 
 +
public void addSection(Section section) {
 +
sections.add( section);
 +
}
 +
}
-
public class AllPersons {
 
-
private HashMap<String, HashMap> allPersons = new HashMap<String, HashMap>();
 
-
 
-
public void addPerson(Person person) {
 
-
HashMap<String, HashMap> persons = allPersons.get(person.getLastName());
 
-
if (persons == null) {
 
-
persons = new HashMap<String, Person>();
 
-
allPersons.add(person.getLastName(), persons);
 
-
}
 
-
persons.add(person.getFirstName(), person);
 
-
}
 
-
}
 
 +
A better abstraction in this case would be to introduce
The code smell in the example is that we are keying on two different values, one after the other. The question is, what is the missing design element if there is one?
The code smell in the example is that we are keying on two different values, one after the other. The question is, what is the missing design element if there is one?
Line 24: Line 33:
public void addPerson(Person person) {
public void addPerson(Person person) {
-
this.addPerson(new CompoundKey(person.getFirstName(), person.getLastName()), person);
+
allPersons.add(new CompoundKey(person.getFirstName(), person.getLastName()), person);
}
}
}
}

Current revision

Every once in a while I run into a design problem to which the answer is combination of a collection of collections. While some things, such a matrices, are naturally represented this way, more often than not using a collection of collections is a code smell that indicates that you are missing an abstraction that is key to your domain.

The term "code smell" Kent Beck, is used to describe code that is awkward or otherwise doesn't, well, smell quite right. Awkwardness in code tends to point to some deeper underlying problem with either the underlying implementation or the overall design. A collection of collections is often used when the abstraction representing the relationship between the collections is missing from the model. By introducing the missing abstraction we can often eliminate improve the situation. For example, a catalog is a list of items. Quite often the item are organized into sections. Hence we could represent this by creating a list of lists

   public class Catalog {
       private HashMap<String, ArrayList<Item>> sections = new HashMap<String, ArrayList<Item>>();
       public void addSection( String name) {
           sections.put( name, new ArrayList<item>());
       }
   }

As you can see there are several problems with this code. First we need to externalize the means of identifying each section. Also it's not clear that ArrayList<Item> actually is a section of the catalog. While this may not be a big deal in this small example, imagine tracking this

  public class Catalog {
       private ArrayList<Section> sections = new ArrayList<Section>();
       public void addSection(Section section) {
           sections.add( section);
       }
   }


A better abstraction in this case would be to introduce The code smell in the example is that we are keying on two different values, one after the other. The question is, what is the missing design element if there is one?

One of the roles that a collection can play is to implicitly create an index over a collection much in the same way we'd create an index in a database table. If we were to create an index on a single column, that would be a simple key. If we combine two or more simple keys to create another index, we have created a compound key. And this is exactly what we are doing in this example, creating an index based on two fields. From this we can conclude that the missing design element is a compound key. The following code demonstrates the effect of implementing our newly discovered abstraction:

public class AllPersons {
    private HashMap<CompoundKey, Person> allPersons = new HashMap<CompoundKey, Person>();

    public void addPerson(Person person) {
        allPersons.add(new CompoundKey(person.getFirstName(), person.getLastName()), person);
    }
}

If you found the second listing to be more readable than the first, then you've already experienced the biggest benefit, the added abstraction's effect on readability. Another benefit is that the added abstraction often results in less code. Unfortunately this point isn't that clear when presented in a short example such as this. The benefits are realized only after repeated use of the abstraction. However the end results is that not only do we have less code to read, the code is more readable to begin with. In this case the abstraction also gives us a better memory utilization profile.

The next time you come across a collection or collections, think code smell. And then think, what is the missing abstraction. If you pay attention to these code smells, you'll quickly start to notice an improvement in your code base.

By Kirk Pepperdine

This work is licensed under a Creative Commons Attribution 3


Back to 97 Things Every Programmer Should Know home page

Personal tools