WET Dilutes Performance Bottlenecks

From WikiContent


Revision as of 09:36, 3 November 2008

The importance of the DRY principle (Don't Repeat Yourself) is that it codifies the idea that every piece of knowledge in a system should have a single representation. In code, that means a single implementation. WET (Write Every Time), by contrast, leads to multiple implementations. The performance implications of DRY versus WET become clear when you consider their effects on a performance profile. Two of those effects stand out: duplication hides a bottleneck, and it multiplies the work of fixing one.

To see the first effect, consider a feature of our system (say X) that is a CPU bottleneck, consuming 30% of the CPU. Now suppose that X has 10 different implementations. On average, each implementation will consume 3% of the CPU, hardly a level of utilization worth considering if we are looking for a big gain. In this scenario it is unlikely that we'd recognize the feature as a bottleneck at all. For the second effect, suppose that by some magic we do recognize X as the source of our problem. We are now left with the task of finding, recognizing, and fixing every single implementation: 10 of them, all because we didn't follow the DRY principle. Had we followed DRY, we'd clearly see the 30% CPU utilization and we'd have one-tenth of the code to fix, not to mention the time saved by not having to hunt down each implementation.
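The arithmetic above can be sketched in a few lines of Java. This is an illustrative model, not output from a real profiler: the method names (`featureX_copy1` and so on), the 70% of unrelated work, and the prefix-based grouping are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical flat profile: method name -> share of CPU time in percent.
final class ProfileDilution {

    // WET: ten copies of feature X, each consuming 3% of the CPU.
    static Map<String, Double> wetProfile() {
        Map<String, Double> profile = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            profile.put("featureX_copy" + i, 3.0);
        }
        profile.put("unrelatedWork", 70.0);
        return profile;
    }

    // DRY: the same work behind a single implementation.
    static Map<String, Double> dryProfile() {
        Map<String, Double> profile = new LinkedHashMap<>();
        profile.put("featureX", 30.0);
        profile.put("unrelatedWork", 70.0);
        return profile;
    }

    // Largest single entry with the given prefix: roughly what a
    // "hottest methods" view would report for the feature.
    static double largestEntry(Map<String, Double> profile, String prefix) {
        double max = 0.0;
        for (Map.Entry<String, Double> e : profile.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                max = Math.max(max, e.getValue());
            }
        }
        return max;
    }

    // Sum over all entries with the given prefix: the true cost of the feature.
    static double totalFor(Map<String, Double> profile, String prefix) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : profile.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("WET hottest entry: " + largestEntry(wetProfile(), "featureX"));
        System.out.println("WET true total:    " + totalFor(wetProfile(), "featureX"));
        System.out.println("DRY hottest entry: " + largestEntry(dryProfile(), "featureX"));
    }
}
```

In the WET profile no single entry for the feature rises above 3%, although the copies together account for 30%. In the DRY profile the same 30% appears as one entry.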

There is one use case where we are often guilty of violating DRY: our use of collections. Let's say we are working with customer data. A common technique for implementing a query is to iterate over the collection and apply the query to each element in turn.

import java.util.ArrayList;

public class UsageExample {

    private ArrayList<Customer> allCustomers = new ArrayList<Customer>();

    // Linear scan: every client that needs this query ends up writing its own copy of the loop.
    public ArrayList<Customer> findCustomersThatSpendAtLeast(float amount) {
        ArrayList<Customer> customersOfInterest = new ArrayList<Customer>();
        for (Customer customer : allCustomers) {
            if (customer.spendsAtLeast(amount)) {
                customersOfInterest.add(customer);
            }
        }
        return customersOfInterest;
    }
}

By exposing this raw collection to clients, we have violated encapsulation. This limits our ability to refactor, and it forces our clients to violate DRY, since each of them may end up implementing the same query. One solution is to not expose raw collections in any API. In this example we can introduce a new collection type called CustomerList. This new class is more semantically in line with our domain, and it acts as a natural home for all our queries.

Having this new collection type will also allow us to see easily whether these queries are a performance bottleneck. By incorporating the queries into the class we eliminate the need to expose internal representations to our clients. This gives us the freedom to alter these implementations without fear of violating client contracts.

import java.util.ArrayList;

// SortedList is not a java.util class; it is assumed here to be a
// collection that keeps customers ordered by spending level.
public class CustomerList {

    private ArrayList<Customer> customers = new ArrayList<Customer>();
    private SortedList<Customer> customersSortedBySpendingLevel = new SortedList<Customer>();

    public CustomerList findCustomersThatSpendAtLeast(float amount) {
        return customersSortedBySpendingLevel.elementsLargerThan(amount);
    }
}

public class UsageExample {

    public static void main(String[] args) {
        CustomerList customers = new CustomerList();
        // ...
        // The f suffix is required: 500.00 alone is a double and will not convert to float.
        CustomerList customersOfInterest = customers.findCustomersThatSpendAtLeast(500.00f);
    }
}

In this example, adherence to DRY allowed us to introduce an alternate indexing scheme, with SortedList keyed on our customers' level of spending. More important than the specific details of this particular example is that following DRY helped us to find and repair a performance bottleneck that would have been more difficult to find had the code been WET.
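For readers who want something that compiles, here is one possible sketch of the same idea using only the JDK. It is not the author's implementation: `IndexedCustomerList` is a hypothetical name, the `Customer` class and its fields are assumed, and a `java.util.TreeMap` stands in for the article's `SortedList`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Assumed minimal Customer; the article does not show this class.
final class Customer {
    private final String name;
    private final float spending;

    Customer(String name, float spending) {
        this.name = name;
        this.spending = spending;
    }

    String name() { return name; }
    float spending() { return spending; }
    boolean spendsAtLeast(float amount) { return spending >= amount; }
}

// Hypothetical domain collection: the single home for customer queries.
final class IndexedCustomerList {
    private final List<Customer> customers = new ArrayList<>();
    // Index keyed on spending level; customers with equal spending share a bucket.
    private final NavigableMap<Float, List<Customer>> bySpending = new TreeMap<>();

    void add(Customer customer) {
        customers.add(customer);
        bySpending.computeIfAbsent(customer.spending(), k -> new ArrayList<>()).add(customer);
    }

    int size() { return customers.size(); }

    // Uses the sorted index instead of scanning every customer.
    IndexedCustomerList findCustomersThatSpendAtLeast(float amount) {
        IndexedCustomerList result = new IndexedCustomerList();
        // tailMap(amount, true) is a view of all keys >= amount, in ascending order.
        for (List<Customer> bucket : bySpending.tailMap(amount, true).values()) {
            for (Customer customer : bucket) {
                result.add(customer);
            }
        }
        return result;
    }

    // Names in insertion order, for inspection.
    List<String> names() {
        List<String> names = new ArrayList<>();
        for (Customer customer : customers) {
            names.add(customer.name());
        }
        return names;
    }
}

final class IndexedUsageExample {
    public static void main(String[] args) {
        IndexedCustomerList all = new IndexedCustomerList();
        all.add(new Customer("ann", 120.00f));
        all.add(new Customer("bob", 500.00f));
        all.add(new Customer("cy", 750.50f));
        System.out.println(all.findCustomersThatSpendAtLeast(500.00f).names());
    }
}
```

Because clients only ever call `findCustomersThatSpendAtLeast`, the linear scan could be swapped for the index without changing any caller. Whether the index actually pays off depends on how often customers are added relative to how often they are queried.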


By Kirk Pepperdine

This work is licensed under a Creative Commons Attribution 3

