Consider the Hardware
It's a common opinion that slow software just needs faster hardware. This line of thinking is not necessarily wrong, but like misusing antibiotics, it can become a big problem over time. Software architecture has abstracted the underlying hardware so much that many developers don't have any idea how it really works. Furthermore, there is often a direct conflict of interest between best programming practices and writing code that screams on the given hardware.
First, let's look at how you can start to make the most of your CPU's branch prediction. A code branch such as an if-then construct can only go one of two ways (jump tables aside): condition met or condition not met. Rather than wait for the condition to be evaluated, most modern CPUs "guess" which way the branch will go and start fetching and executing instructions along that path. When the CPU "guesses" correctly, it's amazingly fast. If it "guesses" wrong, on the other hand, all the work done on the "wrong branch" is useless and has to be thrown away in a time-consuming pipeline flush. Fortunately, it's easy to start making branch prediction work harder for you.
If you code your branch logic so that the most frequent result is the condition that is tested for, you will help your CPU "guess" correctly more often, leading to fewer expensive pipeline flushes. This may sometimes read a little awkwardly, but systematically applying this technique over time can decrease your code's execution time.
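As a minimal sketch of ordering a branch around its most frequent outcome, the C++ below tests for the common case first. The Packet and Totals types and the summarize function are hypothetical names invented for this illustration, and the claim that corrupted packets are rare is an assumption about the data, not something the hardware knows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record type, invented for this illustration.
struct Packet {
    bool corrupted;        // assumed to be true only rarely
    std::uint32_t size;    // payload size in bytes
};

struct Totals {
    std::uint64_t payload_bytes;  // bytes from intact packets
    std::size_t dropped;          // number of corrupted packets skipped
};

// Tests for the common case (packet is intact) first, so the frequent
// result is the condition that the branch tests for.
Totals summarize(const Packet* packets, std::size_t n) {
    assert(packets != nullptr || n == 0);
    Totals t{0, 0};
    for (std::size_t i = 0; i < n; ++i) {
        if (!packets[i].corrupted) {
            t.payload_bytes += packets[i].size;  // frequent path
        } else {
            ++t.dropped;                         // rare path
        }
    }
    return t;
}
```

Whether this ordering pays off depends on the compiler and the CPU, so it is worth measuring before and after. C++20 also provides the [[likely]] and [[unlikely]] attributes for stating the expectation explicitly.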
Now, let's look at some of the conflicts between writing code for the hardware and following mainstream best practices.
It's common practice to write many small functions rather than a few large ones to ease maintainability, but the fact is that a function call requires moving data to and from the stack to prepare for the call and to return properly from it. Applications that take this paradigm to an extreme can spend more time preparing for and recovering from work than actually doing it! Truth is, the goto statement is the fastest way to get around in a code block, followed closely by jump tables. Functions are great for us developers; from the CPU's point of view, however, they are penny smart and dollar dumb.
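To make the call-overhead trade-off concrete, here is a rough sketch of the same loop written two ways: once through a small helper function and once with the helper's body written directly in the loop. The names scale_and_clamp, sum_with_calls, and sum_flattened are hypothetical. Note that an optimizing compiler will often inline a helper this small on its own, so the two versions may compile to the same machine code; treat this as an illustration of the idea, not a guaranteed speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Small helper, the way maintainability advice suggests writing it.
std::uint32_t scale_and_clamp(std::uint32_t v) {
    std::uint32_t scaled = v * 2u;
    return scaled > 255u ? 255u : scaled;
}

// Version 1: one function call per element (unless the compiler inlines it).
std::uint64_t sum_with_calls(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += scale_and_clamp(data[i]);
    }
    return total;
}

// Version 2: the helper's work is written directly in the loop, so no
// call setup or return is needed for each element.
std::uint64_t sum_flattened(const std::uint32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t scaled = data[i] * 2u;
        total += scaled > 255u ? 255u : scaled;
    }
    return total;
}
```

The flattened version duplicates logic, which is exactly the maintainability cost the paragraph above describes; profiling is the only reliable way to tell whether it is worth paying.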
Class inheritance and virtual functions, whether direct or via interfaces, offer convenience that comes at a price. You can eliminate much of their overhead by keeping inheritance levels to a minimum and by not using interfaces just to extend inheritance levels. Sometimes it's better to have a larger, faster executable than a smaller one that runs slower; for example, try using include files to reuse code across various classes rather than using inheritance.
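The sketch below contrasts a virtual call with a statically resolved one. It uses a C++ template in place of the include-file approach mentioned above, which is a substitution made for this example, although both trade a larger executable for calls that are resolved at compile time. All type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Inheritance-based version: each area() call goes through the vtable.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

double total_area_virtual(const Shape* const* shapes, std::size_t n) {
    assert(shapes != nullptr || n == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += shapes[i]->area();  // indirect call, resolved at run time
    }
    return total;
}

// Inheritance-free version: a plain type with no virtual functions.
struct PlainSquare {
    double side;
    double area() const { return side * side; }
};

// The template is instantiated once per element type, so the executable
// grows, but each area() call is known at compile time.
template <typename T>
double total_area_static(const T* shapes, std::size_t n) {
    assert(shapes != nullptr || n == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += shapes[i].area();   // direct call, resolved at compile time
    }
    return total;
}
```

The static version gives up the ability to mix different shape types in one container, so it only fits where that flexibility is not needed.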
Video game and embedded system developers know the hardware ramifications of their code. Do you?
By Jason P Sage
This work is licensed under a Creative Commons Attribution 3