Consider the Hardware

It's a common opinion that slow software just needs faster hardware. This line of thinking is not necessarily wrong, but like misusing antibiotics, it can become a big problem over time. Software architecture has abstracted the underlying hardware so much that many developers don't have any idea how it really works. Furthermore, there is often a direct conflict of interest between best programming practices and writing code that screams on the given hardware.

First, let's look at how you can start to make the most of your CPU's prefetch cache. Code branches such as if-then constructs can only go one of two ways (jump tables aside): condition met or condition not met. Most prefetch caches look ahead by "guessing" which path your code will take. When the cache "guesses" correctly, execution is amazingly fast. When it "guesses" wrong, on the other hand, all the preprocessing done on the "wrong branch" is useless and a time-consuming cache invalidation occurs. Fortunately, it's easy to start making the prefetch cache work for you.

If you code your branch logic so that the most frequent result is the condition that is tested for, you will help your CPU's prefetch cache be "correct" more often, leading to fewer CPU-expensive cache invalidations. This sometimes may read a little awkwardly, but systematically applying this technique over time will decrease your code's execution time.

Now, let's look at some of the conflicts between writing code for hardware and writing against mainstream best practices.

It's common practice to write many small functions instead of large ones to ease maintainability, but every function call requires moving data to and from the stack, both to prepare for the call and to return properly from it. Many applications written this way spend more time preparing for and recovering from work than actually doing it! The truth is, the goto command is the fastest way to get around in a code block, followed closely by jump tables. Functions are great for us developers; from the CPU's point of view, however, they are penny smart and dollar dumb.

Inline functions are a different animal: they trade away the function call's setup and teardown costs in exchange for overall program size. The word inline is accurate, because everywhere your source code calls an inline function, the code for that function is copied in verbatim before compilation. Having too many inline functions, especially large ones, can significantly increase your compiled application's file size, but this might be a worthwhile trade. Speed or file size? Make your choice!

Depending on your compiler, using classes can have implications for program execution speed. Many developers seem to think class libraries with many generations of inheritance are wonderful, but that design comes at a price. Ordinary inheritance incurs some overhead if you get into the nuts and bolts of it, but the real efficiency problem is those nasty virtual functions. A virtual call must go through a lookup table of function pointers (the vtable), and matters get worse when multiple levels of inheritance are involved. There are better things your processor could be doing than chasing pointers through tables in memory to hunt for the code to run.

What hardware are you developing for? What does your compiler do to your code as it translates it into assembly language? Are you using a virtual machine? You'll rarely find a single programming methodology that works perfectly on all hardware platforms, real or virtual.

Video game and embedded system developers know the hardware ramifications of their compiled code. Do you?

By [[Jason P Sage]]


This work is licensed under a [http://creativecommons.org/licenses/by/3.0/us/ Creative Commons Attribution 3] license

Back to [[97 Things Every Programmer Should Know]] home page
