Know Your Limits

From WikiContent

Current revision (19:10, 14 June 2012)

"Man's got to know his limitations." — Dirty Harry

Your resources are limited. You only have so much time and money to do your work, including the time and money needed to keep your knowledge, skills, and tools up-to-date. You can only work so hard, so fast, so smart, and so long. Your tools are only so powerful. Your target machines are only so powerful. So you have to respect the limits of your resources.

How to respect those limits? Know yourself, know your people, know your budgets, and know your stuff. Especially, as a software engineer, know the space and time complexity of your data structures and algorithms, and the architecture and performance characteristics of your systems. Your job is to create an optimal marriage of software and systems.

Space and time complexity are given as a function O(f(n)), where n is the size of the input and O(f(n)) describes the asymptotic space or time required as n grows to infinity. Important complexity classes include f(n)=log(n), f(n)=n, f(n)=n×log(n), f(n)=n^e, and f(n)=e^n. Clearly, as n gets bigger, O(log(n)) is ever so much smaller than O(e^n). As Sean Parent puts it, for large enough n all complexity classes amount to near-infinite, near-linear, or near-constant.
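To make the gap between these classes concrete, here is a minimal sketch that tabulates each function for a few input sizes. The function name `growth_table` and the chosen sizes are illustrative assumptions, not part of the original text.

```python
import math


def growth_table(sizes):
    """Return one row (n, log n, n*log n, n**e, e**n) per input size n."""
    rows = []
    for n in sizes:
        rows.append((
            n,
            math.log(n),        # logarithmic
            n * math.log(n),    # linearithmic
            n ** math.e,        # polynomial, exponent e as in the text
            math.exp(n),        # exponential (overflows a float near n = 710)
        ))
    return rows


if __name__ == "__main__":
    for n, log_n, n_log_n, n_pow_e, exp_n in growth_table([10, 100, 700]):
        print(f"n={n:>4}  log(n)={log_n:6.2f}  n*log(n)={n_log_n:10.1f}  "
              f"n^e={n_pow_e:14.4g}  e^n={exp_n:12.4g}")
```

Even at n=100 the exponential column is dozens of orders of magnitude larger than the others, which is the sense in which large inputs sort everything into near-constant, near-linear, or near-infinite.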

Complexity analysis is done in terms of an abstract machine, but software runs on real machines. Modern computer systems are organized as hierarchies of physical and virtual machines, including language runtimes, operating systems, CPUs, cache memory, random-access memory, disk drives, and networks. Typical limits on random access time and storage capacity include:

              access time    capacity
register           < 1 ns        64 b
cache line                       64 B
L1 cache             1 ns       64 KB
L2 cache             4 ns        8 MB
RAM                 20 ns       32 GB
disk                10 ms       10 TB
LAN                 20 ms      > 1 PB
Internet           100 ms      > 1 ZB

Note that capacity and speed vary by several orders of magnitude. Caching and lookahead are used heavily at every level of our systems to hide this variation, but they only work when access is predictable. When cache misses are frequent, the system thrashes. For example, at the figures above (10 TB at 10 ms per random access), randomly inspecting every byte on a hard drive could take about 10^11 seconds, which is on the order of 3,000 years. Even randomly inspecting every byte in RAM (32 GB at 20 ns) could take 11 minutes. You can learn the limits of your systems from the manufacturers' literature, and you can monitor their performance with tools like top, oprofile, gprof, ping, and traceroute.
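The worst-case arithmetic can be checked with a short sketch. It assumes the capacities and access times from the table above, decimal units (1 GB = 10^9 bytes), and one full random access per byte with no caching at all. The function and constant names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def random_scan_seconds(capacity_bytes, access_time_s):
    """Time to touch every byte when each byte costs one full random access."""
    return capacity_bytes * access_time_s


# Assumed figures, taken from the table: RAM 32 GB at 20 ns, disk 10 TB at 10 ms.
ram_seconds = random_scan_seconds(32e9, 20e-9)
disk_seconds = random_scan_seconds(10e12, 10e-3)

if __name__ == "__main__":
    print(f"RAM:  {ram_seconds:.0f} s = {ram_seconds / 60:.1f} minutes")
    print(f"disk: {disk_seconds:.3g} s = {disk_seconds / SECONDS_PER_YEAR:.0f} years")
```

Under these assumptions RAM comes out at about 640 seconds, while the disk case runs to thousands of years. A different assumed capacity or access granularity changes the figures proportionally, but the gap between predictable and unpredictable access remains enormous.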

Search time (ns)
   n     vEB    binary    linear
   8      40        90        50
  64      70       150       180
 512     100       230      1200
4096     160       320     17000

Algorithms and data structures vary in how effectively they use caches. For instance, linear search makes good use of lookahead, but requires O(n) comparisons. Binary search of a sorted array requires only O(log(n)) comparisons, but tends to be cache-hostile, whereas searching a B-tree or van Emde Boas (vEB) tree is O(log(n)) and cache-friendly. Study up on "cache-aware" and "cache-oblivious" algorithms to know your stuff.
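The comparison counts behind the O(n) and O(log(n)) figures can be sketched as follows. Both functions are illustrative and return the number of elements inspected alongside the result. They say nothing about cache behavior, which only measurements on a real machine can reveal.

```python
def linear_search(items, target):
    """Scan left to right; return (index or -1, comparisons made)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """Halve the candidate range each step; return (index or -1, probes made)."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, probes


if __name__ == "__main__":
    data = list(range(4096))
    print(linear_search(data, 4095))
    print(binary_search(data, 4095))
```

Linear search touches consecutive elements, so hardware lookahead helps it. The probes of binary search jump across the array, so for large arrays each probe may land on a different cache line. That is the trade-off a cache-friendly layout tries to avoid.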

"You pays your money and you makes your choice." — Punch
