Floating-point Numbers Aren't Real

From WikiContent


Revision as of 02:23, 18 December 2008

Floating-point numbers are not "real numbers" in the mathematical sense, even though they are called real in some programming languages, such as Pascal and Fortran. Real numbers have infinite precision and are therefore continuous and non-lossy; floating-point numbers have limited precision, so there are only finitely many of them, and they resemble "badly-behaved" integers because they are not evenly spaced throughout their range.

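To illustrate, assign 2147483647 (the largest signed 32-bit integer) to a 32-bit float variable (x, say), and print it. You'll see 2147483648. Now print x - 64. Still 2147483648. Now print x - 65 and you'll get 2147483520! Why? Because the spacing between adjacent floats just below 2147483648 is 128, and floating-point operations round to the nearest floating-point number (assuming the subtraction is carried out in 32-bit precision). The value x - 64 lies exactly halfway between 2147483520 and 2147483648; under the default IEEE rounding mode such a tie goes to the neighbor whose last significand bit is zero, which here is 2147483648.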
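The experiment above can be reproduced with a short sketch. Python has no built-in 32-bit float, so the helper `to_float32` below is a hypothetical name introduced here; it emulates single precision by round-tripping a value through the standard `struct` module. Each subtraction is exact in double precision before being rounded to single, so the results should match what a true 32-bit float would give for this particular case.

```python
import struct


def to_float32(value):
    """Round a number to the nearest IEEE single-precision value.

    Hypothetical helper: packs the value as a 4-byte float and unpacks it,
    which returns the rounded value as an ordinary Python float.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


x = to_float32(2147483647)       # rounds up to 2**31
same = to_float32(x - 64)        # exact tie, rounds back to 2**31
lower = to_float32(x - 65)       # closer to the next float down

print(int(x))      # 2147483648
print(int(same))   # 2147483648
print(int(lower))  # 2147483520
print(int(x - lower))  # 128, the spacing just below 2**31
```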

IEEE floating-point numbers are fixed-precision numbers based on base-two scientific notation: 1.d<sub>1</sub>d<sub>2</sub>...d<sub>p-1</sub> × 2<sup>e</sup>, where p is the precision (24 for float, 53 for double). The spacing between two consecutive numbers is 2<sup>1-p+e</sup>, which can be safely approximated by ε|x|, where ε is the machine epsilon (2<sup>1-p</sup>).
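As a rough check of the spacing formula, the following sketch computes 2^(1-p+e) for a few doubles (Python's `float` is a double, so p = 53 here) and compares it with ε|x|. The function name `spacing` is an assumption made for this illustration, and the sketch only covers nonzero normal numbers; zero and subnormals follow different rules.

```python
import math
import sys

P = 53  # precision of a double, which is what a Python float is


def spacing(x):
    """Gap between x and the next larger-magnitude double: 2**(1 - P + e).

    Illustrative helper; valid only for nonzero normal numbers.
    """
    # frexp gives abs(x) == m * 2**exp with 0.5 <= m < 1, so the exponent
    # in the 1.ddd... x 2**e form used in the text is exp - 1.
    _, exp = math.frexp(abs(x))
    e = exp - 1
    return math.ldexp(1.0, 1 - P + e)


eps = sys.float_info.epsilon  # machine epsilon, 2**(1 - P)

for x in (1.0, 1000.0, 1e-5, 2147483647.0):
    gap = spacing(x)
    # Adding the full gap changes x; adding a quarter of it does not.
    print(x, gap, x + gap != x, x + gap / 4 == x)
    # eps * |x| overestimates the gap by less than a factor of two.
    print("  bound:", eps * abs(x), gap <= eps * abs(x) < 2 * gap)
```

The second line printed for each value suggests why ε|x| is a safe approximation: it is never smaller than the true spacing and never as much as twice as large.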

Knowing the spacing in the neighborhood of a floating-point number can help you avoid classic numerical blunders. For example, if you're performing an iterative calculation, such as searching for the root of an equation, there's no sense in asking for greater precision than the number system can give in the neighborhood of the answer. Make sure that the tolerance you request is no smaller than the spacing there; otherwise the stopping test can never be satisfied and you'll loop forever.
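One way to apply this advice is sketched below, using bisection as a stand-in for whatever root finder you actually use. The function `bisect_root` and its parameters are hypothetical names chosen for this example. The requested tolerance is clamped so that it is never smaller than ε|x| near the current estimate, which means even an unreasonably small request such as 1e-20 still terminates. The sketch assumes the root is not at or extremely close to zero.

```python
import sys


def bisect_root(f, lo, hi, tol):
    """Find a root of f in [lo, hi] by bisection.

    Illustrative sketch: assumes f(lo) and f(hi) have opposite signs
    and that the root is well away from zero.
    """
    eps = sys.float_info.epsilon
    f_lo = f(lo)
    while True:
        mid = lo + (hi - lo) / 2
        # Never demand more than the number system can resolve near mid.
        effective_tol = max(tol, eps * abs(mid))
        if hi - lo <= effective_tol:
            return mid
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # root is in the upper half
        else:
            hi = mid                # root is in the lower half


# A tolerance of 1e-20 is far below the spacing of doubles near 1.414.
# Without the clamp, lo and hi would end up as adjacent doubles whose
# difference never drops to 1e-20, and the loop would not end.
root = bisect_root(lambda x: x * x - 2.0, 1.0, 2.0, tol=1e-20)
print(root)
```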


By Chuck Allison

This work is licensed under a Creative Commons Attribution 3 license (http://creativecommons.org/licenses/by/3.0/us/)


Back to 97 Things Every Programmer Should Know home page
