Difference between Single and Double Precision: In single precision, 32 bits are used to represent a floating-point number. This format, also known as FP32, is suitable for calculations that won't be adversely affected by some approximation. In double precision, 64 bits are used to represent a floating-point number.
float has about 7 decimal digits of precision. double is a 64-bit IEEE 754 double-precision floating-point number: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the value (53 bits of precision once the implicit leading bit is counted). double has about 15 decimal digits of precision.
Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
Single-Precision Floating Point: Because MATLAB stores numbers of type single using 32 bits, they require less memory than numbers of type double, which use 64 bits. However, because they are stored with fewer bits, numbers of type single are represented to less precision than numbers of type double.
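The memory and precision trade-off described above can be seen in a small Python sketch. It assumes the standard library `array` module, whose "f" and "d" type codes map to the C float and double types (4 and 8 bytes on IEEE 754 platforms); the variable names are only illustrative.

```python
from array import array

# "f" stores C floats (single precision), "d" stores C doubles.
singles = array("f", [0.1] * 1000)
doubles = array("d", [0.1] * 1000)

# Bytes per element: single precision takes half the memory.
print(singles.itemsize, doubles.itemsize)

# Reading a value back shows what was actually stored.
print(repr(singles[0]))  # 0.1 rounded to single precision, then widened
print(repr(doubles[0]))  # 0.1 as a double
```

The single-precision array uses half the bytes, but the value read back from it differs from 0.1 from about the eighth significant digit onward.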
Note: the Nintendo 64 does have a 64-bit processor, however:
Many games took advantage of the chip's 32-bit processing mode as the greater data precision available with 64-bit data types is not typically required by 3D games, as well as the fact that processing 64-bit data uses twice as much RAM, cache, and bandwidth, thereby reducing the overall system performance.
From Webopedia:
The term double precision is something of a misnomer because the precision is not really double.
The word double derives from the fact that a double-precision number uses twice as many bits as a regular floating-point number.
For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long. The extra bits increase not only the precision but also the range of magnitudes that can be represented.
The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values.
Most computers use a standard format known as the IEEE floating-point format.
The IEEE double-precision format actually has more than twice as many bits of precision as the single-precision format, as well as a much greater range.
From the IEEE standard for floating point arithmetic
Single Precision
The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right.
The first bit is the sign bit 'S', the next eight bits are the exponent bits 'E', and the final 23 bits are the fraction 'F':
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31
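The value V represented by the word may be determined as follows:
If E=255 and F is nonzero, then V=NaN ("Not a number")
If E=255 and F is zero and S is 1, then V=-Infinity
If E=255 and F is zero and S is 0, then V=Infinity
If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0
In particular,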
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (Smallest positive value)
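As a rough check of the single-precision rules (E=255 gives NaN or Infinity, 0<E<255 gives a normalized 1.F value, E=0 gives zero or an unnormalized 0.F value), here is a minimal Python sketch. `decode_single` is a hypothetical helper written for this illustration, not part of any library; the comparison against the hardware interpretation uses the standard `struct` module.

```python
import math
import struct

def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern by hand, following the IEEE 754 single rules."""
    s = (bits >> 31) & 0x1        # sign bit
    e = (bits >> 23) & 0xFF       # 8 exponent bits
    f = bits & 0x7FFFFF           # 23 fraction bits
    sign = -1.0 if s else 1.0
    if e == 255:
        # All-ones exponent: NaN if the fraction is nonzero, else infinity.
        return math.nan if f else sign * math.inf
    if e == 0:
        # Zero or unnormalized value: no implicit leading 1 (0.F).
        return sign * 2.0 ** -126 * (f / 2 ** 23)
    # Normalized value: implicit leading 1 (1.F).
    return sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

pattern = 0b0_10000001_10100000000000000000000
print(decode_single(pattern))  # prints 6.5

# Cross-check against how the platform interprets the same 32 bits.
hardware = struct.unpack(">f", struct.pack(">I", pattern))[0]
print(hardware == decode_single(pattern))
```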
Double Precision
The IEEE double precision floating point standard representation requires a 64 bit word, which may be represented as numbered from 0 to 63, left to right.
The first bit is the sign bit 'S', the next eleven bits are the exponent bits 'E', and the final 52 bits are the fraction 'F':
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                63
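The value V represented by the word may be determined as follows:
If E=2047 and F is nonzero, then V=NaN ("Not a number")
If E=2047 and F is zero and S is 1, then V=-Infinity
If E=2047 and F is zero and S is 0, then V=Infinity
If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0
Reference: ANSI/IEEE Standard 754-1985, Standard for Binary Floating Point Arithmetic.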
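The double-precision layout can be inspected the same way. The sketch below assumes Python's `float` is an IEEE 754 double (true on mainstream platforms) and uses the standard `struct` module to get at the raw 64 bits; `double_fields` is a hypothetical helper name.

```python
import struct

def double_fields(x: float):
    """Split a double into its sign, exponent and fraction fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                  # 1 sign bit
    e = (bits >> 52) & 0x7FF        # 11 exponent bits
    f = bits & ((1 << 52) - 1)      # 52 fraction bits
    return s, e, f

s, e, f = double_fields(-6.5)
print(s, e, hex(f))

# Rebuild the value with the normalized-number formula (valid for 0 < E < 2047).
value = (-1) ** s * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)
print(value)  # prints -6.5
```

For -6.5 the exponent field is 1025, that is 2 + 1023, and the fraction bits encode .101 in binary, matching the single-precision example for the same number.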
I read a lot of answers but none seems to correctly explain where the word double comes from. I remember a very good explanation given by a University professor I had some years ago.
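Recalling the style of VonC's answer, a single precision floating point representation uses a word of 32 bits: 1 bit for the sign 'S', 8 bits for the exponent 'E', and 24 bits for the mantissa 'M' (even though only 23 are stored).
Representation:
       S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
bits: 31 30    23 22                    0
(Just to point out, with this numbering the sign bit is the last bit, bit 31, not the first.)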
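A double precision floating point representation uses a word of 64 bits: 1 bit for the sign 'S', 11 bits for the exponent 'E', and 53 bits for the mantissa 'M' (even though only 52 are stored).
Representation:
       S EEEEEEEEEEE MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
bits: 63 62       52 51                                                 0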
Note that in both types the mantissa carries one more bit of information than its stored representation (24 versus 23 bits, 53 versus 52 bits). This is because the mantissa is a number written without its non-significant zeros. For example, 0.000124 becomes 0.124 × 10^-3 and 237.141 becomes 0.237141 × 10^3.
This means that the mantissa will always be in the form
0.α1α2...αt × β^p
where β is the base of representation. But since the fraction is a binary number, α1 will always be equal to 1, thus the fraction can be rewritten as 1.α2α3...α(t+1) × 2^p and the initial 1 can be implicitly assumed, making room for an extra bit (α(t+1)).
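One way to see the effect of the implicit leading bit is to look for the first integer that no longer fits. With 24 significant bits, single precision represents every integer up to 2^24 exactly but cannot hold 2^24 + 1; with 53 bits, double precision hits the same wall at 2^53 + 1. The sketch below is only illustrative: `to_single` is a hypothetical helper that rounds through the `struct` module's 4-byte float format.

```python
import struct

def to_single(x: float) -> float:
    """Round x to the nearest single-precision value and return it as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 24 significant bits: 2**24 fits, 2**24 + 1 does not.
print(to_single(float(2 ** 24)))
print(to_single(float(2 ** 24 + 1)))

# 53 significant bits: the same thing happens to doubles at 2**53 + 1.
print(float(2 ** 53) == float(2 ** 53 + 1))
```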
Now, it's obviously true that the double of 32 is 64, but that's not where the word comes from.
The precision indicates the number of decimal digits that are correct, i.e. without any kind of representation error or approximation. In other words, it indicates how many decimal digits one can safely use.
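With that said, it's easy to estimate the number of decimal digits which can be safely used:
single precision: log10(2^24), which is about 7~8 decimal digits
double precision: log10(2^53), which is about 15~16 decimal digits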
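The estimate can be checked numerically. This is a small sketch using only the Python standard library; it assumes the significand widths of 24 and 53 bits discussed above, and that the interpreter's `float` is an IEEE 754 double.

```python
import math
import sys

# Decimal digits carried by a significand of n bits: log10(2**n).
single_digits = math.log10(2 ** 24)
double_digits = math.log10(2 ** 53)

print(f"single: {single_digits:.2f} decimal digits")
print(f"double: {double_digits:.2f} decimal digits")

# Python's float is a double on IEEE 754 platforms, so this reports its width.
print(sys.float_info.mant_dig)
```

Both results fall between whole numbers of digits, which is why the figures are usually quoted as 7 to 8 and 15 to 16 digits.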
Okay, the basic difference at the machine level is that double precision uses twice as many bits as single. In the usual implementation, that's 32 bits for single and 64 bits for double.
But what does that mean? If we assume the IEEE standard, then a single precision number has 23 stored bits of mantissa and a maximum decimal exponent of about 38 (values up to roughly 3.4 × 10^38); a double precision number has 52 stored bits of mantissa and a maximum decimal exponent of about 308.
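The exponent limits of about 38 and 308 follow from the largest finite value in each format. The sketch below builds the largest single-precision value from its bit layout (largest normal exponent 127, fraction all ones) and reads the double limit from `sys.float_info`; the variable names are illustrative, and Python's `float` is assumed to be an IEEE 754 double.

```python
import math
import struct
import sys

# Largest finite single: exponent field 254 (2**127), fraction of 23 ones.
max_single = (2 - 2.0 ** -23) * 2.0 ** 127
# Largest finite double, as reported by the interpreter.
max_double = sys.float_info.max

print(max_single, math.floor(math.log10(max_single)))
print(max_double, math.floor(math.log10(max_double)))

# The computed single limit survives a round trip through 4 bytes unchanged.
print(struct.unpack("<f", struct.pack("<f", max_single))[0] == max_single)
```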
The details are at Wikipedia, as usual.
To add to all the wonderful answers here
First of all, float and double are both used to represent fractional numbers. The difference between the two comes down to how much precision they can store a number with.
For example: I have to store 123.456789. One type may be able to store only 123.4567, while the other may be able to store 123.456789 in full.
So, basically, we want to know how accurately the number can be stored, and that is what we call precision.
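The 123.456789 example can be tried directly. This sketch assumes Python's `float` is a double and uses the standard `struct` module to round the value to single precision; the names are illustrative only.

```python
import struct

value = 123.456789  # stored as a double by Python

# Round-trip through a 4-byte float to see what single precision keeps.
as_single = struct.unpack("<f", struct.pack("<f", value))[0]

print(f"double: {value:.10f}")
print(f"single: {as_single:.10f}")
print(f"error : {abs(as_single - value):.2e}")
```

The two printed values agree for roughly the first seven or eight significant digits and then diverge, which is the precision limit the answer is describing.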
Quoting @Alessandro here
The precision indicates the number of decimal digits that are correct, i.e. without any kind of representation error or approximation. In other words, it indicates how many decimal digits one can safely use.
Float can accurately store about 7-8 significant decimal digits, while double can accurately store about 15-16 significant decimal digits.
So, double can store about double the number of digits that float can. That is why double is called double.